  1. Oct 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The authors sought to examine the associations between child age, reports of parent-child relationship quality, and neural activity patterns while children (and also their parents) watched a movie clip. Major methodological strengths include the sample of 3-8 year-old children in China (rare in fMRI research for both age range and non-Western samples), use of a movie clip previously demonstrated to capture theory of mind constructs at the neural level, measurement of caregiver-child neural synchrony, and assessment of neural maturity. Results provide important new information about parent-child neural synchronization during this movie and associations with reports of parent-child relationship quality. The work is a notable advance in understanding the link between the caregiving context and the neural construction of theory of mind networks in the developing brain.

      We are grateful for the reviewer’s generous and thoughtful summary of our work. We particularly appreciate the recognition of the methodological strengths—including the rare developmental sample, culturally diverse context, and use of naturalistic, theory of mind-relevant stimuli—as well as the importance of integrating neural synchrony and relational variables. The reviewer’s comments affirm the core motivation behind this study: to advance our understanding of how the caregiving environment shapes the neurodevelopment of social cognition in early childhood. We have taken all specific suggestions seriously and hope the revised manuscript more clearly communicates these contributions.

      We appreciate that the authors wanted to show support for a mediational mechanism. However, we suggest that the authors drop the structural equation modeling because the data are cross-sectional so mediation is not appropriate. Other issues include the weak justification of including the parent-child neural synchronization as part of parenting.... it could just as easily be a mechanism of change or driven by the child rather than a component of parenting behavior. The paper would be strengthened by looking at associations between selected variables of interest that are MOST relevant to the imaging task in a regression type of model. Furthermore, the authors need to be more explicit about corrections for multiple comparisons throughout the manuscript; some of the associations are fairly weak so claims may need to be tempered if they don't survive correction.

      We thank the reviewer for the feedback on the use of SEM in our study. We recognize the limitations of using SEM to infer mediation with cross-sectional data and acknowledge that longitudinal designs are better suited for such analyses. However, our goal was not to establish causality but to explore potential pathways linking parenting, personal traits, and Theory of Mind (ToM) behavior to social cognition outcomes. SEM allowed us to simultaneously examine the relationships among these latent constructs, providing a cohesive framework for understanding the interplay of these factors. That said, we understand the reviewer’s concern and are willing to revise the manuscript to de-emphasize causal interpretations of the SEM findings.

      We thank the reviewer for raising the issue of correction for multiple comparisons. We confirm that all correlation analyses reported in the manuscript have been corrected for multiple comparisons using the False Discovery Rate (FDR) procedure. In the revised manuscript, we now explicitly indicate FDR correction for all relevant p-values to ensure clarity and transparency. Where this information was previously missing, we have corrected the oversight and clearly labeled the results as FDR-corrected or uncorrected where appropriate. Additionally, we have carefully reviewed our interpretation of all reported associations. For any results that were close to the significance threshold, we have tempered our claims and now describe them as marginally significant associations to avoid overstating our findings.

      The corresponding changes have been made in the Discussion section of the revised manuscript.
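      For illustration only, the effect of FDR correction on a family of correlation tests can be sketched as below. This is a minimal sketch, not the analysis code used in the study: it assumes the Benjamini-Hochberg variant of FDR (the response does not say which variant was applied), and the p-values and the function name `benjamini_hochberg` are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted_p, reject) in the original order of p_values.

    Assumption: Benjamini-Hochberg step-up procedure; the source only
    says "FDR" without naming the variant.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    reject = [q <= alpha for q in adjusted]
    return adjusted, reject


# Hypothetical uncorrected p-values for five correlation tests.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p, reject = benjamini_hochberg(raw_p)
for p, q, r in zip(raw_p, adj_p, reject):
    print(f"raw p = {p:.3f}  FDR-adjusted p = {q:.5f}  significant: {r}")
```

      In this toy example the tests with raw p = 0.039 and p = 0.041 fall below 0.05 before correction but not after it, which is the kind of result that would be described as marginal rather than significant.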

      Reverse correlation analysis is sensible given what prior developmental fMRI studies have done. But reverse correlation analysis may be more prone to overfitting and noise, and lacks sensitivity to multivariate patterns. Might inter-subject correlation be useful for *within* the child group? This would minimize noise and allow for non-linear patterns to emerge.

      We appreciate the reviewer’s thoughtful suggestion regarding potential limitations of reverse correlation analysis. While we agree that inter-subject correlation (ISC) within the child group may be useful in other contexts, our primary goal in using reverse correlation was not to identify temporally distributed or multivariate response patterns, but rather to isolate specific events within the naturalistic stimulus that reliably evoke Theory of Mind (ToM) and Social Pain-related responses in adults—who possess more stable and mature neural signatures. These adult-derived events serve as anchors for subsequent developmental comparisons and provide a principled way to define timepoints of interest that are behaviorally and theoretically meaningful.

      Using reverse correlation in adults allows us to identify canonical ToM and Social Pain events in a data-driven yet hypothesis-informed manner. We then examine how children’s neural responses to these same events vary with age, neural maturity, and dyadic synchrony. This approach is consistent with prior work in developmental social neuroscience (e.g., Richardson et al., 2018) and offers a valid framework for identifying interpretable social-cognitive events in naturalistic stimuli.
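      For illustration only, the logic of reverse correlation described above can be sketched as follows. This is a simplified sketch under stated assumptions, not the pipeline used in the study: it assumes network-averaged, baseline-normalized time courses, a one-sample t-test against zero at each timepoint, and a minimum event length of two consecutive timepoints. The function names, the threshold, and the toy data are hypothetical.

```python
import math


def one_sample_t(values):
    """One-sample t statistic of `values` against zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0:
        # Degenerate case: identical values across subjects.
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / math.sqrt(var / n)


def find_events(timecourses, t_threshold, min_length=2):
    """Return (start, end) index pairs of runs of timepoints whose
    across-subject response is reliably above zero.

    timecourses: list of per-subject lists (subjects x timepoints).
    """
    n_timepoints = len(timecourses[0])
    significant = [
        one_sample_t([subject[t] for subject in timecourses]) > t_threshold
        for t in range(n_timepoints)
    ]
    events, start = [], None
    # A trailing False closes any run that reaches the last timepoint.
    for t, flag in enumerate(significant + [False]):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            if t - start >= min_length:
                events.append((start, t - 1))
            start = None
    return events


# Toy data: 4 hypothetical adults, 8 timepoints of one network's response.
adults = [
    [0.1, -0.2, 1.0, 1.2, 0.9, -0.1, 0.8, -0.3],
    [-0.3, 0.2, 1.1, 0.9, 1.1, 0.2, 0.9, 0.1],
    [0.2, -0.1, 0.8, 1.1, 1.0, -0.2, 1.0, -0.2],
    [-0.1, 0.3, 1.2, 1.0, 0.8, 0.0, 1.1, 0.2],
]
# 2.353 is approximately the one-tailed p < .05 critical t for df = 3.
events = find_events(adults, t_threshold=2.353)
print(events)  # [(2, 4)]
```

      The isolated peak at timepoint 6 is discarded because it does not span two consecutive timepoints. The adult-derived window (timepoints 2 to 4) is then the stimulus-locked interval in which children's responses would be examined.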

      We have now clarified the rationale for using adult-based reverse correlation in the revised manuscript and explicitly stated its advantages for identifying targeted ToM and Social Pain content in the stimulus.

      The corresponding changes have been made on page 17 of the revised manuscript.

      “We employed reverse correlation analysis in adults to identify discrete events within the movie that elicited reliable neural responses across participants in ToM and SPM networks.

      The events of adults were chosen for this analysis due to the relative stability and maturity of their social brain responses, allowing for robust detection of canonical ToM and social pain-related moments. These events, once identified, served as stimulus-locked timepoints for subsequent analyses in the child cohort. This approach enables us to examine how children's responses to well-characterized, socially meaningful events vary with age and parent-child dyadic dynamics.”

      No learning effects or temporal lagged effects are tested in the current study, so the results do not support the authors' conclusions that the data speak to Bandura's social learning theory. The authors do mention theories of biobehavioral synchrony in the introduction but do not discuss this framework in the discussion (which is most directly relevant to the data). The data can also speak to other neurodevelopmental theories of development (e.g., neuroconstructivist approaches), but the authors do not discuss them. The manuscript would benefit from significantly revising the framework to focus more on biobehavioral synchrony data and other neurodevelopmental approaches given the prior work done in this area rather than a social psychology framework that is not directly evaluated.

      We appreciate the reviewer’s thoughtful and constructive feedback. We agree that the current study does not directly test mechanisms central to Bandura’s social learning theory, such as observational learning over time or behavioral modeling. In light of this, we have significantly revised the theoretical framing of the manuscript to focus more directly on the biobehavioral synchrony framework, which more accurately reflects the dyadic neural measures employed in this study and is better supported by our findings.

      Specifically, we have expanded the Discussion to contextualize our findings in terms of biobehavioral synchrony, emphasizing how inter-subject neural synchronization may reflect coordinated parent-child engagement and emotional attunement. We have also incorporated insights from neurodevelopmental and neuroconstructivist models, acknowledging that social cognitive development is shaped by dynamic interactions between neural maturation and environmental input over time.

      Although we continue to briefly reference Bandura’s theory to situate our findings within broader social-cognitive frameworks, we have clearly delineated the boundaries of what our data can support and have tempered previous claims. These changes are intended to better align our conceptual framing with the empirical evidence and relevant theoretical models.

      The corresponding changes have been made on pages 11-12 of the revised manuscript.

      “Insights into mechanisms of Neuroconstructivist Perspectives and Bandura’s social learning theory

      Our findings align with a neuroconstructivist perspective, which conceptualizes brain development as an emergent outcome of reciprocal interactions between biological constraints and context-specific environmental inputs. Rather than presuming fixed traits or linear maturation, this perspective highlights how neural circuits adaptively organize in response to experience, gradually supporting increasingly complex cognitive functions [49]. It offers a particularly powerful lens for understanding how early caregiving environments modulate the maturation of social brain networks.

      Building on this framework, the present study reveals that moment-to-moment neural synchrony between parent and child, especially during emotionally salient or socially meaningful moments, is associated with enhanced Theory of Mind performance and reduced dyadic conflict. This suggests that beyond age-dependent neural maturation, dyadic neural coupling may serve as a relational signal, embedding real-time interpersonal dynamics into the child’s developing neural architecture [1]. Our data demonstrate that children’s brains are not merely passively maturing, but are also shaped by the relational texture of their lived experiences, particularly interactions characterized by emotional engagement and joint attention. Importantly, this adds a new dimension to neuroconstructivist theory: it is not simply whether the environment shapes development, but how the quality of interpersonal input dynamically calibrates neural specialization. Interpersonal variation leaves detectable signatures in the brain, and our use of neural synchrony as a dyadic metric illustrates one potential pathway through which caregiving relationships exert formative influence on the developing social brain.

      The contribution of this work lies not in reiterating the interplay of nature and nurture, but in specifying the mechanistic role of interpersonal neural alignment as a real-time, context-sensitive developmental input. Neural synchrony between parent and child may function as a form of relationally grounded, temporally structured experience that tunes the child’s social brain toward contextually relevant signals. Unlike generalized enrichment, this form of neural alignment is inherently personalized and contingent—features that may be especially potent in shaping social cognitive circuits during early childhood.

      Although our study was not designed to directly examine learning mechanisms such as imitation or reinforcement, the findings can be viewed as broadly consistent with social learning theory. Bandura's theory posits that human behavior is shaped by observational learning and modeling from others in one's environment [2-4]. According to Bandura, children acquire social cognitive skills by observing and interacting with their parents and other significant figures in their environment. This dynamic interplay shapes their ability to understand and predict the behavior of others, which is crucial for the development of ToM and other social competencies.”

      References

      (1) Hughes, C. et al. Origins of individual differences in theory of mind: From nature to nurture? Child development 76, 356-370 (2005).

      (2) Koole, S. L. & Tschacher, W. Synchrony in psychotherapy: A review and an integrative framework for the therapeutic alliance. Frontiers in psychology 7, 862 (2016).

      (3) Liu, D., Wellman, H. M., Tardif, T. & Sabbagh, M. A. Theory of mind development in Chinese children: a meta-analysis of false-belief understanding across cultures and languages. Developmental Psychology 44, 523 (2008).

      (4) Frith, U. & Frith, C. D. Development and neurophysiology of mentalizing. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 358, 459-473 (2003).

      The significance and impact of the findings would be clearer if the authors more clearly situated the findings in the context of (a) other movie and theory of mind fMRI task data during development; and (b) existing data on parent-child neural synchrony (often uses fNIRS or EEG). What principles of brain and social cognition development do these data speak to? What is new?

      We thank the reviewer for this thoughtful comment. In response, we have revised the Discussion section to more clearly situate our findings within two key literatures: (a) fMRI studies examining Theory of Mind using movie-based and traditional task paradigms across development, and (b) research on parent-child neural synchrony. We now articulate more explicitly how our findings advance current understanding of the neural architecture of social cognition in childhood, and how they contribute new insights into the relational processes shaping brain function. These revisions clarify the conceptual and empirical novelty of our study, particularly in its use of naturalistic fMRI, simultaneous child-parent dyads, and integration of neural maturity with interpersonal synchrony.

      The corresponding changes have been made on page 12 of the revised manuscript.

      “Our findings contribute to and extend prior research using fMRI paradigms to investigate ToM development in children.  Previous work has shown that these networks become increasingly specialized and differentiated throughout childhood [1-3]. The current study extends these findings by demonstrating that the development of social brain networks is a gradual process that continues beyond the preschool years and is related to children's chronological age. This finding is consistent with behavioral research indicating that ToM and social abilities continue to develop and refine throughout middle childhood and adolescence [4]. Importantly, we move beyond prior work by combining reverse correlation with naturalistic stimuli to isolate discrete, behaviorally meaningful events (e.g., mental state attribution, social rejection) and relate children’s brain responses to adult patterns and social outcomes. This event-level analysis in a dyadic context offers greater ecological and interpretive precision than traditional block or condition-based designs. Our study provides novel evidence for the neural underpinnings of this protracted development, suggesting that the functional maturation of social brain networks may support the continued acquisition and refinement of social cognitive skills.

      In parallel, our study builds on and extends a growing body of work on parent-child neural synchrony, much of which has relied on fNIRS or EEG hyperscanning to demonstrate interpersonal alignment during communication, shared attention, or cooperative tasks [5-7]. While these modalities offer fine temporal resolution, they are limited in spatial precision and typically focus on surface-level cortical regions such as the prefrontal cortex. By contrast, our naturalistic fMRI approach enables the examination of deep and distributed brain networks—specifically those supporting social cognition—within child-parent dyads during emotionally and cognitively rich scenarios. Intriguingly, we found that neural synchronization during movie viewing was higher in child-mother dyads compared to child-stranger dyads.”

      References

      (1) Jacoby, N., Bruneau, E., Koster-Hale, J. & Saxe, R. Localizing Pain Matrix and Theory of Mind networks with both verbal and non-verbal stimuli. Neuroimage 126, 39-48 (2016).

      Astington, J. W. & Jenkins, J. M. A longitudinal study of the relation between language and theory-of-mind development. Developmental Psychology 35, 1311 (1999).

      (2) Carter, E. J. & Pelphrey, K. A. School-aged children exhibit domain-specific responses to biological motion. Social Neuroscience 1, 396-411 (2006).

      (3) Cantlon, J. F., Pinel, P., Dehaene, S. & Pelphrey, K. A. Cortical representations of symbols, objects, and faces are pruned back during early childhood. Cerebral Cortex 21, 191-199 (2011).

      (4) Im-Bolter, N., Agostino, A. & Owens-Jaffray, K. Theory of mind in middle childhood and early adolescence: Different from before? Journal of experimental child psychology 149, 98-115 (2016).

      (5) Deng, X. et al. Parental involvement affects parent-adolescents brain-to-brain synchrony when experiencing different emotions together: an EEG-based hyperscanning study. Behavioural brain research 458, 114734 (2024).

      (6) Miller, J. G. et al. Inter-brain synchrony in mother-child dyads during cooperation: an fNIRS hyperscanning study. Neuropsychologia 124, 117-124 (2019).

      (7) Nguyen, T., Bánki, A., Markova, G. & Hoehl, S. Studying parent-child interaction with hyperscanning. Progress in brain research 254, 1-24 (2020).

      There is little discussion about the study limitations, considerations about the generalizability of the findings, and important next steps and future directions. What can the data tell us, and what can it NOT tell us?

      We appreciate the reviewer’s recommendation to elaborate on the study’s limitations, generalizability, and future directions. In response, we have added a dedicated section to the Discussion that critically addresses these considerations. We acknowledge the cross-sectional nature of the study, the modest sample size, and the use of a single stimulus context as key limitations. We also clarify the inferences that can be drawn from our data and what remains speculative. Finally, we outline specific future research directions.

      The corresponding changes have been made on pages 13-14 of the revised manuscript.

      “While leveraging a naturalistic movie-viewing paradigm allowed us to study children's spontaneous neural responses during a semi-structured yet engaging task, dedicated experimental designs are still needed to make stronger inferences about the cognitive processes involved. Additionally, our region-of-interest approach precluded examination of whole-brain networks; future work could explore developmental changes in broader functional circuits. The cross-sectional nature of our study is a further limitation, as it cannot definitively establish the causal directions of the observed relationships. Longitudinal designs tracking children's brain development and social cognitive abilities over time would help clarify whether early parenting impacts later neural maturation and behavioral outcomes, or vice versa. Our sample was restricted to mother-child dyads, leaving open questions about potential differences in father-child relationships and gender effects on parenting neurobiology. Larger and more diverse samples would enhance the generalizability of the findings.

      Several future directions emerge from this research. First, combining naturalistic neuroimaging with structured cognitive tasks could elucidate the specific mental processes underlying children's neural responses during movie viewing. Examining how these processes relate to real-world social behavior would further bridge neurocognitive function and ecological validity. Longitudinal studies beginning in infancy could chart the developmental trajectories of parent-child neural synchrony and their impact on long-term social outcomes. Such work could also explore sensitive periods when parenting may be most influential on social brain maturation. Finally, expanding this multimodal approach to clinical populations like autism could yield insights into atypical social cognitive development and inform tailored intervention strategies targeting parent-child relationships and neural plasticity.”

      To evaluate associations between child neural activity patterns during the movie AND parent-child synchronization patterns AND other variables such as parent-child communication and theory of mind behavior, it seems like a robust approach could be to examine whether similar synchronization patterns are associated with similar scores on different variables. Would allow for non-linear and multivariate associations.

      We greatly appreciate the reviewer’s thoughtful suggestion regarding the use of similarity-based or multivariate analyses to assess whether dyads with similar neural synchronization profiles also exhibit similar scores on behavioral or relational variables. We agree that this type of analysis—such as representational similarity analysis (RSA) or inter-subject pattern similarity—offers a powerful framework for capturing non-linear and multivariate associations, and could provide deeper insights into shared neurobehavioral patterns across participants. However, the analytic logic of similarity-based approaches typically requires the availability of comparable measures across individuals or dyads (e.g., child A and child B must both have measures of brain activity, behavior, and environment). In the present study, our focus was on the child as the behavioral and developmental target, and we did not collect parallel behavioral or cognitive variables from the parent side (e.g., adult Theory of Mind ability, emotional traits, parenting style questionnaires beyond dyadic reports). As a result, it was not feasible to construct pairwise similarity matrices across dyads that include both neural synchrony and matched behavioral dimensions from both individuals.

      Instead, our study was designed to examine how child-level outcomes (e.g., Theory of Mind performance, social functioning) are associated with (a) the child’s neural responses to specific social events, and (b) the degree of neural synchronization with their mother, as a marker of relational engagement. The analytical emphasis, therefore, remained on within-child variation, modulated by the quality of the parent-child interaction.

      Were there associations between parent-child neural synchronization and child age? What was the association between neural maturity and parent-child neural synchronization?

      We thank the reviewer for raising this important point regarding associations between parent-child neural synchronization (ISS), child age, and neural maturity.

      As reported in the original manuscript, we did not observe significant correlations between parent-child ISS and child age for either the Theory of Mind (ToM) or Social Pain Matrix (SPM) networks (all ps > 0.1). In an additional analysis, we likewise found no significant correlation between ISS and neural maturity (Author response image 1, r = 0.2503, p = 0.1533).

      These findings indicate that parent-child neural synchronization in this naturalistic viewing context is not simply explained by age-related maturation or children's neural maturity level. Instead, ISS may predominantly reflect real-time interpersonal engagement or relational dynamics rather than individual developmental trajectories or neural maturity.

      Author response image 1.

      Scatterplot showing the association between parent-child inter-subject synchronization (ISS) and neural maturity, averaged across the Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Each point represents one dyad. No significant correlation was observed between ISS and neural maturity (r = 0.2503, p = 0.1533), suggesting that interpersonal neural synchronization and individual neural maturation may reflect dissociable aspects of social brain development.
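      For illustration only, a correlation-based synchronization measure of the kind discussed here can be sketched as below. This is a minimal sketch, not the analysis code used in the study: it assumes that ISS is the Pearson correlation between the child's and the mother's network-averaged time courses, and that stranger pairs are formed by pairing each child with every other child's mother. The function names and toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dyad_vs_stranger_iss(children, mothers):
    """Mean ISS for true dyads and for mismatched (stranger) pairs.

    children[i] and mothers[i] are assumed to belong to the same dyad.
    """
    n = len(children)
    dyad = [pearson_r(children[i], mothers[i]) for i in range(n)]
    stranger = [
        pearson_r(children[i], mothers[j])
        for i in range(n)
        for j in range(n)
        if i != j
    ]
    return sum(dyad) / len(dyad), sum(stranger) / len(stranger)


# Toy time courses for 3 hypothetical dyads, 6 timepoints each.
children = [
    [1, 2, 3, 2, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [3, 2, 1, 0, 1, 2],
]
mothers = [
    [1.1, 2.0, 2.8, 2.1, 0.9, 0.2],
    [0.1, 0.9, 0.2, 1.1, 0.0, 0.8],
    [2.9, 2.2, 1.0, 0.1, 0.8, 2.1],
]
dyad_mean, stranger_mean = dyad_vs_stranger_iss(children, mothers)
print(f"mean dyad ISS = {dyad_mean:.3f}, mean stranger ISS = {stranger_mean:.3f}")
```

      Per-dyad ISS values computed this way could then be correlated across dyads with child age or neural maturity, which is the form of the association shown in Author response image 1.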

      The rationale for splitting the ages into 3 groups is unclear and creates small groups that could be more prone to spurious associations. Why not look at age continuously?

      We thank the reviewer for raising this important point. We fully agree that analyzing age as a continuous variable is statistically more robust and minimizes concerns about spurious associations due to arbitrary groupings.

      To clarify, all primary statistical models—including correlational analyses—treated age as a continuous variable, and our core developmental inferences are based on these continuous-age findings.

      In addition to these analyses, we included age group comparisons as a supplementary approach, guided by both theoretical considerations and visual inspection of the data. Specifically, we aimed to explore whether functional differentiation between social brain networks (e.g., ToM and SPM) might begin to emerge non-linearly or earlier than expected, particularly in the youngest children. Such early neural divergence may not be well-captured by linear trends alone. The grouped analysis allowed us to illustrate that network differentiation was already observable in children under age 5, suggesting that certain aspects of social brain organization may emerge earlier than classically assumed.

      We have now clarified this rationale in the revised manuscript and emphasized that the group-based analysis was used solely to highlight developmental shifts that may not follow a linear pattern, and not for formal hypothesis testing.

      The corresponding changes have been made on page 9 of the revised manuscript.

      “While our primary analyses treated age as a continuous variable, we also performed exploratory group-based comparisons to probe for potential non-linear developmental shifts in social brain network organization. This approach revealed that the differentiation between ToM and SPM networks was already present in the youngest group (ages 3–4), suggesting that early neural specialization may begin prior to the age at which ToM behavior is reliably observed. These group-level observations provide complementary evidence to the continuous analyses and may inform future work examining sensitive periods or early markers of social brain development.”

      Tables would be improved if they were more professionally formatted (e.g., names of the variables rather than variable abbreviation codes).

      We appreciate the reviewer’s suggestion to improve the clarity and professionalism of our tables. In the revised manuscript, we have reformatted all tables to include full variable names rather than abbreviations or coded labels, and we ensured consistency in terminology across the manuscript text, tables, and figure legends. We have also added explanatory footnotes where needed to clarify any derived or composite measures. We hope these revisions improve the accessibility and readability of the results for a broader audience.

      Reviewer #2:

      Summary:

      This study investigates the impact of mother-child neural synchronization and the quality of parent-child relationships on the development of Theory of Mind (ToM) and social cognition. Utilizing a naturalistic fMRI movie-viewing paradigm, the authors analyzed inter-subject neural synchronization in mother-child dyads and explored the connections between neural maturity, parental caregiving, and social cognitive outcomes. The findings indicate age-related maturation in ToM and social pain networks, emphasizing the importance of dyadic interactions in shaping ToM performance and social skills, thereby enhancing our understanding of the environmental and intrinsic influences on social cognition.

      Strengths:

      This research addresses a significant question in developmental neuroscience, by linking social brain development with children's behaviors and parenting. It also uses a robust methodology by incorporating neural synchrony measures, naturalistic stimuli, and a substantial sample of mother-child dyads to enhance its ecological validity. Furthermore, the SEM approach provides a nuanced understanding of the developmental pathways associated with Theory of Mind (ToM).

      We appreciate the reviewer's positive evaluation and valuable comments. We have revised the manuscript thoroughly to address the concerns raised and provide a point-by-point response to each issue below. We believe that the revised manuscript is now significantly improved.

      Upon reviewing the introduction, I feel that the first goal - developmental changes of the social brain and its relation to age - seems somewhat distinct from the other two goals and the main research question of the manuscript. The authors might consider revising this section to enhance the overall coherence of the manuscript. Additionally, the introduction lacks a clear background and rationale for the importance of examining age-related changes in the social brain.

      We thank the reviewer for this thoughtful observation. In response, we have revised the Introduction to better integrate the developmental aspect of the social brain with the broader research aims. We now explicitly link age-related changes in social brain organization to the emergence of social cognitive abilities and highlight why early childhood (ages 3–8) represents a particularly formative period. This revision clarifies that our first aim—examining functional specialization and neural maturity in Theory of Mind (ToM) and Social Pain Matrix (SPM) networks—serves as a developmental foundation for understanding how dyadic influences, such as neural synchrony and caregiving quality, shape children’s social cognition.

      We have also improved the rationale for examining age-related change, drawing on key literature in developmental neuroscience to show how the early emergence and specialization of social brain networks provide a necessary context for interpreting interpersonal neural dynamics.

      The corresponding changes have been made on page 3 of the revised manuscript.

      “These findings suggest that the development of specialized brain regions for reasoning about others' mental states and physical sensations is a gradual process that continues throughout childhood.

      Understanding how these networks differentiate with age is essential not only for mapping typical brain development, but also for contextualizing the role of environmental influences. By establishing normative patterns of neural maturity and differentiation, we can better interpret how relational experiences—such as caregiver-child synchrony and parenting quality—modulate these trajectories. Thus, our first goal provides a developmental anchor that grounds our investigation of interpersonal and environmental contributions to social brain function.”

      The manuscript uses both "mother-child" and "parent-child" terminology. Does this imply that only mothers participated in the fMRI scans while fathers completed the questionnaires? If so, have the authors considered the potential impact of parental roles (father vs. mother)?

      We thank the reviewer for raising this important point regarding terminology and parental roles. To clarify, all participating caregivers in the current study were biological mothers, and all behavioral questionnaires were also completed by these same mothers. No fathers were included in this study. We have revised the manuscript throughout to consistently use the term “mother-child” when referring to the specific dyads in our sample.

      We also appreciate the opportunity to elaborate on the rationale for including only mothers. Prior research has shown that maternal and paternal influences on child development are not interchangeable, and that the neural correlates of caregiving behaviors differ between mothers and fathers. For example, studies have demonstrated distinct patterns of brain activation during social and emotional processing in mothers versus fathers (Abraham et al., 2014; Swain et al., 2014). Given these differences, we deliberately focused on mother-child dyads to maintain neurobiological consistency in our analysis and reduce variance associated with heterogeneous caregiving roles. We now clarify this rationale in the revised Methods and Discussion sections.

      The corresponding changes have been made on page 14 of the revised manuscript.

      “We chose to focus exclusively on mother-child dyads in this study based on prior evidence suggesting distinct neural and behavioral caregiving profiles between mothers and fathers [1-2], allowing us to maintain role consistency and reduce variability in dyadic interactions.

      Our sample was restricted to mother-child dyads, leaving open questions about potential differences in father-child relationships and gender effects on parenting neurobiology [1]. Larger and more diverse samples would enhance the generalizability of the findings.”

      Reference:

      (1) Swain, J. E. et al. Approaching the biology of human parental attachment: Brain imaging, oxytocin and coordinated assessments of mothers and fathers. Brain research 1580, 78-101 (2014).

      (2) Abraham, E. et al. Father's brain is sensitive to childcare experiences. Proceedings of the National Academy of Sciences 111, 9792-9797 (2014).

      There is inconsistent usage of the terms ISC and ISS in the text and figures, both of which appear to refer to synchronization derived from correlation analysis. It would be beneficial to maintain consistency throughout the manuscript.

      We thank the reviewer for highlighting the inconsistent use of “ISC” and “ISS” in the original manuscript. We agree that clarity and consistency in terminology are essential. In response, we have revised the manuscript to consistently use “ISS” (inter-subject synchronization) throughout the text, figures, tables, and legends.

      Of the 50 dyads, 16 were excluded due to data quality issues, which constitutes a significant proportion. It would be helpful to know whether these excluded dyads exhibited any distinctive characteristics. Providing information on demographic or behavioral differences between the excluded and included dyads, such as Theory of Mind (ToM) performance and age range, would enhance the assessment of the findings' generalizability.

      We thank the reviewer for this important observation. We agree that understanding the characteristics of excluded participants is essential for assessing the generalizability of the findings.

      In response, we conducted comparative analyses between included and excluded dyads (N = 34 included; N = 16 excluded) on key demographic and behavioral variables, including child age, gender, and Theory of Mind (ToM) performance. These analyses revealed no significant differences between groups on any of these measures (ps > 0.1), suggesting that data exclusion due to quality issues (e.g., excessive motion, incomplete scans) did not introduce systematic bias.

      We have now added this information to the Results and Methods sections of the manuscript.

      The corresponding changes have been made on pages 6 and 17 of the revised manuscript.

      “Of the 50 initial mother-child dyads recruited, 16 were excluded due to excessive head motion (n = 11), incomplete scan sessions (n = 3), or technical issues during data acquisition (n = 2). The final sample consisted of 34 dyads. To assess potential bias introduced by data exclusion, we compared included and excluded dyads on child age, gender, and Theory of Mind performance. No significant differences were found across these variables (all ps > 0.1), suggesting that the analytic sample was demographically representative of the full cohort.

      Comparison between included and excluded dyads revealed no significant differences in child age (t = 1.23, p = 0.24), ToM scores (t = -0.54, p = 0.59), or sex distribution (χ² < 0.01, p = 0.98), indicating that data exclusion did not bias the sample in a systematic way.”
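      The included-versus-excluded comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the response does not state which t-test variant or chi-square form was used, so Welch's t statistic and an uncorrected Pearson chi-square on a 2×2 table are assumptions, and the function names and numbers are hypothetical placeholders rather than study data.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom (assumed variant)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


def chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 count table.

    For one degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).
    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical placeholder values, NOT data from the study.
ages_included = [4.1, 5.3, 6.0, 7.2, 3.8]
ages_excluded = [3.5, 4.9, 6.4]
t_stat, df = welch_t(ages_included, ages_excluded)

# Rows: included / excluded; columns: girls / boys (hypothetical counts).
chi2, p_value = chi2_2x2([[17, 17], [8, 8]])
```

      A non-significant result in such a comparison indicates only that no difference was detected. With 16 excluded dyads, the power to detect modest differences is limited.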

      The article does not adhere to the standard practice of using a resting state as a baseline for subtracting from task synchronization. Is there a rationale for this approach? Not controlling for a baseline may lead to issues, such as whether resting state synchronization already differs between subjects with varying characteristics.

      We thank the reviewer for raising this important methodological point. We agree that controlling for baseline synchronization, such as using a resting-state scan as a comparison, can help disambiguate whether task-induced synchrony reflects genuine stimulus-driven coupling or baseline differences across individuals or dyads.

      In the present study, we focused on inter-subject synchronization (ISS) during naturalistic movie viewing, a task condition that has been widely used in previous developmental and social neuroscience research to assess shared neural engagement. We did not include a resting-state scan in the current protocol due to time constraints and the young age of our participants (ages 3–8), as longer scanning sessions often result in increased motion and reduced data quality in pediatric populations. Moreover, many prior studies using ISS in naturalistic paradigms have similarly focused on task-driven synchrony without subtracting a resting baseline (e.g., Hasson et al., 2004; Nguyen et al., 2020; Reindl et al., 2018).

      That said, we acknowledge that baseline neural synchrony across dyads may vary depending on individual or relational characteristics (e.g., temperament, arousal, attentional style), and this remains an important question for future research. In the revised Discussion, we now explicitly note the absence of a resting-state baseline as a limitation and highlight the need for future studies to examine how resting and task-based ISS may interact, particularly in the context of child-caregiver dyads.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “Another limitation of the current design is the lack of a resting-state baseline for inter-subject synchronization. While our focus was on synchronization during naturalistic social processing, we cannot determine whether individual differences in ISS reflect purely task-induced coupling or are partially shaped by trait-level synchrony present at rest. Including both resting and task conditions in future work would allow for stronger inferences about stimulus-specific versus baseline-driven synchronization, especially in relation to interpersonal factors such as relationship quality or social responsiveness.”

      The title of the manuscript suggests a direct influence of mother-child interactions on children's social brain and theory of mind. However, the use of structural equation modeling (SEM) may not fully establish causal relationships. It is possible that the development of children's social brain and ToM also enhances mother-child neural synchronization. The authors should address this alternative hypothesis of the potential bidirectional relationship in the discussion and exercise caution regarding terms that imply causality in the title and throughout the manuscript.

      We appreciate the reviewer’s careful attention to issues of causality in our manuscript. We agree that our cross-sectional design limits causal inference, and that the use of structural equation modeling (SEM) in this context does not allow for conclusions about directional or mechanistic pathways. In response, we have revised the Discussion to explicitly acknowledge these limitations, and now include an expanded section on the potential for bidirectional or co-constructed processes, consistent with neuroconstructivist frameworks.

      We have also tempered the interpretation of our SEM findings, avoiding causal language throughout the manuscript and clarifying that our analyses are exploratory and associational in nature. We hope that these changes provide a more cautious and developmentally grounded interpretation of the data.

      With regard to the title, we respectfully chose to retain the original wording, as we believe it captures the thematic focus and central research question of the paper—namely, the potential role of mother-child interaction in the development of children’s social brain and Theory of Mind. While we understand the reviewer’s concern, we note that the interpretation of this phrasing is contextualized within the manuscript, which now includes clear qualifications regarding the limits of causal inference. We have taken care to ensure that no claims of unidirectional causality are made in the body of the paper.

      The corresponding changes have been made on pages 11-12 of the revised manuscript.

      “Our findings align with a neuroconstructivist perspective, which conceptualizes brain development as an emergent outcome of reciprocal interactions between biological constraints and context-specific environmental inputs. Rather than presuming fixed traits or linear maturation, this perspective highlights how neural circuits adaptively organize in response to experience, gradually supporting increasingly complex cognitive functions54. It offers a particularly powerful lens for understanding how early caregiving environments modulate the maturation of social brain networks.

      Building on this framework, the present study reveals that moment-to-moment neural synchrony between parent and child, especially during emotionally salient or socially meaningful moments, is associated with enhanced Theory of Mind performance and reduced dyadic conflict. This suggests that beyond age-dependent neural maturation, dyadic neural coupling may serve as a relational signal, embedding real-time interpersonal dynamics into the child’s developing neural architecture. Our data demonstrate that children’s brains are not merely passively maturing, but are also shaped by the relational texture of their lived experiences—particularly interactions characterized by emotional engagement and joint attention. Importantly, this adds a new dimension to neuroconstructivist theory: it is not simply whether the environment shapes development, but how the quality of interpersonal input dynamically calibrates neural specialization. Interpersonal variation leaves detectable signatures in the brain, and our use of neural synchrony as a dyadic metric illustrates one potential pathway through which caregiving relationships exert formative influence on the developing social brain.

      The contribution of this work lies not in reiterating the interplay of nature and nurture, but in specifying the mechanistic role of interpersonal neural alignment as a real-time, context-sensitive developmental input. Neural synchrony between parent and child may function as a form of relationally grounded, temporally structured experience that tunes the child’s social brain toward contextually relevant signals. Unlike generalized enrichment, this form of neural alignment is inherently personalized and contingent—features that may be especially potent in shaping social cognitive circuits during early childhood.

      The cross-sectional nature of our study is a further limitation, as it cannot definitively establish the causal directions of the observed relationships. Longitudinal designs tracking children's brain development and social cognitive abilities over time would help clarify whether early parenting impacts later neural maturation and behavioral outcomes, or vice versa.”

      I would appreciate more details about the 14 Theory of Mind (ToM) tasks, which could be included in supplemental materials. The authors score them on a scale from 0 to 14 (each task 1 point); however, the tasks likely vary in difficulty and should carry different weights in the total score (for example, the test and the control questions should have different weights). Many studies have utilized the seven tasks according to Wellman and Liu (2004), categorizing them into "basic ToM" and "advanced ToM." Different components of ToM could influence the findings of the current study, which should be further examined by a more in-depth analysis.

      We thank the reviewer for raising this important point regarding the structure and scoring of the Theory of Mind (ToM) tasks. We will provide a detailed description of all 14 tasks in the Supplemental Materials, including their content, targeted mental state concepts (e.g., beliefs, desires, intentions), and design features (e.g., test/control items, task format).

      We fully agree that ToM tasks differ in complexity, and in principle, a weighted or component-based scoring approach (e.g., distinguishing basic and advanced ToM) could offer greater interpretive value. However, in our study design, tasks were administered in a fixed sequence from lower to higher difficulty, and testing was terminated if the child was unable to successfully complete three consecutive tasks. This approach is developmentally appropriate for younger children but results in non-random missingness for more advanced tasks—particularly among children at the lower end of the age range (3–4 years).

      Given this adaptive task structure, re-scoring using weighted or subscale-based approaches would introduce systematic bias, as children who struggled with early items were not administered more complex ones. As a result, a full breakdown by task type (e.g., basic vs. advanced ToM) would only reflect a restricted subsample and would not be comparable across the full cohort. For this reason, we retained the unit-weighted total ToM score as the most developmentally valid and comparable metric across participants.
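      The stopping rule described above, and the non-random missingness it produces, can be illustrated with a short sketch. The function names and the example response pattern are hypothetical. The only details taken from the response are the fixed easy-to-hard order, the 14 tasks, termination after three consecutive failures, and one point per task.

```python
def administer_tom(would_pass, n_tasks=14, stop_after=3):
    """Simulate a fixed-order battery with a discontinue rule.

    would_pass: booleans ordered from easiest to hardest task, stating
    whether the child would pass each task if it were administered.
    Returns one entry per task: True/False if administered, None if the
    session had already been terminated.
    """
    outcomes = []
    consecutive_failures = 0
    for passed in would_pass[:n_tasks]:
        if consecutive_failures >= stop_after:
            outcomes.append(None)  # task never administered
            continue
        outcomes.append(passed)
        consecutive_failures = 0 if passed else consecutive_failures + 1
    return outcomes


def unit_weighted_score(outcomes):
    """One point per passed task; unadministered tasks score zero."""
    return sum(1 for outcome in outcomes if outcome is True)


# Hypothetical child who fails tasks 3-5 but could have passed task 6.
pattern = [True, True, False, False, False, True] + [False] * 8
outcomes = administer_tom(pattern)
score = unit_weighted_score(outcomes)
```

      In this example, tasks 6 through 14 are recorded as not administered, so any "advanced ToM" subscale would be missing for precisely the children who struggled with the earlier items.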

      Reviewer #3:

      Summary:

      The article explores the role of mother-child interactions in the development of children's social cognition, focusing on Theory of Mind (ToM) and Social Pain Matrix (SPM) networks. Using a naturalistic fMRI paradigm involving movie viewing, the study examines relationships among children's neural development, mother-child neural synchronization, and interaction quality. The authors identified a developmental pattern in these networks, showing that they become more functionally distinct with age. Additionally, they found stronger neural synchronization between child-mother pairs compared to child-stranger pairs, with this synchronization and neural maturation of the networks associated with the mother-child relationship and parenting quality.

      Strengths:

      This is a well-written paper, and using dyadic fMRI and naturalistic stimuli enhances its ecological validity, providing valuable insights into the dynamic interplay between brain development and social interactions. However, I have some concerns regarding the analysis and interpretation of the findings. I have outlined these concerns below in the order they appear in the manuscript, which I hope will be helpful for the revision.

      We appreciate the reviewer’s thoughtful and constructive summary of the manuscript. The concerns raised regarding aspects of the analysis and interpretation have been carefully considered. Detailed point-by-point responses are provided below, along with descriptions of the corresponding revisions made to improve the clarity, precision, and interpretive caution of the manuscript.

      Given the importance of social cognition in this study, please cite a foundational empirical or review paper on social cognition to support its definition. The current first citation is primarily related to ASD research, which may not fully capture the broader context of social cognition development.

      We thank the reviewer for this helpful suggestion. We agree that a broader, foundational reference is more appropriate for introducing the concept of social cognition. In response, we have revised the Introduction to include a widely cited theoretical or review paper on social cognition to provide a more general developmental context.

      The corresponding changes have been made on page 3 of the revised manuscript.

      “Social cognition, defined as the ability to interpret and predict others' behavior based on their beliefs and intentions and to interact in complex social environments and relationships, is a crucial aspect of human development [1-2].”

      (1) Adolphs, R. The social brain: neural basis of social knowledge. Annual review of psychology 60, 693-716 (2009).

      (2) Frith, C. D. & Frith, U. Mechanisms of social cognition. Annual review of psychology 63, 287-313 (2012).

      It is standard practice to report the final sample size in the Abstract and Introduction, rather than the initial recruited sample, as high attrition rates are common in pediatric studies. For example, this study recruited 50 mother-child dyads, and only 34 remained after quality control. This information is crucial for interpreting the results and conclusions. I recommend reporting the final sample size in the abstract and introduction but specifying in the Methods that an additional 16 mother-child dyads were initially recruited or that 50 dyads were originally collected.

      We thank the reviewer for this helpful recommendation. In the original version of the manuscript, the Abstract and Introduction referenced the total number of dyads recruited (N = 50). In line with standard reporting practices and to ensure clarity regarding the analytic sample, we have now revised both the Abstract and Introduction to report the final sample size (N = 34). The full recruitment and exclusion details—including the number of dyads removed due to excessive motion or technical issues—are now clearly described in the Methods section.

      The corresponding changes have been made on pages 1 and 4 of the revised manuscript.

      In the "Neural maturity reflects the development of the social brain" section, the authors report the across-network correlation for adults, finding a negative correlation between ToM and SPM. However, the cross-network correlations for the three child groups are not reported. The statement that "the two networks were already functionally distinct in the youngest group of children we tested" is based solely on within-network positive correlations, which does not fully demonstrate functional distinctness. Including cross-network correlations for the child groups would strengthen this conclusion.

      We thank the reviewer for this insightful comment. We agree that within-network correlations alone do not fully establish functional distinctness, particularly in early development. To more directly test whether the ToM and SPM networks were already differentiated in children, we have now included the cross-network correlations between the two networks for each of the three age groups in the revised manuscript. These findings support and strengthen our original claim that the ToM and SPM networks are functionally dissociable even in early childhood, and we have revised the relevant Results sections accordingly to reflect this.

      The corresponding changes have been made on page 7 of the revised manuscript.

      “In children, each network also exhibited positive correlations within-network and negative correlations across networks (within-ToM correlation M(s.e.) = 0.31(0.04); within-SPM correlation M(s.e.) = 0.29(0.04); across-network M(s.e.) = −0.09(0.02)).

      In the Pre-junior group only (3-4-year-old children, n = 12), both ToM and SPM networks had positive within-network correlations and a negative across-network correlation (within-ToM correlation M(s.e.) = 0.29(0.06); within-SPM correlation M(s.e.) = 0.23(0.05); across-network M(s.e.) = −0.05(0.02)).”
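      The within-network and across-network quantities reported above can be sketched as mean pairwise correlations between ROI time courses. This is an illustrative reading, assuming Pearson correlation on ROI-averaged signals. The exact preprocessing and averaging used in the manuscript are not given in this response, and all names and values below are hypothetical.

```python
import math
from itertools import combinations, product


def pearson_r(x, y):
    """Pearson correlation between two equal-length time series."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    scale = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return covariance / scale


def mean_within(network):
    """Mean correlation over all ROI pairs inside one network."""
    pairs = list(combinations(network, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


def mean_across(network_a, network_b):
    """Mean correlation over all ROI pairs spanning two networks."""
    pairs = list(product(network_a, network_b))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Toy time courses (two ROIs per network), NOT data from the study.
tom_rois = [[1, 2, 3, 4], [2, 3, 4, 6]]
spm_rois = [[4, 3, 2, 1], [5, 3, 2, 0]]

within_tom = mean_within(tom_rois)
within_spm = mean_within(spm_rois)
across = mean_across(tom_rois, spm_rois)
```

      Functional distinctness in this sense requires both conditions: positive within-network means and a lower (here negative) across-network mean, which is why the reviewer asked for the cross-network values.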

      The ROIs for the ToM and SPM networks are defined based on previous literature, applying the same ROIs across all age groups. While I understand this is a common approach, it's important to note that this assumption may not fully hold, as network architecture can evolve with age. The functional ROIs or components of a network might shift, with regions potentially joining or exiting a network or changing in size as children develop. For instance, Mark H. Johnson's interactive specialization theory suggests that network composition may adapt over developmental stages. Although the authors follow the approach of Richardson et al. (2018), it would be beneficial to discuss this limitation in the Discussion. An alternative approach would be to apply data-driven analysis to justify the selection of the ROIs for the two networks.

      We thank the reviewer for this thoughtful and theoretically grounded comment. In our study, we followed the approach of Richardson et al. (2018), using a priori ROIs defined from adult meta-analyses and ToM/SPM task studies. This approach facilitates comparison with prior work and provides anatomical consistency across participants. However, we fully agree that applying adult-defined ROIs to pediatric populations involves important assumptions about the stability of network architecture across development, which may not fully hold in early childhood.

      We have now addressed this limitation more explicitly in the revised Discussion, emphasizing that the fixed-ROI approach may not capture the dynamic reorganization of social brain networks during development.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “Moreover, the ROIs used to define the ToM and SPM networks were based on meta-analyses and task studies primarily conducted with adults. While this approach promotes comparability with existing literature, it assumes that the spatial organization of these networks is stable across age groups. However, theories of interactive specialization suggest that the composition and boundaries of functional networks may undergo reorganization during development, with regions potentially entering or exiting networks based on experience and maturational processes. As a result, the current analysis may not fully capture age-specific functional architecture, particularly in younger children. Future studies using data-driven or age-appropriate parcellation methods could provide more precise characterizations of how social brain networks are constructed and differentiated throughout childhood.”

      The current sample size (N = 34 dyads) is a limitation, particularly given the use of SEM, which generally requires larger samples for stable results. Although the model fit appears adequate, this does not guarantee reliability with the current sample size. I suggest discussing this limitation in more detail in the Discussion.

      We thank the reviewer for highlighting the limitations of applying structural equation modeling (SEM) with a relatively modest sample size. We agree that SEM generally benefits from larger samples to ensure model stability and parameter reliability, and that satisfactory model fit does not guarantee robustness in small-sample contexts.

      In the revised Discussion, we now more clearly acknowledge that the use of SEM in the current study is exploratory in nature, and that all results should be interpreted with caution due to potential sample size-related constraints. The model was constructed to provide an integrated view of the observed associations rather than to establish definitive pathways. We have also added a note that future research with larger samples and longitudinal designs will be needed to validate and extend the proposed model.

      The corresponding changes have been made on page 13 of the revised manuscript.

      “In addition, the modest sample size (N = 34 dyads) presents limitations for the application of structural equation modeling (SEM), which typically requires larger samples for stable estimation and generalizable inferences. While the model fit was acceptable, the results should be interpreted as exploratory and hypothesis-generating, rather than confirmatory. Future studies with larger, independent samples will be important for validating the structure and directionality of the proposed relationships.”

      Based on the above comment, I believe that conclusions regarding the relationship between social network development, parenting, and support for Bandura's theory should be tempered. The current conclusions may be too strong given the study's limitations.

      We thank the reviewer for this important and balanced observation. We agree that the conclusions drawn from the current study should reflect the exploratory nature of the analyses, as well as the methodological limitations, including the modest sample size and cross-sectional design.

      In response, we have revised the Conclusion section to use more cautious, associative language when describing the observed relationships among social brain development, parenting factors, and Theory of Mind outcomes. In particular, we have tempered statements regarding support for Bandura’s social learning theory, clarifying that while our findings are consistent with social learning frameworks, the data do not allow for direct tests of modeling or observational learning mechanisms.

      We hope these revisions help clarify the scope of the findings and improve the conceptual rigor of the manuscript.

      The corresponding changes have been made on page 14 of the revised manuscript.

      “Our study provides novel evidence that children's social cognitive development may be shaped by the intricate interplay between environmental influences, such as parenting, and biological factors, such as neural maturation. Our findings contribute to a growing understanding of the factors associated with social cognitive development and suggest the potential importance of parenting in this process. Specifically, the study points to the possible role of the parent-child relationship in supporting the development of social brain circuitry and highlights the relevance of family-based approaches for addressing social difficulties. The observed neural synchronization between parent and child, which was associated with relationship quality, underscores the potential significance of positive parental engagement in fostering social cognitive skills. Future longitudinal and clinical research can build on this multimodal approach to further clarify the neurobehavioral mechanisms underlying social cognitive development. Such research may help inform more effective strategies for promoting healthy social functioning and mitigating social deficits through targeted family-based interventions.”

      The SPM (pain) network is associated with empathic abilities, also an important aspect of social skills. It would be relevant to explore whether (or explain why) SPM development and child-mother synchronization are (or are not) related to parenting and the parent-child relationship.

      We thank the reviewer for this thoughtful and important comment regarding the role of the Social Pain Matrix (SPM) network in social cognition and empathy. We agree that this network represents a critical component of social-cognitive development and is theoretically linked to affective processing and interpersonal understanding.

      We would like to clarify that in our existing analyses, which were already included in the original submission and detailed in the Supplemental Results, SPM network measures showed significant associations with behavioral outcomes similar to those of the ToM network. These outcomes included children's performance on ToM tasks as well as broader measures of social functioning. We have added more discussion in the Supplemental Results.

      “To further investigate the specificity of our findings, we conducted additional control analyses focusing on the individual components of the social brain networks examined in our study: the Theory of Mind (ToM) and Social Pain Matrix (SPM) networks.

      When analyzing these networks separately, we found significant correlations between neural maturity and age, as well as between inter-subject synchronization (ISS) and parent-child relationship quality for both the ToM and SPM networks individually (Fig. S1). Specifically, neural maturity within each network was positively correlated with age, indicating that both networks undergo maturation during childhood. Similarly, ISS within each network was negatively correlated with parent-child conflict scores, suggesting that both networks contribute to the observed relationship between neural synchrony and parent-child relationship quality.

      These results highlight the importance of considering the social brain as an integrated system, where the ToM and SPM networks work in concert to support social cognitive development. While each network shows age-related maturation and sensitivity to parent-child relationship quality, their combined functioning appears to be crucial for predicting broader social cognitive outcomes.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In their paper, Kang et al. investigate rigidity sensing in amoeboid cells, showing that, despite their lack of proper focal adhesions, amoeboid migration of single cells is impacted by substrate rigidity. In fact, many different amoeboid cell types can durotax, meaning that they preferentially move towards the stiffer side of a rigidity gradient.

      The authors observed that NMIIA is required for durotaxis and, building on this observation, they generated a model to explain how durotaxis could be achieved in the absence of strong adhesions. According to the model, substrate stiffness alters the diffusion rate of NMIIA, with softer substrates allowing for faster diffusion. This allows for NMIIA accumulation at the back, which, in turn, results in durotaxis.

      The experiments support the main message of the paper regarding durotaxis by amoeboid cells. In my opinion, a few clarifications on the mechanism proposed to explain this phenomenon could strengthen this research:

      (1) According to your model, the rear end of the cell, which is in contact with softer substrates, will have slower diffusion rates of NMIIA. Does this mean that bigger cells will durotax better than smaller cells because the stiffness difference between front and rear is higher? Is it conceivable to attenuate the slope of the durotactic gradient to a degree where smaller cells lose their ability to durotax, while longer cells retain their capacity for directional movement?

      We thank the reviewer for this comment. Bigger cells do not always durotax better than smaller cells. Although a bigger cell senses a larger stiffness difference between its front and rear, the response also depends on the region of the substrate on which the cell sits, because in our theoretical model the difference in diffusion coefficient is not proportional to the difference in stiffness. On a very stiff region, cells may therefore not durotax at all. On a region of suitable stiffness, where cells are sensitive to the stiffness gradient, bigger cells will durotax better than smaller cells. In that situation, as the reviewer suggests, lowering the stiffness gradient will make smaller cells adurotactic while longer cells still durotax.

      We tried to address this question further with our durotaxis assay, but there was a challenge: the amoeboid cells we use, including CD4+ Naïve T cells, neutrophils, dHL-60 cells and Dictyostelium, frequently protrude, retract and alter their contact area with the substrate, which makes it difficult to distinguish bigger from smaller cells within a particular cell type. Previously reported durotactic cell lines, such as MDA-MB-231 and HT1080 cells, are bigger than the amoeboid cells we use, but they are mesenchymal cells and adopt distinct mechanisms that always involve stable focal adhesions. Therefore, although we are eager to answer this question experimentally and the stiffness gradient is tunable in our system, we have not found an appropriate approach and experimental setup.

      (2) Where did you place the threshold for soft, middle, and stiff regions (Figure 6)? Is it possible that you only have a linear rigidity gradient in the center of your gel and the more you approach the borders, the flatter the gradient gets? In this case, cells would migrate randomly on uniform substrates. Did you perform AFM over the whole length of the gel or just in the central part?

      We thank the reviewer for this comment. We have performed AFM over the whole length of our gradient gel (Fig. S1A). We divide the gel into three equal parts (stiff: 1-4 mm; middle: 4-7 mm; soft: 7-10 mm) and the stiffness gradient is almost linear within each part as shown in Fig. S1A.

      (3) In which region (soft, middle, stiff) did you perform all the cell tracking of the previous figures?

      We thank the reviewer for this question. We performed the cell tracking in the soft region of the gradient gel.

      (4) What is the level of confinement experienced by the cells? Is it possible that cells on the soft side of the gels experience less confinement due to a "spring effect" whereby the coverslips descending onto the cells might exert diminished pressure because the soft hydrogels act as buffers, akin to springs? If this were the case, cells could migrate following a confinement gradient.

      We thank the reviewer for this comment. Although the possibility that our thin hydrogel layers act as buffers cannot be completely excluded, we have performed the durotaxis assay without the upper gradient gel that provides confinement (Author response image 1A). In this case, CD4+ Naïve T cells, neutrophils, dHL-60 cells and Dictyostelium can still durotax (Author response image 1B-E), indicating that the stiffness gradient itself is sufficient to direct amoeboid cell migration.

      Author response image 1.

      Illustration of the durotaxis system without confinement (A) and y-FMI of CD4+ Naïve T cells (B), neutrophils (C), dHL-60 cells (D) and Dictyostelium (E) cultured on uniform substrate or gradient substrate (n ≥ 30 tracks were analyzed for each experiment, N = 3 independent experiments for each condition, replicates are biological). All error bars are SEM. ****, P < 0.0001, by Student’s t-test.
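
      For readers unfamiliar with the metric, the y-FMI of a track can be sketched as below. This assumes the commonly used definition of the forward migration index (net displacement along the gradient axis divided by the accumulated path length) and that the stiffness gradient lies along +y. The function name and the example tracks are illustrative only.

```python
import math


def y_fmi(track):
    """Forward migration index along y for an ordered list of (x, y) positions."""
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")
    # Accumulated path length: sum of step lengths between consecutive positions.
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    if path_length == 0:
        return 0.0  # the cell did not move
    net_y_displacement = track[-1][1] - track[0][1]
    return net_y_displacement / path_length


straight_up = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
diagonal = [(0.0, 0.0), (3.0, 4.0)]
print(y_fmi(straight_up), y_fmi(diagonal))
```

      A value close to 1 indicates persistent movement up the gradient, a value close to 0 indicates no directional bias along y, and negative values indicate movement down the gradient.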

      Reviewer #2 (Public Review):

      Summary:

      The authors developed an imaging-based device that provides both spatial confinement and stiffness gradient to investigate if and how amoeboid cells, including T cells, neutrophils, and Dictyostelium, can durotax. Furthermore, the authors showed that the mechanism for the directional migration of T cells and neutrophils depends on non-muscle myosin IIA (NMIIA) polarized towards the soft-matrix-side. Finally, they developed a mathematical model of an active gel that captures the behavior of the cells described in vitro.

      Strengths:

      The topic is intriguing as durotaxis is essentially thought to be a direct consequence of mechanosensing at focal adhesions. To the best of my knowledge, this is the first report on amoeboid cells that do not depend on FAs to exert durotaxis. The authors developed an imaging-based durotaxis device that provides both spatial confinement and stiffness gradient and they also utilized several techniques such as quantitative fluorescent speckle microscopy and expansion microscopy. The results of this study have well-designed control experiments and are therefore convincing.

      Weaknesses:

      Overall this study is well performed but there are still some minor issues I recommend the authors address:

      (1) When using NMIIA/NMIIB knockdown cell lines to distinguish the role of NMIIA and NMIIB in amoeboid durotaxis, it would be better if the authors took compensatory effects into account.

      We thank the reviewer for this suggestion. We have investigated the compensation of myosin in NMIIA and NMIIB KD HL-60 cells using Western blot and added this result in our updated manuscript (Fig. S4B, C). The results showed that the level of NMIIB protein in NMIIA KD cells doubled while there was no compensatory upregulation of NMIIA in NMIIB KD cells. This is consistent with our conclusion that NMIIA rather than NMIIB is responsible for amoeboid durotaxis since in NMIIA KD cells, compensatory upregulation of NMIIB did not rescue the durotaxis-deficient phenotype.

      (2) The expansion microscopy assay is not clearly described and some details are missed such as how the assay is performed on cells under confinement.

      We thank the reviewer for this comment. We have updated the details of the expansion microscopy assay in our revised manuscript (lines 481-485), including how the assay is performed on cells under confinement:

      Briefly, CD4+ Naïve T cells were seeded on a gradient PA gel with another upper gel providing confinement. 4% PFA was used to fix cells for 15 min at room temperature. After fixation, the upper gradient PA gel was carefully removed and the bottom gradient PA gel with seeded cells was immersed in an anchoring solution containing 1% acrylamide and 0.7% formaldehyde (Sigma, F8775) for 5 h at 37 °C.

      (3) In this study, an active gel model was employed to capture experimental observations. Previously, some active nematic models were also considered to describe cell migration, which is controlled by filament contraction. I suggest the authors provide a short discussion on the comparison between the present theory and those prior models.

      We thank the reviewer for this suggestion. Active nematic models have been employed to recapitulate many phenomena during cell migration (Nat Commun., 2018, doi: 10.1038/s41467-018-05666-8). The active nematic model describes the motion of cells using the orientation field, Q, and the velocity field, u. The director field n with (n = −n) is employed to represent the nematic state, which has head-tail symmetry. However, in our experiments, actin filaments are clearly polarized: they polymerize and flow in the direction of cell migration. Therefore, we chose an active gel model, which describes the polarized actin field during cell migration. In the discussion, we have provided a comparison between the active gel model and the motor-clutch model. We have also added a short comparison between the present model and the active nematic model in the main text (lines 345-347):

      The active nematic model employs active extensile or contractile agents to push or pull the fluid along their elongation axis to simulate cells flowing (61).

      (4) In the present model, actin flow contributes to cell migration while myosin distribution determines cell polarity. How does this model couple actin and myosin together?

      We thank the reviewer for this question. In our model, the polarization field P(r,t) is employed to couple actin and myosin together. Actin accumulates at the front while myosin diffuses in the opposite direction. Therefore, we propose that actin and myosin flow in opposite directions, which is captured in the convection terms of the actin (∇[c(v+wP)]) and myosin (∇[m(-wP)]) density fields.

      Reviewing Editor (Recommendations For The Authors):

      We suggest that you cite the publication about confinement force microscopy from the Betz lab (https://doi.org/10.1101/2023.08.22.554088).

      We thank the editor for this suggestion. We have cited this publication in line 89 in our updated manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Minor points and text corrections:

      - In line 288 you state that NMIIA basal diffusion rate is larger on softer substrates, while in line 315 you say that NMIIA is more diffusive on stiff. The two sentences seem to contradict each other.

      We thank the reviewer for pointing out this mistake. In our active gel model, the basal diffusion rate of NMIIA is larger on stiffer substrate. We have corrected this mistake in line 288 (line 283 in the updated manuscript) in our revised manuscript.

      - How were the non-muscle myosin images (Figure 3F) collected?

      We thank the reviewer for this question. The non-muscle myosin images in Fig. 3F are single planes collected by epifluorescence-confocal microscopy. We have updated the related method in our revised manuscript in line 477-478:

      After the mounting medium had solidified, single plane images were captured using a 63×1.4 NA objective lens on Andor Dragonfly epi-fluorescence confocal imaging system.

      - Is there a quantification of NMIIA accumulation at the back?

      We thank the reviewer for this question. We have a quantification of NMIIA distribution in Fig. 3G. We measured the fluorescence intensity of NMIIA and NMIIB in the soft and stiff regions of cells and found that the soft/stiff fluorescence ratio of NMIIB is about 0.95 and the ratio of NMIIA is about 1.82, indicating that NMIIA tends to be localized at the back while NMIIB is evenly distributed between the soft and stiff regions of cells.

      - At which frequency were images acquired for Fluorescent Speckle Microscopy? Overall, I think it would help to state the length and frequency of videos in the legends.

      We thank the reviewer for this comment. We have updated the length (10 min for movies 6-10 and 80 sec for movie 11) and frequency (15 sec intervals for movies 6-10 and 2 sec intervals for movie 11) of Fluorescent Speckle Microscopy videos in our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The cell contour of Figure S5C is not very clear.

      We thank the reviewer for this comment. We have marked the outline of the cell in Fig. S5C in our updated manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits between genetic mutants with those obtained from a panel of small molecules to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in night-time sleep loss, consistent with the well supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen-1, an early-onset associated AD gene, which the authors find is principally responsible for the generation of AB40 and AB42 in zebrafish, also shows a slight increase in activity at night and slight decreases in night-time sleep. Conversely, psen-2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are night-time sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not based on their likelihood to modify sleep behaviour, which we could have tried by searching for overlaps with GWAS of sleep phenotypes, for example. Consequently, we find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
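
      The enrichment test described above could be sketched as follows. This is a minimal illustration, assuming a one-sided binomial test against a background rate estimated from the random gene set. The counts for the random set (10 of 50 genes) are invented for the example and are not data, and the test ignores the uncertainty in the estimated background rate.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical random set: 10 of 50 brain-expressed genes cause a
# night-time sleep phenotype when mutated (made-up numbers).
background_rate = 10 / 50

# Observed in this study: 4 of 4 late-onset risk genes caused the phenotype.
p_value = binomial_tail(k=4, n=4, p=background_rate)
print(f"one-sided p = {p_value:.4f}")
```

      With a small random set, a test that accounts for sampling error in both groups (for example Fisher's exact test on the 2×2 table) would be the more careful choice.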

      For those mutants that cause night-time sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–supplement 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts (Author response image 1):

      – One indication is shared by 4/7 knockouts: “opioid dependence” (significant for appa/appb, psen1, apoea/apoeb, cd2ap).

      – Four targets are shared by 4/7 knockouts: “strychnine-binding glycine receptor” (psen1, apoea/apoeb, clu, sorl1); “neuronal acetylcholine receptor beta-2” (psen1, apoea/apoeb, cd2ap, clu); thyroid peroxidase (psen1, apoea/apoeb, cd2ap, clu); carbonic anhydrase IV (appa/appb, psen1, psen2, cd2ap).

      – Three KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1); tyrosine metabolism (psen2, apoea/apoeb, cd2ap, clu, sorl1); and “nitrogen metabolism” (appa/appb, psen1, psen2, apoea/apoeb, cd2ap).
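
      The tally behind this list can be sketched as follows. The example uses only the three KEGG pathways named above, with the knockouts listed for each. The full sets of significant annotations per knockout are larger, and the function name and data layout are illustrative rather than taken from the ZOLTAR code.

```python
from collections import Counter

# Significant KEGG pathways per knockout, restricted to the three
# pathways named in the list above.
significant_pathways = {
    "appa/appb": {"nitrogen metabolism"},
    "psen1": {"cholinergic synapse", "nitrogen metabolism"},
    "psen2": {"tyrosine metabolism", "nitrogen metabolism"},
    "apoea/apoeb": {"cholinergic synapse", "tyrosine metabolism", "nitrogen metabolism"},
    "cd2ap": {"cholinergic synapse", "tyrosine metabolism", "nitrogen metabolism"},
    "clu": {"cholinergic synapse", "tyrosine metabolism"},
    "sorl1": {"cholinergic synapse", "tyrosine metabolism"},
}


def shared_annotations(per_knockout, min_knockouts=3):
    """Rank annotations by the number of knockouts for which they are significant."""
    counts = Counter(a for annotations in per_knockout.values() for a in annotations)
    kept = [(a, n) for a, n in counts.items() if n >= min_knockouts]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


for annotation, n in shared_annotations(significant_pathways):
    print(f"{annotation}: {n}/7 knockouts")
```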

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling based on the following annotations being significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. Indication “depression” is only significant for sorl1 knockouts; target “serotonin transporter” is also significant for appa/appb and psen2 knockouts; and KEGG pathway “serotonergic synapse” is also significant for psen2 knockouts. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on AD. For example, the first four drugs ever approved by the FDA to treat AD were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in AD.

      Author response image 1.

      Common predictions from ZOLTAR for the seven Alzheimer’s risk genes tested. Predictions from ZOLTAR which are shared by multiple knockout behavioural fingerprints presented in the study. Only indications, targets, and KEGG pathways which are significant for at least three of the seven knockouts tested are shown, ranked from the annotations which are significant for the largest number of knockouts.

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not measure other stimuli than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seems particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was the occasion to enable a wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR may seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      While working on the Author Response, we made some changes to the analysis run by ZOLTAR to calculate enrichments (see Methods and github.com/francoiskroll/ZOLTAR, notes on v2). With the new version, 5-HT receptor type 2 is not a significantly enriched target for the sorl1 knockout fingerprint but type 4 is. 5-HT receptor type 4 was also shown to interact with sorting nexin 27, a subunit of retromer, so is a promising candidate (Joubert et al., 2004). Antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested. In our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Note, all the results presented in the “Version of Record” are from ZOLTAR v2.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that night-time sleep is disrupted in mutants for multiple late onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 was more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, every compound which normalises the behavioural fingerprint will not normalise the underlying mechanism, but the opposite seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The logic is to discover compounds that normalise the behavioural phenotype first, only subsequently test whether they also normalise the molecular mechanism, akin to testing first whether a drug resolves the symptoms before testing whether it actually modifies disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting in silico compounds that have the opposite effect on behaviour.
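
      The in silico shortlisting step can be sketched as follows. This is a minimal illustration, assuming that each behavioural fingerprint is a numeric vector and that fingerprints are compared by cosine similarity. The fingerprint values and compound names are invented, and the sketch is not the analysis ZOLTAR actually runs (see Methods for that).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two fingerprint vectors of equal length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("fingerprints must not be all zeros")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def shortlist_opposite(knockout, compounds, top_n=2):
    """Return the compounds whose fingerprints are most anti-correlated with the knockout."""
    ranked = sorted(compounds.items(), key=lambda kv: cosine_similarity(knockout, kv[1]))
    return [name for name, _ in ranked[:top_n]]


# Hypothetical six-parameter fingerprints (deviation from controls).
knockout = [-1.0, -0.5, 0.2, 0.0, -0.8, 0.1]
compounds = {
    "compound_A": [1.0, 0.5, -0.2, 0.0, 0.8, -0.1],   # mirrors the knockout
    "compound_B": [-1.0, -0.5, 0.2, 0.0, -0.8, 0.1],  # mimics the knockout
    "compound_C": [0.1, -0.2, 0.9, 0.3, 0.0, 0.5],    # largely unrelated
}
print(shortlist_opposite(knockout, compounds, top_n=1))
```

      Compounds at the top of such a ranking are candidates for a behavioural rescue experiment. Whether they also normalise the underlying mechanism is a separate question, as discussed above.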

      You mention a “reduction in movement bouts” but note here that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day, then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, say we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it seems almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig. 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter into day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” explained above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, assuming the secondary effects are tolerable, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that first testing ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature (Ashlin et al., 2018; Hoffman et al., 2016). The example from Hoffman et al. (2016) is especially convincing. Drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint were enriched with NMDA and GABA receptor antagonists. In experiments analogous to our citalopram and fluvoxamine treatments (Fig. 5c,d and Fig. 5–supplement 1c,d), cntnap2a/cntnap2b knockout larvae were overly sensitive to the NMDA receptor antagonist MK-801 and the GABA-A receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABA-A receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae were found to have fewer GABAergic neurons in the forebrain. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset to make these predictions (Rihel et al., 2010), which shows that approaches like ZOLTAR can point to causal processes.

      On your last point, we hope our experiment testing fluvoxamine, another selective serotonin reuptake inhibitor (SSRI), makes the connection between Sorl1 and serotonin signalling more convincing.

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 KO larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose, regardless of being driven by a subset of the animals. The result indeed does not survive removing the top 3 (p = 0.18) or top 5 (p = 0.87) sorl1 KO + 10 µM larvae, but this amounts to excluding roughly 20% (3/14) or 35% (5/14) of the datapoints as potential outliers, which is unreasonable. In fact, excluding the top 5 sorl1 KO + 10 µM is equivalent to calling any datapoint with z-score > 0.2 an outlier (z-scores of the top 5 datapoints are 0.2–1.8). Applying the same criterion consistently to the scrambled + 10 µM group would remove the top 6 datapoints (z-scores = 0.5–3.9). Comparing the resulting two distributions again gives the sorl1 KO + 10 µM distribution as significantly higher (p = 0.0015). We would also mention that Euclidean distance, as a summary metric for distance between behavioural fingerprints, has limitations. For example, the measure will be more sensitive to changes in some parameters but not others, depending on how much room there is for a given parameter to change. We included this metric to lend support to the observation one can draw from the fingerprint plot (Fig. 5c) that sorl1 mutants respond in an exaggerated way to citalopram across many parameters, while being agnostic to which parameter might matter most.
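
      The outlier argument above can be sketched in code. This is a minimal illustration, assuming that z-scores are computed within each group using the sample standard deviation and that the same threshold is applied to both groups. The distance values are invented for the example and are not our data.

```python
import math
import statistics


def euclidean_distance(fingerprint, reference):
    """Euclidean distance between two behavioural fingerprints."""
    return math.dist(fingerprint, reference)


def z_scores(values):
    """Z-score of each value relative to its own group."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


def drop_above_z(values, threshold):
    """Remove datapoints whose within-group z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if z <= threshold]


# Hypothetical Euclidean distances for one group of larvae.
distances = [1.0, 2.0, 3.0, 4.0, 10.0]
print(drop_above_z(distances, threshold=1.0))
```

      The point of applying one threshold to both groups is that any rule strict enough to remove the top sorl1 KO datapoints also removes datapoints from the scrambled controls, so the comparison must be repeated on both trimmed distributions.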

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relied on this result being robust. As you and Reviewer #3 suggested, we repeated this experiment with a different SSRI, fluvoxamine (Fig. 5–supplement 1). We cannot readily explain why the result was opposite to what we found with citalopram, but in both cases sorl1 knockout larvae reacted differently than their control siblings, which adds an argument to our claim that ZOLTAR correctly predicted serotonin signalling as a disrupted pathway from the behavioural fingerprint. Accordingly, we mostly kept the Discussion on Sorl1 the same, although we concede that we may not have identified the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we adapted the wording to make it clear this is one possible alternative hypothesis which could be tested in the future. The small differences found by HCR are actually more in line with the new results from the fluvoxamine experiment, so it may also be that both hypotheses (pre-synaptic neurons releasing less serotonin when reuptake is blocked; or post-synaptic neurons being less sensitive) contribute. The fluvoxamine experiment was performed in a different lab (ICM, Paris; all other experiments were done in UCL, London) in a different wild-type strain (TL in ICM, AB x Tup LF in UCL), which complicates how one interprets this discrepancy.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, it is an interesting question which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method does not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-cross of heterozygous knockout animals, as they increase the sensitivity of the assay without causing substantial variability. We plan to report in more detail on this analysis in a separate paper as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”, we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), or mouse models overexpressing human APP show definitively that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick to dismiss APP knockout as potentially relevant to the understanding of AD.

      Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too much into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, out of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs in both AD and in our appa/appb knockout larvae.

      We added a sentence in Results (section psen2 knockouts […]) to briefly justify our appa/appb knockout approach. To be clear, we do not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed. A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not tell something intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–supplement 1a); but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here, more parameters would almost certainly translate into better predictions too, regardless of their intuitiveness.

      It is true however that we did not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We added a few sentences in the legend of Fig. 2–supplement 1.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated the availability of importing a psen2 knockout line from abroad, but the process of shipping live animals is becoming more and more cost and time prohibitive. However, we observed the same pigmentation phenotype for psen2 knockouts as reported by Jiang et al., 2018, which is at least a partial confirmation of phenocopying a loss of function stable mutant.  

      b. psen2_KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018, Fig. 3f and Fig. 3–supplement 3a). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Evidently, most larvae must have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing it is a reliable marker of Psen2 function even when it is mutated. This mechanism is not a concern here as the alternative start codons are still upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exon 2 and 3, but we targeted exon 3, exon 4, and exon 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer #2, we are planning to assemble a paper that expressly compares behavioural phenotypes measured in F0 vs. stable mutants to allay some of these concerns. Our current manuscript, which combines CRISPR-Cas9 rapid F0 screening with in silico pharmacological predictions, inevitably represents a first step in characterizing the functions of these genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is substantial clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments. Even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e. from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g. pooling multiple clutches from heterozygous in-crosses or pooling wild-type clutches before injecting them). The first experimental design causes false positive findings (Joo et al., 2021), as the clutch-to-clutch variability we and others observe gets interpreted as a behavioural phenotype. The second experimental design should not cause false positives but likely decreases the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging as it requires obtaining large clutches from unique pairs of parents. However, this approach is better as it clearly separates the different sources of variability (clutch-to-clutch or animal-to-animal). As for every experiment, yes, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work heavily focuses on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four experiments (five to six clutches) as sorl1 knockout larvae were also tracked in the citalopram and fluvoxamine experiments (Fig. 5 and Fig. 5–supplement 1), and psen2 knockout larvae were also tracked in the small molecule rescue experiment (Fig. 6 and Fig. 6–supplement 1).

      The psen2 behavioural phenotype replicated well across the six clutches tested (pairwise cosine similarities: 0.62 ± 0.15; Author response image 2a). 5/6 clutches were less active and initiating more sleep bouts during the day, as we claimed in Fig. 3.

      In the citalopram experiment, the H₂O-treated sorl1 knockout fingerprint replicated the baseline recordings in Fig. 4 fairly well, despite the smaller sample size (cos = 0.30 and 0.78; Author response image 2b, see “KO Fig. 5”). 5/6 of the significant parameters presented in Fig. 4–supplement 4 moved in the same direction, and knockout larvae were also hypoactive during the day but hyperactive at night. Note that two clutches were tracked on the same 96-well plate in this experiment. We calculated each larva’s z-score using the average of its control siblings, then we averaged all the z-scores to generate the fingerprint. The H₂O-treated sorl1 knockout clutch from the fluvoxamine experiment did not replicate the baseline recordings well (cos = 0.08 and 0.11; Author response image 2b, see “KO Fig. 5–suppl. 1”). Knockout larvae were hypoactive during the day as expected, but behaviour at night was not as robustly affected. As mentioned above, knockouts were made in a different genetic background (TL, instead of AB x Tup LF used for all other experiments), which could explain the discrepancy.
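      As an illustration of the fingerprint and cosine calculations described above, here is a minimal Python sketch. It is not the FramebyFrame code. The function names and the data layout (one dictionary of parameter values per larva) are hypothetical, and dividing by the standard deviation of the control siblings is an assumption, since the text only states that z-scores are computed from the average of the control siblings.

```python
import math
from statistics import mean, stdev


def zscore_fingerprint(knockouts, controls):
    """Mean z-score per parameter for knockout larvae of ONE clutch.

    knockouts, controls: lists of dicts {parameter_name: value}, one per larva.
    Each knockout larva is z-scored against its control siblings
    (assumption: control mean and control standard deviation).
    """
    fingerprint = {}
    for param in controls[0]:
        ctrl_values = [larva[param] for larva in controls]
        mu, sd = mean(ctrl_values), stdev(ctrl_values)
        z_scores = [(larva[param] - mu) / sd for larva in knockouts]
        fingerprint[param] = mean(z_scores)
    return fingerprint


def cosine_similarity(fp_a, fp_b):
    """Cosine of the angle between two fingerprints (range -1 to 1)."""
    params = sorted(set(fp_a) & set(fp_b))  # compare shared parameters only
    dot = sum(fp_a[p] * fp_b[p] for p in params)
    norm_a = math.sqrt(sum(fp_a[p] ** 2 for p in params))
    norm_b = math.sqrt(sum(fp_b[p] ** 2 for p in params))
    return dot / (norm_a * norm_b)
```

      Because the cosine depends only on the direction of the fingerprint vector, two clutches whose parameters shift in the same directions score close to 1 even if the effect sizes differ.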

      We also took the opportunity to check whether our SSRI treatments replicated well the data from Rihel et al., 2010. For both citalopram (n = 3 fingerprints in the database) and fluvoxamine (n = 4 fingerprints in the database), replication was excellent (cos ≥ 0.67 for all comparisons of a fingerprint from this study vs. a fingerprint from Rihel et al. 2010; Author response image 2c,d). Note that the scrambled + 10 µM citalopram and + 10 µM fluvoxamine fingerprints correlate extremely well (cos = 0.92; can be seen in Author response image 2c,d), which was predicted by the small molecule screen dataset.

      Author response image 2.

      Replication of psen2 and sorl1 F0 knockout fingerprints and SSRI treatments from Rihel et al., 2010. a, (left) Every psen2 F0 knockout behavioural fingerprint generated in this study. Each dot represents the mean deviation from the same-clutch scrambled-injected mean for that parameter (z-score, mean ± SEM). From the experiments in Fig. 6, presented are the psen2 F0 knockout + H₂O fingerprints. The fingerprints in grey (“not shown”) are from a preliminary drug treatment experiment we did not include in the final study. These fingerprints are from psen2 F0 knockout larvae treated with 0.2% DMSO, normalised to scrambled-injected siblings also treated with 0.2% DMSO. (right) Pairwise cosine similarities (−1.0–1.0) for the fingerprints presented. b, Every sorl1 F0 knockout behavioural fingerprint, as in a). c, The scrambled-injected + citalopram (10 µM) fingerprints (grey) in comparison to the citalopram (10–15 µM) fingerprints from the Rihel et al., 2010 database (green). d, The scrambled-injected + fluvoxamine (10 µM) fingerprint (grey) in comparison to the fluvoxamine fingerprints from the Rihel et al., 2010 database (pink). In c) and d), the scrambled-injected fingerprints are from the experiments in Fig. 5 and Fig. 5–suppl. 1, but were converted here into the behavioural parameters used by Rihel et al., 2010 for comparison. Parameters: 1, average activity (sec active/min); 2, average waking activity (sec active/min, excluding inactive minutes); 3, total sleep (hr); 4, number of sleep bouts; 5, sleep bout length (min); 6, sleep latency (min until first sleep bout).

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early-life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim (“early-life dysfunction is related to future AD”) from expression alone. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,” which does not mention future risk of AD. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seemed reasonable. Nevertheless, we adapted the wording to address this point and Reviewer #2’s complaint below. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument primarily relies on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day old zebrafish larvae and from previous work showing brain structural differences in children at high genetic risk of AD (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      Please also see our answer to a similar point raised by Reviewer #2 below (cf. Author response image 7).

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2 as we claim that day-time sleep is increased while zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (the ZOLTAR predictions or the small molecule rescue do not change whether the parameter is called sleep or inactivity), but our choice of labelling can fairly be challenged.

      To address “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we looked at the larvae’s behaviour immediately after lights abruptly switched on in the mornings. Almost every larva, regardless of genotype, responded strongly to every lights-off transition during the experiment. Instead, we chose the lights-on transition for this analysis because it is a weaker startling stimulus for the larvae than the lights-off transition (Fig. 3–supplement 3), potentially exposing differences between genotypes or behavioural states (quiescent or awake). We defined a larva as having reacted to the lights switching on if it made a swimming bout during the second (25 frames) after the lights-on transition. Across two clutches and two lights-on transitions, an average of 65% (range 52–73%) of all larvae reacted to the stimulus. psen2 knockout larvae were similarly likely, if not more likely, to respond (on average 69% responded, range 60–76%) than controls (60% average, range 44–75%). When the lights switched on, about half of the larvae (39–51%) would have been classified as asleep according to the one-minute inactivity definition (i.e. the larva did not move in the minute preceding the lights transition). This allowed us to also compare behavioural states, as suggested by the reviewer. For three of the four light transitions, larvae which were awake when lights switched on were more likely to react than asleep larvae, but this difference was not striking (overall, awake larvae were only 1.1× more likely to react; Author response image 3). Awake psen2 knockout larvae were 1.1× (range 1.04–1.11×) more likely to react than awake control larvae, so, yes, psen2 knockout larvae respond normally when awake. Asleep psen2 knockout larvae were 1.4× (range 0.63–2.19×) more likely to react than asleep control larvae, so psen2 knockouts are also at least as likely to react as control larvae when asleep. In summary, the overall health of psen2 knockouts did not seem to be a significant confound in the experiment. As the reviewer suggested, if psen2 knockout larvae were seriously unhealthy, they would not be as responsive as control larvae to a startling stimulus.
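      The classification used in this analysis can be sketched as follows. This is an illustrative Python example, not the code used for the analysis. It assumes each larva's recording is a list with one activity value per frame at 25 frames per second, and it simplifies "made a swimming bout" to "any frame with activity above zero"; the function names are hypothetical.

```python
FPS = 25  # frames per second, as stated in the text (one second = 25 frames)


def classify_larva(trace, lights_on_frame, fps=FPS):
    """Return (state, reacted) for one larva at one lights-on transition.

    trace: per-frame activity values (assumption: > 0 means the larva moved).
    state: "asleep" if no movement in the minute before lights-on, else "awake".
    reacted: True if the larva moved during the second after lights-on.
    """
    # If the recording starts less than a minute before lights-on,
    # the window is simply shorter than one minute.
    start = max(0, lights_on_frame - 60 * fps)
    before = trace[start:lights_on_frame]
    after = trace[lights_on_frame:lights_on_frame + fps]
    state = "awake" if any(v > 0 for v in before) else "asleep"
    reacted = any(v > 0 for v in after)
    return state, reacted


def reaction_rates(traces, lights_on_frame, fps=FPS):
    """Fraction of awake and of asleep larvae that reacted (None if no larvae)."""
    counts = {"awake": [0, 0], "asleep": [0, 0]}  # [reacted, total]
    for trace in traces:
        state, reacted = classify_larva(trace, lights_on_frame, fps)
        counts[state][0] += int(reacted)
        counts[state][1] += 1
    return {s: (r / n if n else None) for s, (r, n) in counts.items()}
```

      Dividing the rate obtained for knockout larvae by the rate obtained for control larvae in the same state gives fold changes of the kind quoted above.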

      Author response image 3.

      psen2 F0 knockouts react normally to lights switching on, indicating they are largely healthy. At each lights-on transition (9 AM), each larva was categorised as awake if it had moved in the preceding one minute or asleep if it had been inactive for at least one minute. Darker tiles represent larvae which performed a swimming bout during the second following lights-on; lighter tiles represent larvae which did not move during that second. The total count of each waffle plot was normalised to 25 so plots can be compared to each other. The real count is indicated in the corner of each plot. Data is from the baseline psen2 knockout trackings presented in Fig. 3 and Fig. 3–suppl. 2.

      Next, we compared inactive period durations during the day between psen2 and control larvae. If psen2 knockout larvae indeed sleep more during the day compared to controls, we may predict inactive periods longer than one minute to increase disproportionately compared to the increase in shorter inactive periods. This broadly appeared to be the case, especially for one of the two clutches (Author response image 4). In clutch 1, inactive periods lasting 1–60 sec were equally frequent in both psen2 and control larvae (fold change 1.0× during both days), while inactive periods lasting 1–2 min were 1.5× (day 1) and 2.5× (day 2) more frequent in psen2 larvae compared to control larvae. In clutch 2, 1–60 sec inactive periods were also equally frequent in both psen2 and control larvae, while inactive periods lasting 1–2 min were 3.4× (day 1) and 1.5× (day 2) more frequent in psen2 larvae compared to control larvae. Therefore, psen2 knockouts disproportionately increased the frequency of inactive periods longer than one minute, suggesting they genuinely slept more during the day.

      Author response image 4.

      psen2 F0 knockouts preferentially increased the frequency of longer inactive bouts. For each day and clutch, we calculated the mean distribution of inactive bout lengths across larvae of the same genotype (psen2 F0 knockout or scrambled-injected), then compared the frequency of inactive bouts of different lengths between the two genotypes. For example, in clutch 1 during day 2, 0.01% of the average scrambled-injected larva’s inactive bouts lasted 111–120 seconds (X axis 120 sec) while 0.05% of the average psen2 F0 knockout larva’s inactive bouts lasted this long, so the fold change was 5×. Inactive bouts lasting < 1 sec were excluded from the analysis. In clutch 2, day 1 plot, two datapoints fall outside the Y axis limit: 140 sec, Y = 32×; 170 sec, Y = 16×. Data is from the baseline psen2 knockout trackings presented in Fig. 3 and Fig. 3–suppl. 2.
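      The fold-change calculation described in this legend can be sketched in Python as below. This is an illustration under stated assumptions rather than the original analysis code: inactive bout lengths are given in seconds, bins are 10 seconds wide and labelled by their upper edge (so 111–120 sec falls in bin 120, as in the example above), and all function names are hypothetical.

```python
import math
from collections import Counter


def bout_length_distribution(bouts_sec, bin_width=10):
    """Fraction of one larva's inactive bouts falling in each length bin.

    Bouts shorter than 1 sec are excluded, as in the legend.
    Bins are labelled by their upper edge in seconds (assumed 10-sec bins).
    """
    kept = [b for b in bouts_sec if b >= 1]
    if not kept:
        return {}
    counts = Counter(math.ceil(b / bin_width) * bin_width for b in kept)
    return {edge: n / len(kept) for edge, n in counts.items()}


def mean_distribution(larvae):
    """Average the per-larva distributions across larvae of one genotype."""
    dists = [bout_length_distribution(bouts) for bouts in larvae]
    edges = set().union(*dists)
    return {e: sum(d.get(e, 0.0) for d in dists) / len(dists)
            for e in sorted(edges)}


def fold_change(knockout_larvae, control_larvae):
    """Knockout frequency divided by control frequency, per bin.

    Bins never observed in controls are skipped to avoid dividing by zero.
    """
    ko = mean_distribution(knockout_larvae)
    ctrl = mean_distribution(control_larvae)
    return {e: ko.get(e, 0.0) / f for e, f in ctrl.items() if f > 0}
```

      A fold change near 1× in the short bins together with values well above 1× in the bins beyond 60 sec corresponds to the pattern reported for psen2 knockouts.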

      Ultimately, this criticism seems challenging to definitively address experimentally. A possible approach could be to use a closed-loop system which, after one minute of inactivity, triggers a stimulus that is sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Nevertheless, we believe the two analyses presented here are consistent with psen2 knockout larvae genuinely sleeping more during the day, so we decided to keep this label. We agree with the reviewer that the one-minute inactivity definition has limitations, especially for day-time inactivity.

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just one drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment, “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think no, assuming we understand correctly the question. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug(s) than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–supplement 4). For the sake of the argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts and that serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, then treatment should cause the same increase in sleep bout length in both knockouts and controls as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of z-scores to sorl1 knockouts or controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.

      Furthermore, we agree with the reviewer and Reviewer #2 that the conclusions were overly confident. As suggested, we decided to repeat this experiment with another SSRI, fluvoxamine. Please find the results of this experiment in Fig. 5–supplement 1. The suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well, however we do not plan to pursue them at this stage.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Comments:

      - Data are presented in a variety of different ways, occasionally making comparisons across figures difficult. Perhaps at a minimum, behavioral fingerprints as in Figure 3 - Supplementary Figure 1 should be presented for all mutants in the main figures.

      We like this suggestion! Thank you. We brought the behavioural fingerprints figure (previously Fig. 4–supplement 5) as main Fig. 4, and put the figure focused on the sorl1 knockout behavioural phenotype in supplementary, with the other gene-by-gene figures.

      - It is not clear why some data were selected for supplemental rather than main figures. In many cases, detailed phenotypic data is provided for one example mutant in the main figures, and then additional mutants are described in detail in the supplement. Again, to facilitate comparisons between mutants, fingerprints could be provided for all mutants in a main figure, with detailed analyses moved to the supplements.

      The logic was to dedicate one main figure to psen2 (Fig. 3) as an example of an early-onset Alzheimer’s risk gene, and one to sorl1 (previously Fig. 4) as an example of a late-onset Alzheimer’s risk gene. We focused on them in main figures as they are both tested again later (Fig. 5 and Fig. 6). Having said that, we agree that the fingerprints may be a better use of main figure space than the parameters plots. In addition to the above (fingerprints of late-onset Alzheimer’s risk genes in main figure), we rearranged the figures in the early-onset AD section to have the psen2 F0 knockout fingerprint in main.

      - The explication of the utility of behavioral fingerprinting on page 35 is somewhat confusing. The authors describe drugs used to treat depression as enriched among small molecules anti-correlating with the sorl1 fingerprint. However, in Figure 5 - Supplementary Figure 1, drugs used to treat depression are biased toward positive cosines, which are indicated as having a more similar fingerprint to sorl1. These drugs should be described as more present among compounds positively correlating with the sorl1 fingerprint.

      Sorry, the confusion is about “(anti-)correlating”. Precisely, we meant “correlating and/or anti-correlating”, not just anti-correlating. We changed to that wording. In short, the analysis is by design agnostic to whether compounds with a given annotation are found more on the positive cosines side (left side in Fig. 5–supplement 1a) or the negative cosines side (right side). This is because the dataset often includes both agonists and antagonists to a given pathway but these are difficult to annotate. For example, say 10 compounds in the dataset target the dopamine D4 receptor, but these are an unknown mix of agonists and antagonists. In this case, we want ZOLTAR to generate a low p-value when all 10 compounds are found at extreme ends of the list, regardless of which end(s) that is (e.g. top 8 and bottom 2 should give an extremely low p-value). Initially, we were splitting the list, for each annotation, into positive-cosine fingerprints and negative-cosine fingerprints and testing enrichment on both separately, but we think the current approach is better as it better reflects the cases we want to detect and considers all available examples for a given annotation in one test. In sum, yes, in this case drugs used to treat depression were mostly on the positive-cosine side, but the other drugs on the negative-cosine side also contributed to the p-value, so “correlating and/or anti-correlating” describes the analysis more accurately. You can read more about our logic for the analysis in Methods (section Behavioural pharmacology from sorl1 F0 knockout’s fingerprint).
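      To make the idea of a side-agnostic enrichment test concrete, here is one way it could be sketched in Python. This is NOT the ZOLTAR implementation, whose exact statistic is described in Methods. The sketch assumes, purely for illustration, that the test statistic is the sum of absolute cosines of the annotated compounds, compared against random draws of the same number of compounds; the function name and parameters are hypothetical.

```python
import random


def two_sided_enrichment_p(cosines, annotated_idx, n_perm=10000, seed=0):
    """Permutation p-value for annotated compounds sitting at EITHER extreme.

    cosines: cosine similarity of every compound fingerprint to the query.
    annotated_idx: indices of compounds carrying the annotation of interest.
    Statistic (assumption): sum of |cosine| over the annotated compounds,
    so agonists and antagonists at opposite ends both add to the signal.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    abs_cos = [abs(c) for c in cosines]
    k = len(annotated_idx)
    observed = sum(abs_cos[i] for i in annotated_idx)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(range(len(abs_cos)), k)
        if sum(abs_cos[i] for i in draw) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)
```

      With this statistic, an annotation whose compounds are split between the top and the bottom of the ranked list gives a low p-value, whereas the same compounds clustered around a cosine of zero do not.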

      - The authors conclude the above-described section by stating: "sorl1 knockout larvae behaved similarly to larvae treated with small molecules targeting serotonin signaling, suggesting that the loss of Sorl1 disrupted serotonin signaling." Directionality here may be important. Are all of the drugs targeting the serotonin transporter SSRIs or similar? If so, then a correct statement would be that loss of Sorl1 causes similar phenotypes to drugs enhancing serotonin signaling. Finally, based on the correlation between serotonin transporter inhibitor trazodone and the sorl1 crispant phenotype, it is potentially surprising that the SSRI citalopram caused the opposite phenotype from sorl1, that is, increased sleep during the day and night. It is potentially interesting that this result was enhanced in mutants, and suggests dysfunction of serotonin signaling, but the statement that "our behavioral pharmacology approach correctly predicted from behaviour alone that serotonin signaling was disrupted" is too strong a conclusion.

      We understand “disrupt” as potentially going either way, but this may not be the common usage. We changed to “altered”.

      The point regarding directionality is excellent, however. We tested the proportion of serotonin transporter agonists and antagonists (SSRIs) on each side of the ranked list of small molecule fingerprints. We used the STITCH database for this analysis as it has more drug–target interactions than the Therapeutic Target Database, although these are likely less curated (Szklarczyk et al., 2016). As with the Therapeutic Target Database, most fingerprints of compounds interacting with the serotonin transporter SLC6A4 were found on the side of positive cosines (p ~ 0.005 using the custom permutation test), which replicates Fig. 5a with a different source for the drug–target annotations (Author response image 5). On the side of positive cosines (small molecules which generate behavioural fingerprints correlating with the sorl1 fingerprint), there were 2 agonists and 26 antagonists. On the side of negative cosines (small molecules which generate behavioural fingerprints anti-correlating with the sorl1 fingerprint), there were 3 agonists and 2 antagonists. Using a Chi-squared test, this suggests a significant (p = 0.002) over-representation of antagonists (SSRIs) on the positive side (expected count = 24, vs. 26 observed) and agonists on the negative side (expected count = 1, vs. 3 observed). If SLC6A4 antagonists, i.e. SSRIs, indeed tend to cause a behavioural phenotype similar to that of sorl1 knockout, this would point in the direction of our original interpretation of the citalopram experiment, which was that excessive serotonin signalling is what causes the sorl1 behavioural phenotype.
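      The expected counts and p-value quoted above can be checked from the 2 × 2 table of counts. The following is a minimal sketch in Python, assuming a Pearson Chi-squared test without continuity correction (the response does not state which variant was used, so this is an assumption); the function name `chi_squared_2x2` is hypothetical and not part of any package.

```python
import math

def chi_squared_2x2(table):
    """Pearson Chi-squared test (no continuity correction) on a 2 x 2 table.

    Returns the expected counts, the test statistic, and the p-value (df = 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total x column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # For one degree of freedom, the upper tail of the Chi-squared
    # distribution equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return expected, statistic, p_value

# Rows: positive-cosine side, negative-cosine side.
# Columns: agonists, antagonists (counts quoted in the response).
observed = [[2, 26], [3, 2]]
expected, statistic, p_value = chi_squared_2x2(observed)
print(expected, statistic, p_value)
```

      Under this assumption, the expected number of antagonists on the positive side rounds to 24 and the expected number of agonists on the negative side rounds to 1, in line with the values quoted above.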

      Author response image 5.

      Using the STITCH database as source of annotations also predicts SLC6A4 as an enriched target for the sorl1 behavioural fingerprint. Same figures as Fig. 5a,b but using the STITCH database (Szklarczyk et al., 2016) as source for the drug targets. a, Compounds annotated by STITCH as interacting with the serotonin transporter SLC6A4 tend to generate behavioural phenotypes similar to the sorl1 F0 knockout fingerprint. 40,522 compound–target protein pairs (vertical bars; 1,592 unique compounds) are ranked from the fingerprint with the most positive cosine to the fingerprint with the most negative cosine in comparison with the mean sorl1 F0 knockout fingerprint. Fingerprints of drugs that interact with SLC6A4 are coloured in yellow. Simulated p-value = 0.005 for enrichment of drugs interacting with SLC6A4 at the top (positive cosine) and/or bottom (negative cosine) of the ranked list by a custom permutation test. b, Result of the permutation test for top and/or bottom enrichment of drugs interacting with SLC6A4 in the ranked list. The absolute cosines of the fingerprints of drugs interacting with SLC6A4 (n = 52, one fingerprint per compound) were summed, giving sum of cosines = 15.9. To simulate a null distribution, 52 fingerprints were randomly drawn 100,000 times, generating a distribution of 100,000 random sum of cosines. Here, only 499 random draws gave a larger sum of cosines, so the simulated p-value was p = 499/100,000 = 0.005 **.
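      The permutation test described in this legend can be sketched as follows. This is an illustrative Python outline, not the ZOLTAR implementation: the function name `two_sided_enrichment`, the toy cosines, and the reduced number of draws are all assumptions made for the example. It sums the absolute cosines of the annotated fingerprints, then counts how often a random draw of the same size gives a larger sum.

```python
import random

def two_sided_enrichment(cosines, annotated_indices, n_draws=100_000, seed=0):
    """Simulated p-value for enrichment at the top and/or bottom of a ranked list.

    cosines: one cosine per fingerprint (vs. the knockout fingerprint).
    annotated_indices: positions of the fingerprints carrying the annotation.
    """
    rng = random.Random(seed)
    abs_cosines = [abs(c) for c in cosines]
    n_annotated = len(annotated_indices)
    observed = sum(abs_cosines[i] for i in annotated_indices)
    n_larger = 0
    for _ in range(n_draws):
        # Null draw: same number of fingerprints, picked without replacement.
        if sum(rng.sample(abs_cosines, n_annotated)) > observed:
            n_larger += 1
    return observed, n_larger / n_draws

# Toy ranked list of 201 cosines from -1 to 1 (hypothetical data).
toy_cosines = [i / 100 - 1 for i in range(201)]
# Annotation found at both extreme ends of the list: should be enriched.
_, p_extremes = two_sided_enrichment(toy_cosines, [0, 1, 2, 198, 199, 200], n_draws=2000)
# Annotation found in the middle of the list: should not be enriched.
_, p_middle = two_sided_enrichment(toy_cosines, [98, 99, 100, 101, 102, 103], n_draws=2000)
print(p_extremes, p_middle)
```

      Because absolute cosines are summed, compounds at either end of the list contribute equally, which is why the test does not need to know whether a compound is an agonist or an antagonist.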

      If this were true, we would expect, as the reviewer suggested, SSRI treatment (citalopram or fluvoxamine) on control larvae to give a behavioural phenotype similar to that of sorl1 knockout. However, this generally did not appear to be the case (sorl1 knockout fingerprint vs. SSRI-treated control fingerprint, cosine = 0.08 ± 0.35; Author response image 6).

      Author response image 6.

      sorl1 F0 knockouts in comparison to controls treated with SSRIs. a, sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H<sub>2</sub>O fingerprint from the citalopram experiment) in comparison with the scrambled-injected + citalopram (1 or 10 µM) fingerprints. Each dot represents the mean deviation from the same-clutch scrambled-injected H<sub>2</sub>O-treated mean for that parameter (z-score, mean ± SEM). b, As in a), sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H<sub>2</sub>O fingerprint from the fluvoxamine experiment) in comparison with the scrambled-injected + fluvoxamine (10 µM) fingerprint.

      The comparison with trazodone is an interesting observation, but it is only a weak serotonin reuptake inhibitor (Ki for SLC6A4 = 690 nM, vs. 8.9 nM for citalopram; Owens et al., 1997) and it has many other targets, as agonist or antagonist, including serotonin, adrenergic, and histamine receptors (Mittur, 2011). In any case, the average trazodone fingerprint does not correlate particularly well with the sorl1 knockout fingerprint (cos = 0.3). Finally, the sorl1 knockout behavioural phenotype could be primarily caused by altered serotonin signalling in the hypothalamus, where we found both the biggest difference in tph1a/1b/2 HCR signal intensity (Fig. 5f) and the highest expression of sorl1 across scRNA-seq clusters (Fig. 1–supplement 2). In this case, it would be correct to expect sorl1 knockouts to react differently to SSRIs than controls, but it would be incorrect to expect SSRI treatment to cause the same behavioural phenotype, as it concurrently affects every other serotonergic neuron in the brain.

      Finally, we agree the quoted conclusion was too strong given the current evidence. We have since tested another SSRI, fluvoxamine, on sorl1 knockouts.

      - Also in reference to Figure 5: in panel c, data are presented as deviation from vehicle treated. Because of this data presentation choice, it's no longer possible to determine whether, in this experiment, sorl1 crispants sleep less at night relative to their siblings. Does citalopram rescue / reverse sleep deficits in sorl1 mutants?

      On your first point, please see our response to Reviewer #3 (2)c and Author Response 2b above.

      On “does citalopram rescue/reverse sleep deficits in sorl1 mutants”: citalopram (and fluvoxamine) tends to reverse the key aspects of the sorl1 knockout behavioural phenotype by reducing night-time activity (% time active and total Δ pixels), increasing night-time sleep, and shortening sleep latency (Author response image 7). Extrapolating from the hypothesis presented in Discussion, this may be interpreted as a hint that sorl1 knockouts have reduced levels of 5-HT receptors, as increasing serotonin signalling using an SSRI tends to rescue the phenotype. However, we do not think that focusing on the significant behavioural parameters necessarily makes sense here. Rather, one should take all parameters into account to conclude whether knockouts react differently to the drug than wild types (also see answer to Reviewer #3, (7) on this). For example, citalopram increased the night-time sleep bout length of sorl1 knockouts more than that of controls (Fig. 5), but this parameter was not modified by knockout of sorl1 (Fig. 4). To explain the rationale more informally, citalopram is only used as a tool here to probe serotonin signalling in sorl1 knockouts. Whether it worsens or rescues the behavioural phenotype is somewhat secondary; the key question is whether knockouts react differently than controls.

      Author response image 7.

      Comparing untreated sorl1 F0 knockouts vs. treated with SSRIs. a, sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H<sub>2</sub>O fingerprint from the citalopram experiment) in comparison with the sorl1 knockout + citalopram (1 or 10 µM) fingerprints. Each dot represents the mean deviation from the same-clutch scrambled-injected H<sub>2</sub>O-treated mean for that parameter (z-score, mean ± SEM). b, As in a), sorl1 F0 knockout fingerprints (baseline recordings and sorl1 + H<sub>2</sub>O fingerprint from the fluvoxamine experiment) in comparison with the sorl1 + fluvoxamine (10 µM) fingerprint.

      - Possible molecular pathways targeted by tinidazole, fenoprofen, and betamethasone are not described.

      Tinidazole is an antibiotic, fenoprofen is a non-steroidal anti-inflammatory drug (NSAID), and betamethasone is a steroidal anti-inflammatory drug. Interestingly, long-term use of NSAIDs reduces the risk of AD (in ’t Veld et al., 2001). Several mechanisms are possible (Weggen et al., 2007), including reduction of Aβ42 production by interacting with γ-secretase (Eriksen et al., 2003). However, we did not explore the mechanism of action of these drugs on psen2 knockouts, so we do not feel comfortable speculating. We do not know, for example, whether these findings apply to betamethasone.

      Minor Comments:

      - On page 25, panel "g" should be labeled as "f".

      Thank you!

      - On page 35, a reference should be provided for the statement "From genomic studies of AD, we know that mutations in genes such as SORL1 modify risk by disrupting some biological processes.".

      Thank you, this is now corrected. These were the same studies as mentioned in the Introduction.

      - On page 43, the word "and" should be added - "in wild-type rats and mice, overexpressing mutated human APP and PSEN1, AND restricting sleep for 21 days...".

      Right, this sentence could be misread; we edited it. “overexpressing […]” only applied to the mice, not the rats (as they are wild-type), and both are sleep-deprived.

      - On page 45, a reference should be provided for the statement "SSRIs can generally be used continuously with no adverse effects" and this statement should potentially be softened.

      The reference is at the end of that sentence (Cirrito et al., 2011). You are correct though; we reformulated this statement to: “SSRIs can generally be used safely for many years”. SSRIs indeed have side effects.

      - On page 54, a 60-minute rolling average is described as 45k rows, but this seems to be a 30-minute rolling average.

      Thank you! We corrected it. It should have been 90k rows, as in: 25 frames-per-second × 60 seconds × 60 minutes.
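      The arithmetic can be made explicit with a short sketch. This is an illustrative Python example, not code from the FramebyFrame package; the function names `rows_in_window` and `rolling_mean` are hypothetical. It converts a window length in minutes into a number of rows at 25 frames per second, and shows a simple trailing rolling mean over that many rows.

```python
def rows_in_window(frames_per_second, minutes):
    """Number of frame-by-frame rows covered by a window of the given length."""
    return frames_per_second * 60 * minutes

def rolling_mean(values, window):
    """Trailing rolling mean; returns one value per complete window."""
    means = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the value that just left the window.
            running_sum -= values[i - window]
        if i >= window - 1:
            means.append(running_sum / window)
    return means

print(rows_in_window(25, 60))  # 60-minute window at 25 frames per second
print(rows_in_window(25, 30))  # 30-minute window at 25 frames per second
print(rolling_mean([1, 2, 3, 4], window=2))
```

      At 25 frames per second, a 60-minute window spans 90,000 rows, whereas 45,000 rows correspond to 30 minutes, as the reviewer noted.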

      Reviewer #2 (Recommendations For The Authors):

      "As we observed in the scRNA-seq data, most genes tested (appa, appb, psen1, psen2, apoea, cd2ap, sorl1) were broadly expressed throughout the 6-dpf brain (Fig. 1d and Fig. 1supplement 3 and 4)."

      - apoea and appb are actually not expressed highly in the scRNA-seq data, and the apoea in situ looks odd, as if it has no expression. The appb gene mysteriously does not look as though it has high expression in the Raj data, but it is clearly expressed based on the in situ. I had previously noticed the same discrepancy, and I attribute it to the transcriptome used to map the Raj data, as the new DanioCell data uses a new transcriptome and indicates high appb expression in the brain. Please point out the discrepancy and possible explanation, perhaps in the figure legend.

      All excellent points, thank you. We included them directly in Results text.

      "most of these were expressed in the brain of 5-6-dpf zebrafish larvae, suggesting they play a role in early brain development or function."

      - Evidence of expression does not suggest function, particularly not a function in brain development. As one example, almost half of the genome is expressed prior to the maternal-zygotic transition but does not have a function in those earliest stages of development. There are numerous other instances where expression does not equal function. Please change the sentence even as simply as "it is possible that they".

      We mostly agree and edited to “[…], so they could play a role […]”.

      Out of curiosity, we plotted, for each zebrafish developmental stage, the proportion of Alzheimer’s risk gene orthologues expressed in comparison to the proportion of all genes expressed (Author response image 8). We defined “all genes” as every gene that is expressed in at least one of the developmental stages (n = 24,856), not the complete transcriptome, to avoid including genes that are never expressed in the brain or whose expression is always below detection limit. We counted a gene as “expressed” if at least three cells had detectable transcripts. Using these definitions, 82 ± 7% of genes are expressed during development. For every developmental stage except 5 dpf (so 11/12), a larger proportion of Alzheimer’s risk genes than all genes are expressed (+5 ± 4%).
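      The definition of “expressed” used here can be sketched in a few lines. This is an illustrative Python example on made-up counts, not the analysis code used for the figure; the gene names, the toy counts, and the function names `is_expressed` and `proportion_expressed` are hypothetical.

```python
def is_expressed(counts_per_cell, min_cells=3):
    """A gene counts as expressed if at least `min_cells` cells have transcripts."""
    n_cells_detected = sum(1 for count in counts_per_cell if count > 0)
    return n_cells_detected >= min_cells

def proportion_expressed(counts_by_gene, genes, min_cells=3):
    """Proportion of `genes` that are expressed at one developmental stage."""
    n_expressed = sum(is_expressed(counts_by_gene[g], min_cells) for g in genes)
    return n_expressed / len(genes)

# Toy transcript counts for one stage: gene -> counts in four cells (made up).
toy_counts = {
    "gene_a": [0, 2, 1, 5],  # detected in 3 cells: expressed
    "gene_b": [0, 0, 7, 1],  # detected in 2 cells: not expressed
    "gene_c": [1, 1, 1, 1],  # detected in 4 cells: expressed
    "gene_d": [0, 0, 0, 9],  # detected in 1 cell: not expressed
}
all_genes = list(toy_counts)
risk_genes = ["gene_a", "gene_c"]  # hypothetical subset of interest

prop_all = proportion_expressed(toy_counts, all_genes)
prop_risk = proportion_expressed(toy_counts, risk_genes)
print(prop_all, prop_risk)
```

      Repeating this calculation at each developmental stage, once for the gene set of interest and once for all genes, gives the two proportions being compared.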

      Author response image 8.

      Proportion of Alzheimer’s risk genes orthologues expressed throughout zebrafish development. Proportion of Alzheimer’s risk genes orthologues (n = 42) and all genes (n = 24,856) expressed in the zebrafish brain at each developmental stage, from 12 hours post-fertilisation (hpf) to 15 days post-fertilisation (dpf). “All genes” corresponds to every gene expressed in the brain at any of the developmental stages, not the complete transcriptome. A gene is considered “expressed” (green) if at least three cells had detectable transcripts. Single-cell RNA-seq dataset from Raj et al., 2020.

      "This frame-by-frame analysis has several advantages over previous methods that analysed activity data at the one-minute resolution."

      - Which methods are these? There are no citations. There are certainly existing methods in the zebrafish field that can produce similar data to the method developed for this project. This new package is useful, as most existing software is not written in R, so it would help scientists who prefer this programming language. However, I would be careful not to oversell its novelty, since many methods do exist that produce similar results.

      We added the references. They were referenced above after “we combined previous sleep/wake analysis methods”, but should have been referenced again here.

      We are not convinced by this criticism. We would obviously not claim that the FramebyFrame package is as sophisticated and versatile as video-tracking tools like SLEAP or DeepLabCut, but we do think it answers a genuine need that was not addressed by other methods. Specifically, we know of many labs recording pixel count data across multiple days using the Zebrabox or DanioVision (we added support for DanioVision data after submission), but there were no packages to extract behavioural parameters from these data. Other methods involved standalone scripts with no documentation or version tracking. We would concede the FramebyFrame package is mostly targeted at these labs, but we already know of six labs routinely using it and were recently contacted by a researcher tracking Daphnia in the Zebrabox.

      "F0 knockouts of both cutches" - "clutches"

      Thank you!

      Reviewer #3 (Recommendations For The Authors):

      I would suggest totally revamping the Introduction section, and being sure to provide readers with the context and background they need for the data that comes thereafter. Key areas to touch on, in no particular order, include:

      • Far more detail on the behavioral pharm screen upon which this paper builds, as a brief overview of that approach and the data generated are needed.

      Thank you for the suggestion, we added a sentence hinting at this work in the last Introduction paragraph.

      • Limitations of current zebrafish sleep/arousal assays that motivated the authors to develop a new, temporally high-resolution system.

      We think this is better explained in Results, as it currently is. For example, we need to point to Fig. 2–supplement 2a,b,c to explain that one-minute methods were missing sleep bouts and how FramebyFrame resolves this issue.

      • A paragraph about sleep and AD, that does a better job of citing work in humans, mammalian, and invertebrate models that motivate the interest in the connection pursued here.

      Sorry, we think this would place too much focus on sleep and AD. We want the main topic of the paper to be the behavioural pharmacology approach, not AD or sleep per se. As the Introduction states, we see Alzheimer’s risk genes as a case study for the behavioural pharmacology approach, rather than the reason why the approach was developed. Additionally, presenting sleep and AD in the Introduction risks sounding like ZOLTAR is specifically designed for this context, while we conceived of it as much more generalisable and explicitly encourage its use to study genes associated with other diseases. Note that the paragraph you suggest is, we think, mostly present in Discussion (section Disrupted sleep and serotonin signalling […]).

      • I modestly suggest eliminating making such a strong case for a gene-first approach being the best way to understand disease. It is not a zero-sum game, and there is plenty to learn from proteomics, metabolomics, etc. I suspect nobody will argue with the authors saying they leveraged the strength of their system and focused on key AD genes of interest.

      From your point below, we understand the following quote is the source of the issue: “For finding causal processes, studying the genome, rather than the transcriptome or epigenome, is advantageous because the chronology from genomic variant to disease is unambiguous […]”. We did not want to suggest it is a zero-sum game, but we now understand how it can be read this way. We slightly adapted the wording. What we want to do is highlight the causality argument as the advantage of the genomics approach. We feel we do not read this argument often enough, while it remains a ‘magic power’ of genomics. One essentially does not have to worry about causality when studying a pathogenic germline variant, while it is a constant concern when studying the transcriptome or epigenome (i.e. did the change in this transcript’s level cause disease, or vice-versa?). To take an example in the context of AD, arguments based on genomics (e.g. Down syndrome or APP duplication) are often the definitive arbiters when debating the amyloid hypothesis, exactly because their causality cannot be doubted.

      Minor comments

      (1) The opening of the introduction is perhaps overly broad, spending an entire paragraph on genome vs transcriptome, etc and making the claim that a gene-first approach is the best path. It isn't zero-sum, and the authors could just get right into AD and study genes of interest. Similar issues occur throughout the manuscript, with sentences/paragraphs that are not necessarily needed.

      Please see our answer to your previous point. On the introduction being overly broad, we perfectly agree it is broad, but related to your point about presenting sleep and AD in the Introduction, we wish to talk about finding causal processes from genomics findings using behavioural pharmacology. We purposefully present research on AD as one instance of this broader goal, not the primary topic of the paper.

      Another example are these sentences, which could be totally removed as the following paragraph starts off making the same point much more succinctly. "From genomic studies of AD, we know that mutations in genes such as SORL1 modify risk by disrupting some biological processes. Presumably, the same processes are disrupted in zebrafish sorl1 knockouts, and some caused the behavioural alterations we observed. Can we now follow the thread backwards and predict some of the biological processes in which Sorl1 is involved based on the behavioural profile of sorl1 knockouts?"

      Thanks for the suggestion, but we think these sentences are useful to place back this Results section in the context of the Introduction. Think of the paper as mainly about the behavioural pharmacology approach, not on Alzheimer’s risk genes. The function of the paragraph here is not simply to explain the method by which we decided to study sorl1; it is to reiterate the rationale behind the behavioural pharmacology approach so that the reader understands where this Results section fits in the overall structure.

      (2) Related to the above, the authors use lecanemab as an example to support their approach, but there has been a great deal of controversy regarding this drug. I don't think such extensive justification is needed. This study uses AD risk genes as a case study in a newly developed behavioral pharm pipeline. A great deal of the rest of the intro seems to just fill space and could be more focused on the study at hand. Interestingly, after gene selection, the next step in their pipeline is sleep/wake analysis yet nothing is covered about AD and sleep in the intro. Some justification of that approach (why focus on sleep/wake as a starting point for behavioral pharm rather than learning and memory?) would be a better use of intro space.

      There has indeed been controversy about lecanemab, but even the harshest critiques of the amyloid hypothesis concede that it slows down cognitive decline (Espay et al., 2023). That is all that is needed to support our argument, which is that research on AD started primarily from genomics and thereby yielded a disease-modifying drug. The controversy seems mostly focused on whether this effect size is clinically significant, and we think we correctly represent this uncertainty (e.g. “antibodies against Aβ such as lecanemab show promise in slowing down disease progression” and “the beneficial effects from targeting Aβ aggregation currently remain modest”).

      Your next point is entirely fair. We mostly answered it above. To explain further, the primary reason why we measured sleep/wake behaviour is to match the behavioural dataset from Rihel et al., 2010 so we can use it to make predictions, not to study sleep in the context of AD per se. Sure, perhaps learning and memory would have been interesting, but we do not know of any study testing thousands of small molecules on zebrafish larvae during a memory task. We understand it can be slightly confusing though, as we then spend a paragraph of Discussion on sleep as a causal process in AD, but we obviously need to discuss this topic given the findings. However, to reiterate, we purposefully designed FramebyFrame and ZOLTAR to be useful beyond studying sleep/wake behaviour. For example, FramebyFrame would not calculate 17 behavioural parameters if the only goal was to measure sleep. We now mention the Rihel et al., 2010 study in the Introduction as you suggested above (“Far more detail on the behavioral pharm screen […]”), as that is the real reason why sleep/wake behaviour was measured in the first place.

      (3) Also related to the above, another more relevant point that could be talked about in the intro is the need for more refined approaches to analyze sleep in zebrafish, given the effort that went into the new analysis system described here. Again, I think the context for why the authors developed this system would be more meaningful than the current content.

      Thank you, we think we answered this point above (especially below Limitations of current zebrafish sleep/arousal assays […]).

      (4) GWAS can stand for Genome-wide association studies (plural) so I do not think the extra "s" is needed (GWASs).

      Indeed, that seems to be the common usage. Thank you.

      (5) AD candidate risk genes were determined from loci using "mainly statistic colocalization". Can the authors add a few more details about what was done and what the "mainly" caveat refers to?

      “Mainly” simply refers to the fact that other methods were used by Schwartzentruber et al. (2021) to annotate the GWAS loci with likely causal genes, but that most calls were ultimately made from statistic colocalisation. Readers can refer to this work to learn more about the methods used.

      (6) The authors write "The loss of psen1 only had mild effects on behaviour" but I think they mean "sleep behaviors" as there could be many other behaviors that are disrupted but were not assessed. The same issue a few sentences later with "Behaviour during the day was not affected" and at the end of the following paragraph.

      Yes, that would be more precise, thank you.

      (7) For the Sorl1 pharmacology data, it is very hard to understand what is being measured behaviorally. Are the authors measuring sleep +/- citalopram, or something else, and why the change to Euclidean distance rather than all the measures we were just introduced to earlier in the manuscript?

      We understand these plots (Fig. 5c,d) are less intuitive, but it is important that we show the difference in behaviour compared to H<sub>2</sub>O-treated larvae of the same genotype. The claim is that citalopram has a larger effect on knockouts than on controls, so the reader needs to focus on the effect of the drug on each genotype, not on the effect of sorl1 knockout. We added the standard fingerprints (i.e. setting controls to z-score = 0) here in the Author response images.

      Euclidean distance takes as input all the measures we introduced. The point is precisely not to select a single measure. For example, say we were only plotting active bout number during the day, we would conclude that 10 µM citalopram has the same effect on knockouts and controls. Conversely, if we had taken sleep bout length at night, we would conclude 10 µM has a stronger effect on knockouts. What is the correct parameter to select? Using Euclidean distance resolves this by taking all parameters into account, rather than arbitrarily choosing one.
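      To illustrate why a single parameter can mislead while the Euclidean distance does not, here is a minimal Python sketch on made-up numbers. The three parameters, the z-score values, and the function name `euclidean_distance` are hypothetical and chosen only for the example; the actual fingerprints contain many more parameters.

```python
import math

def euclidean_distance(fingerprint_a, fingerprint_b):
    """Euclidean distance between two fingerprints (same parameters, same order)."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(fingerprint_a, fingerprint_b))
    )

# Each fingerprint is a vector of z-scores, one per behavioural parameter,
# expressed relative to H2O-treated larvae of the same genotype (made-up values).
h2o_treated = [0.0, 0.0, 0.0]
control_plus_drug = [-1.0, 0.5, 0.2]
knockout_plus_drug = [-1.0, 1.5, 1.0]

# The first parameter alone suggests the drug has the same effect on both genotypes.
same_on_first_parameter = control_plus_drug[0] == knockout_plus_drug[0]

# Taking all parameters into account shows a larger drug effect on knockouts.
control_effect = euclidean_distance(control_plus_drug, h2o_treated)
knockout_effect = euclidean_distance(knockout_plus_drug, h2o_treated)
print(same_on_first_parameter, control_effect, knockout_effect)
```

      In this toy case, picking the first parameter would hide the difference between genotypes, while the distance computed over all parameters reveals it without requiring an arbitrary choice.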

      And what exactly is a "given spike in serotonin"? and how is this hypothesis the conclusion based on the lack of evidence for the second hypothesis? As the authors say, there could be other ways sorl1 knockouts are more sensitive to citalopram, so the absence of evidence for one hypothesis certainly does not support the other hypothesis.

      We mean a given release of serotonin in the synaptic cleft. We have fixed this wording. 

      We tend to disagree on the second point. We can think of two ways that sorl1 knockouts are more sensitive to citalopram: 1) they produce more serotonin, so blocking reuptake causes a larger spike in knockouts; or 2) blocking reuptake causes the same increase in both knockouts and wild-types but knockouts react more strongly to serotonin. We cannot in fact think of another way to explain the citalopram results. Not finding overwhelming evidence for 1) surely supports 2) somewhat, even if we do not have direct evidence for it. As an analogy, if two diagnoses are possible for a patient, testing negative for the first one supports the other one, even before it is directly tested.

      (8) Again some language is used without enough care. Fish are referred to as "drowsier" under some drug conditions. How do the authors know the animal is drowsy? The phenotype is more specific - more sleep, less activity.

      Thank you, we switched to “Furthermore, fenoprofen worsened the day-time hypoactivity of psen2 knockout larvae […]”.

      (9) This sentence is misleading as it gives the impression that results in this manuscript suggest the conclusion: "Our observation that disruption of genes associated with AD diagnosis after 65 years reduces sleep in 7-day zebrafish larvae suggest that disrupted sleep may be a common mechanism through which these genes exert an effect on risk." That idea is widely held in the field, and numerous other previous manuscripts/reviews should be cited for clarity of where this hypothesis came from.

      This idea is not widely held in the field. You likely read this point as “disrupted sleep is a risk factor for AD”, which, yes, is widely discussed in the field, but is not precisely what we are saying. We hypothesise that mutations in some of the Alzheimer’s risk genes cause disrupted sleep, possibly from a very early age, which then causes AD decades later. Studies and reviews on sleep and AD rarely make this hypothesis, at least not explicitly. The closest we know of are a few recent human genetics studies, typically using Mendelian Randomisation, finding that higher genetic risk of AD correlates with some sleep phenotypes, such as sleep duration (Chen et al., 2022; Leng et al., 2021). The work of Muto et al. (2021) is particularly interesting as it found correlations between higher genetic risk of AD and some sleep phenotypes in men in their early twenties, which seems unlikely to be a consequence of early pathology. Note, however, that even these studies do not mention sleep possibly being disrupted early in development, which is what our findings in zebrafish larvae support. As we mention, we think a team should test whether sleep is different in infants at higher genetic risk of AD, essentially performing an experiment analogous to, but obviously much more difficult than, the one we did in zebrafish larvae. We do not know of any study testing this or even raising this idea, so evidently it is not widely held. Having said that, the studies we mention here were not referenced in the Discussion paragraph. We have now corrected this.

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin-like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease: A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld BA, Ruitenberg A, Hofman A, Launer LJ, van Duijn CM, Stijnen T, Breteler MMB, Stricker BHC. 2001. Nonsteroidal Antiinflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PloS One 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Mittur A. 2011. Trazodone: properties and utility in multiple disorders. Expert Rev Clin Pharmacol 4:181–196. doi:10.1586/ecp.10.138

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Owens MJ, Morgan WN, Plott SJ, Nemeroff CB. 1997. Neurotransmitter receptor and transporter binding profile of antidepressants and their metabolites. J Pharmacol Exp Ther 283:1305–1322.

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a reexamination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Szklarczyk D, Santos A, von Mering C, Jensen LJ, Bork P, Kuhn M. 2016. STITCH 5: augmenting protein–chemical interaction networks with tissue and affinity data. Nucleic Acids Res 44:D380–D384. doi:10.1093/nar/gkv1277

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Daka SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. Author response:

      The following is the authors’ response to the original reviews

      We would like to thank you and the reviewers for valuable feedback on the first version of the manuscript. We have now addressed all of the issues raised by the reviewers, mostly by implementing the suggested changes and clarifying important details in the revised version of the manuscript. A detailed response to each comment is provided in the rebuttal letter. Briefly, the main changes were as follows:

      - We changed ‘homeostatic balance’ to ‘network balance’, especially when describing the main finding, because the response changes induced by the stimulation occurred on a fast timescale. We speculate that the sustained changes observed in the post-stimulation condition are the result of homeostatic mechanisms.

      - We added further verification of the targeted stimulation effect: a supplementary result comparing its effect at the target and off-target z-planes, and a demonstration that the imaging laser has minimal impact on rsChRmine.

      - To further support our findings, we added a simple toy model illustrating that suppression applied specifically to co-tuned cells yields the observed decrease in response amplitude.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations.

      Strengths:

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population.

      Weaknesses:

      I have several main concerns about the interpretation of these data:

      (1) The premise of the paper suggests that sensory responses are noisy at the level of neurons, but that population activity is reliable and that different neurons may participate in sensory coding on different trials. However, no analysis related to single trial variance or overall stability of population coding is provided. Specifically, showing that population activity is stable across trials in terms of total activity level or in some latent low dimensional representation would be required to support the concept of "homeostatic balancing".

      Thank you for raising this important point. We agree that ‘homeostatic balancing’ may not be the best term to explain the main results, and we have toned down the homeostatic plasticity aspect accordingly. We changed the term to a simple ‘network balance’, which may arise from various factors including rapid synaptic plasticity. We speculate that the persistent activity of co-tuned cells in the post-stimulation session, rather than a rapid return of their responses to baseline, is a result of homeostatic balance. Relevant changes are implemented throughout the manuscript, including the Introduction (e.g., lines 76-78) and Discussion (e.g., lines 453-456).

      (2) Rebalancing would predict either that the responses of stimulated neurons would remain A) elevated after stimulation due to a hebbian mechanism or B) suppressed due to high activity levels on previous trials, a homeostatic mechanism. The authors report suppression in targeted neurons after stimulation blocks, but this appears similar to all other non-stimulated neurons. How do the authors interpret the post-stimulation effect in stimulated neurons?

      It is true that the post-stimulation session shows no response change in either co-tuned or non-co-tuned neurons, in both stimulation and control sessions. This could be because neuronal activity had already adapted and decreased sufficiently from the consecutive presentation of the acoustic stimuli themselves. However, we still think that if the response decrease in co-tuned non-stimulated neurons were driven purely by stimulation, without homeostasis, their responses should at least bounce back during the post-stimulation session. We agree that further investigation is required to confirm such an effect. We elaborated on this as another discussion point in the Discussion section (lines 457-464).

      (3) The authors suggest that ACtx is different from visual cortex in that neurons with different tuning properties are intermingled. While that is true at the level of individual neurons, there is global order, as demonstrated by the authors' own widefield imaging data and others at the single cell level (e.g. Tischbirek et al. 2019). Generally, distance is dismissed as a variable in the paper, but this is not convincing. Work across multiple sensory systems, including the authors' own work, has demonstrated that cortical neuron connectivity is not random but varies as a function of distance (e.g. Watkins et al. 2014). Better justification is needed for the spatial pattern of neurons that were chosen for stimulation. Further, analyses that account for center of mass of stimulation, rather than just the distance from any stimulated neuron would be important to any negative result related to distance.

      Thank you for the further suggestion regarding distance. While Watkins et al. (2014) and Levy and Reyes (2012) showed stronger connectivity for nearby cells as well as for more distant patches, at the functional level Winkowski and Kanold (2013) showed high heterogeneity in frequency tuning, especially in L2/3, which is the layer we imaged in this study. Thus, connected cells can have varied tuning, consistent with spine imaging (Konnerth paper). As an additional verification, we also calculated distance from the center of mass of the target cells and still observed no distance-related stimulation effect. We have replaced Figure 4B with the result from the center-of-mass calculation.

      (4) Data curation and presentation: Broadly, the way the data were curated and plotted makes it difficult to determine how well-supported the authors' claims are. In terms of curation, the removal of outliers 3 standard deviations above the mean in the analysis of stimulation effects is questionable. Given the single-cell stimulation data presented in Figure 1, the reader is led to believe that holographic stimulation is quite specific. However, the justification for removing these outliers is that there may be direct stimulation 20-30 µm from the target. Without plotting and considering the outliers as well, it is difficult to understand if these outsized responses are due to strong synaptic connections with neighboring neurons or rather just direct off-target stimulation. Relatedly, data presentation is limited to the mean + SEM for almost all main effects and pre-post stimulation effects are only compared indirectly. Whether stimulation effects are driven by just a few neurons that are particularly suppressed or distinct populations which are suppressed or enhanced remains unclear.

      Thank you for pointing this out. We now specifically removed neighboring cells located < 20 µm from the target point and observed similar results. We replaced all relevant figures, text, and statistical results to ensure that the exclusion was specific to overlapping neighboring cells.

      Reviewer #2 (Public review):

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The conclusions of this paper are very interesting but some aspects of the study including methods for optogenetic stimulation, statistical analysis of the results and interpretation of the underlying mechanisms need to be clarified and extended.

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such an approach is complex and requires precise controls to be convincing. In the manuscript, several methodological aspects are not sufficiently described to allow a proper understanding.

      (i) The use of CRmine together with GCaMP8s has been reported as problematic as the 2Ph excitation of GCaMP8s also excites the opsin. Here, the authors use a red-shifted version of CRmine to prevent such cross excitation by the imaging laser. To be convincing, they should explain how they controlled for the absence of rsCRmine activation by the 940nm light. Showing the fluorescence traces immediately after the onset of the imaging session would ensure that neurons are not excited as they are imaged.

      Thank you for pointing this out. We realized that an important reference had been omitted. Kishi et al. (2022) validated the efficacy of rsChRmine relative to ChRmine: they compared the activity of regular ChRmine and rsChRmine across different wavelengths and settings and showed that rsChRmine remains efficient with reduced optical crosstalk. This reference is now included in the manuscript (line 98). We also checked the spontaneous baseline activity during the approximately 10 s before any sound presentation and observed relatively stable activity throughout, with no activation related to the onset of the imaging session. This is similar to what we see in a separate group of GCaMP6s transgenic animals.

      Author response image 1.

      Baseline fluorescence activity across cells within FOVs from AAV9-hSyn-GCaMP8s-T2A-rsChRmine injected mice (top) and CBA X Thy1-GCaMP6s F1 transgenic mice (bottom). Fluorescence levels and activity patterns remain similar, suggesting no evident imaging laser-induced activation of rsChRmine. Note that the GCaMP8s examples are smoothed with a 4-point moving average because GCaMP8s shows faster activity.

      (ii) Holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such an effect, one could in principle decouple the imaging and the excitation planes, and check for the absence of out-of-focus unwanted excitation.

      We further verified whether the laser power at the targeted z-plane influences cells’ activity at nearby z-planes. As the Reviewer pointed out, the previous x- and y-axis shifts were tested with single-cell stimulation. This time, we stimulated five cells simultaneously, to match the actual experimental setup and assess potential artifacts in other planes. We observed no stimulation-driven increase in activity in cells at a z-plane shifted by 20 µm (Supplementary Figure 1). This confirms that the holographic stimulation accurately manipulates the pre-selected target cells and that the effects we observe are unlikely to be due to out-of-focus stimulation artifacts. It is true that not all pre-selected cells that showed significant response changes prior to the main experiment were effectively activated on every trial during the experiments. We varied the target cell distances across FOVs, from nearby cells to those farther apart within the FOV, and did not observe a significant relationship between target cell distance and stimulation effect. Lastly, cells within 20 µm of the target were excluded to prevent potential excitation by the holographic stimulation power. Given the spontaneous movements of the FOV during imaging sessions caused by the animal’s movement, despite our efforts to minimize them, we believe that any excitation of these neighboring neurons would come directly from the stimulation rather than from a light pattern artifact.

      (iii) The control shown in Figure 1B is intended to demonstrate the precision of the optogenetic stimulation: when the stimulation spiral is played at a distance larger or equal to 20 µm from a cell, it does not activate it. However, in the rest of the study, the stimulation is applied with a holographic approach, targeting 5 cells simultaneously instead of just one. As the holographic pattern of light could produce out-of-focus hot spots (absent in the single cell control), we don't know what is the extent of the contamination from non-targeted cells in this case. This is important because it would determine an objective criterion to exclude non-targeted but excited cells (last paragraph of the Result section: "For the stimulation condition, we excluded non-target cells that were within 15 µm distance of the target cells...")

      Neurons that are highly sensitive to a given frequency also show the greatest adaptation effect, as can be observed in the control condition. Therefore, the greater amplitude change in highly sensitive neurons reflects, first of all, neuronal adaptation to their preferred frequency. However, when co-tuned target neurons are stimulated, other co-tuned non-target neurons show a significantly greater amplitude decrease than under stimulation of non-co-tuned target neurons, and a greater decrease than under control, although the latter comparison did not reach significance.

      As you pointed out, we also applied a more rigorous exclusion criterion of 20 µm instead of 15 µm, since the spiral size was 20 µm. With this criterion, a significant stimulation-driven decrease in response amplitude was again observed only in co-tuned non-target neurons responding to their preferred frequency.

      (2) A strength of this study comes from the design of the experimental protocol used to compare the activity in non-target co-tuned cells when the optogenetic stimulation is paired with their preferred tone versus a non-preferred pure tone. The difficulty lies in the co-occurrence of the rebalancing process and the adaptation to repeated auditory stimuli, especially when these auditory stimuli correspond to a cell's preferred pure tones. To distinguish between the two effects, the authors use a comparison with a control condition similar to the optogenetic stimulation conditions, except that the laser power is kept at 0 mW. The observed effect is shown as an extra reduction of activity in the condition with the optogenetic paired with the preferred tone, compared to the control condition. The specificity of this extra reduction when stimulation is synchronized with the preferred tone, but not with a non-preferred tone, is a potentially powerful result, as it points to an underlying mechanism that links the assemblies of cells that share the same preferred pure tones.

      The evidence for this specificity is shown in Figure 3A and 3D. However, the universality of this specificity is challenged by the fact that it is observed for 16kHz preferring cells, but not so clearly for 54kHz preferring cells: these 54kHz preferring cells also significantly (p = 0.044) reduce their response to 54kHz in the optogenetic stimulation condition applied to 16kHz preferring target cells compared to the control condition. The proposed explanation for this is the presence of many cells with a broad frequency tuning, meaning that these cells could have been categorized as 54kHz preferring cells, while they also responded significantly to a 16kHz pure tone. To account for this, the authors divide each category of pure tone cells into three subgroups with low, medium and high frequency preferences. Following the previous reasoning, one would expect at least the "high" subgroups to show a strong and significant specificity for an additional reduction only if the optogenetic stimulation is targeted to a group of cells with the same preferred frequency. Figure 3D fails to show this. The extra reduction for the "high" subgroups is significant only when the condition of opto-stimulation synchronized with the preferred frequency is compared to the control condition, but not when it is compared to the condition of opto-stimulation synchronized with the non-preferred frequency.

      Therefore, the claim that "these results indicate that the effect of holographic optogenetic stimulation depends not on the specific tuning of cells, but on the co-tuning between stimulated and non-stimulated neurons" (end of paragraph "Optogenetic holographic stimulation decreases activity in non-target co-tuned ensembles") seems somewhat exaggerated. Perhaps increasing the number of sessions in the 54kHz target cell optogenetic stimulation condition (12 FOV) to the number of sessions in the 16kHz target cell optogenetic stimulation condition (18 FOV) could help to reach significance levels consistent with this claim.

      We had previously tested this by randomly subselecting 12 FOVs from the 16 kHz stimulation condition, to match the number of FOVs between the two groups, and did not see any difference in the results. However, to further confirm the results, we have now added three more datasets for the 54 kHz target cell stimulation condition (now 15 FOVs), which yielded a similar outcome. We have updated the statistical values to include the added datasets.

      (3) To interpret the results of this study, the authors suggest that mechanisms based on homeostatic signaling could be important to allow the rebalancing of the activity of assemblies of co-tuned neurons. In particular, the authors try to rule out the possibility that inhibition plays a central role. Both mechanisms could produce effects on short timescales, making them potential candidates. The authors quantify the spatial distribution of the balanced non-targeted cells and show that they are not localized in the vicinity of the targeted cells. They conclude that local inhibition is unlikely to be responsible for the observed effect. This argument raises some questions. The method used to quantify spatial distribution calculates the minimum distance of a non-target cell to any target cell. If local inhibition is activated by the closest target cell, one would expect the decrease in activity to be stronger for non-target cells with a small minimum distance and to fade away for larger minimum distances. This is not what the authors observe (Figure 4B), so they reject inhibition as a plausible explanation. However, their quantification doesn't exclude the possibility that non-target cells in the minimum distance range could also be close and connected to the other 4 target cells, thus masking any inhibitory effect mediated by the closest target cell. In addition, the authors should provide a quantitative estimate of the range of local inhibition in layers 2/3 of the mouse auditory cortex to compare with the range of distances examined in this study (< 300 µm). Finally, the possibility that some target cells could be inhibitory cells themselves is considered unlikely by the authors, given the proportions of excitatory and inhibitory neurons in the upper cortical layers. On the other hand, it should be acknowledged that inhibitory cells are more electrically compact, making them easier to be activated optogenetically with low laser power.

      Minimum distance is defined as the smallest distance from a non-target cell to any of the target cells. Thus, if the effect were due to local inhibition, the closest target cell would likely have driven the response changes in the non-target cells. Based on both Reviewers’ comments, we also calculated distance from the center of mass of the target cells as an additional verification, and still observed no distance-related stimulation effect. The result is now updated in Figure 4B.

      Based on previous literature, such as Levy & Reyes 2012, excitatory and inhibitory connectivity is known to extend over a range of around 100 µm. Our results do not show any additional effect for cells at distances below 100 µm, which suggests that the effect is not limited to local inhibition. We also added further speculation on why our results are less likely to be due to increased inhibition, notwithstanding the response characteristics of inhibitory neurons to optogenetic stimulation.
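
      The two distance measures discussed in this response can be sketched as follows. This is only an illustration under stated assumptions, not the analysis code used for Figure 4B: the function names, the 2D coordinates, and the example values are hypothetical, and the 20 µm exclusion radius is taken from the responses above.

```python
import math


def min_distance(cell, targets):
    """Smallest Euclidean distance (µm) from a non-target cell to any target cell."""
    return min(math.dist(cell, t) for t in targets)


def center_of_mass_distance(cell, targets):
    """Distance (µm) from a non-target cell to the centroid of the target cells."""
    cx = sum(t[0] for t in targets) / len(targets)
    cy = sum(t[1] for t in targets) / len(targets)
    return math.dist(cell, (cx, cy))


def keep_cell(cell, targets, exclusion_um=20.0):
    """Keep a non-target cell only if it lies at least exclusion_um from every target."""
    return min_distance(cell, targets) >= exclusion_um


# Hypothetical (x, y) positions in µm for five target cells and two non-target cells.
targets = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0), (50.0, 50.0)]
near_cell = (10.0, 0.0)
far_cell = (50.0, 20.0)

d_min = min_distance(near_cell, targets)
d_com = center_of_mass_distance(near_cell, targets)
```

      In this made-up layout the first cell is close to one target but far from the centroid, so the two measures can rank cells quite differently; it would also be dropped by the exclusion radius, whereas the second cell would be kept.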

      Reviewer #3 (Public review):

      Summary:

      The authors optogenetically stimulate 5 neurons all preferring the same pure tone frequency (16 or 54 kHz) in the mouse auditory cortex using a holography-based single cell resolution optogenetics during sound presentation. They demonstrate that the response boosting of target neurons leads to a broad suppression of surrounding neurons, which is significantly more pronounced in neurons that have the same pure tone tuning as the target neurons. This effect is immediate and spans several hundred micrometers. This suggests that the auditory cortical network balances its activity in response to excess spikes, a phenomenon already seen in visual cortex.

      Strengths:

      The study is based on a technologically very solid approach based on single-cell resolution two-photon optogenetics. The authors demonstrate the potency and resolution of this approach. The inhibitory effects observed upon targeted stimulation are clear and the relative specificity to co-tuned neurons is statistically clear although the effect size is moderate.

      Weaknesses:

      The evaluation of the results is brief and some aspects of the observed homeostatic are not quantified. For example, it is unclear whether stimulation produces a net increase or decrease of population activity, or if the homeostatic phenomenon fully balances activity. A comparison of population activity for all imaged neurons with and without stimulation would be instructive. The selectivity for co-tuned neurons is significant but weak. Although it is difficult to evaluate this issue, this result may be trivial, as co-tuned neurons fire more strongly. Therefore, the net activity decrease is expected to be larger, in particular, for the number of non-co-tuned neurons which actually do not fire to the target sound. The net effect for the latter neurons will be zero just because they do not respond. The authors do not make a very strong case for a specific inhibition model in comparison to a broad and non-specific inhibitory effect. Complementary modeling work would be needed to fully establish this point.

      Thank you for raising these important points. We agree that the term homeostatic balancing may have been an overstatement. We have toned down the claims regarding homeostatic plasticity and now attribute the result to rapid plasticity at the single-trial level. Nevertheless, the average activity level did not differ among stimulation conditions (control, 16 kHz stim, and 54 kHz stim), which suggests that the overall activity level was maintained regardless of the stimulation. We added a new figure showing the global activity change as Fig. 4A.

      We also added a simple model in which a suppression term was applied either to all neurons or specifically to non-target co-tuned cells, to test the results obtained from our data.
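
      The contrast between a suppression term applied to all neurons and one applied only to co-tuned cells can be illustrated with a very small sketch. It is not the model added to the manuscript: the multiplicative form of the suppression, the function names, and all numbers are assumptions made here for illustration only.

```python
def apply_suppression(responses, is_cotuned, strength, cotuned_only):
    """Scale responses by (1 - strength), for all cells or for co-tuned cells only."""
    out = []
    for r, co in zip(responses, is_cotuned):
        suppressed = co or not cotuned_only
        out.append(r * (1.0 - strength) if suppressed else r)
    return out


def mean_change(before, after, mask):
    """Mean (after - before) over the cells selected by mask."""
    diffs = [a - b for b, a, m in zip(before, after, mask) if m]
    return sum(diffs) / len(diffs)


# Hypothetical tone-evoked amplitudes of four non-target cells:
# two co-tuned (strong responses) and two non-co-tuned (weak or no response).
responses = [1.0, 0.8, 0.1, 0.0]
is_cotuned = [True, True, False, False]
not_cotuned = [not c for c in is_cotuned]

global_sup = apply_suppression(responses, is_cotuned, 0.2, cotuned_only=False)
specific_sup = apply_suppression(responses, is_cotuned, 0.2, cotuned_only=True)

for name, after in (("global", global_sup), ("specific", specific_sup)):
    print(name,
          round(mean_change(responses, after, is_cotuned), 3),
          round(mean_change(responses, after, not_cotuned), 3))
```

      Under this assumed multiplicative form, even the global variant produces a larger absolute decrease in co-tuned cells simply because they respond more strongly, which is the concern raised in the review. The two variants differ in the non-co-tuned group, which shows a small decrease under global suppression and none under specific suppression.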

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) For the first holography paper in A1, more information is needed about how holographic stimulation was performed and how stimulation artifacts were avoided or removed from the data set, especially as the text states that the PMTs were left open for the duration of the experiment.

      We clarified the rationale for leaving the shutter open, which is to avoid mechanical sounds that could activate neurons in the AC. Specifically, we keep the uncaging shutter open because the Bruker default setting (software version 5.7) opens and closes the shutter on every iteration of the stimulation. This generates loud mechanical sounds, which make it difficult to determine whether activation is due to the sound or to the stimulation.

      (2) The choice of the dF/F as the primary tool for quantifying data should be better justified. Presumably, cells have very different variances in baseline activity levels and baseline fluorescence levels that create a highly skewed distribution of responses across the population. Further, a

      To account for variance in baseline activity, we first calculate dF/F for each cell by normalising to the baseline period (about 330 ms before sound onset) immediately preceding each trial. By doing so, we minimize any effect that could have been driven by variable baseline activity levels across neurons.

      (3) More analysis should be performed to determine why 33% of stimulated cells are not activated, and instead are suppressed during stimulation. Is this related to a cell's baseline fluorescence?
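
      The per-trial, per-cell normalisation described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name, the example trace, and the frame rate of roughly 30 Hz (so that about 330 ms corresponds to 10 frames) are assumptions.

```python
from statistics import mean


def dff_per_trial(trace, sound_onset_idx, n_baseline_frames):
    """dF/F for one cell on one trial, normalised to the frames just before sound onset."""
    start = sound_onset_idx - n_baseline_frames
    if n_baseline_frames <= 0 or start < 0:
        raise ValueError("baseline window must lie inside the trace")
    f0 = mean(trace[start:sound_onset_idx])  # baseline fluorescence F0
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [(f - f0) / f0 for f in trace]


# Hypothetical raw fluorescence: 10 baseline frames, then 5 frames after sound onset.
trace = [100.0] * 10 + [150.0] * 5
dff = dff_per_trial(trace, sound_onset_idx=10, n_baseline_frames=10)
```

      Because F0 is recomputed for every cell and every trial, a cell with a high resting fluorescence and a cell with a low one are expressed on the same relative scale.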

      Great point. Although we tried our best to pre-select stimulation-responsive neurons before we start the actual experiments and head fix the animals as much as possible, these neurons do not stay as the “best stimulation-responsive neurons” throughout the entire imaging session. There can be various caveats on this. First, they seem to change their activity levels due to the optogenetic stimulation after they are exposed to acoustic stimulation. Second, since the AC is in the temporal side, it is likely to be more affected from the animals’ and their brain movements throughout the imaging session, which could be bigger than visual cortex or motor cortex. However, 33% of 5 cells is about 1.5 cells so it is usually missed about one cell on average, although some sessions have all 5 cells being stimulated while some other sessions have clearly less effective holographic stimulation effect.

      We also manually visualised the fluorescence change due to the holographic stimulation before starting any imaging session. Nevertheless, the cells do not remain the ‘best stimulation-responsive cells’ throughout the session, and we cannot control this natural biological variability in neuronal activity. Based on the significant stimulation effects observed when presenting different pure tone frequencies and when comparing different target stimulation conditions with the no-stimulation control, we believe that the effect itself is valid. We added these caveats to the manuscript as a further discussion point and things to consider.

      (4) The linear mixed-effects model should include time as a variable as A) the authors hypothesize that responses should be reduced over time due to sensory adaptation and that B) stimulation induced suppression might be dynamic (though they find it is not).

      Since the stimulation effect seems to be independent of trial-by-trial changes across stimulation conditions (Fig. 4), and we have now toned down the homeostasis aspect, we kept the current mixed-effects model variables.

      (5) More speculation is needed on why stimulation suppresses responses from the first trial onwards.

      We further speculate that such rapid response changes arise from activity-dependent synaptic changes, driven by an overall shift in network energy caused by the optogenetic stimulation, that act to maintain the balance of the cortical circuit.

      (6) What does each dot represent in Figure 4a vs. Figure 4B? They are very different in number.

      In 4A, each dot is the average amplitude change for one trial. The number of dots is exactly the same across frequencies, cell groups, and conditions, as each dot represents one trial (20 each). They may appear to differ in number only because some dots overlap between frequencies.

      In 4B, each dot is each cell. The reason why it’s denser in Stimulation conditions’ 16kHz preferring cells panel is that it naturally had more FOVs thus more cells to be plotted. We further clarified these details in the figure legend.

      (7) How sensory responsive neurons were selected should be shown in the figures. Specifically, which fraction of the 30% of most responsive neurons were stimulated should be stated. Depending on the exact yield in the field of view, all or only a minority of strongly sensory responsive neurons are being stimulated, which in either case would color the interpretation of the data.

      We varied the FOV as much as possible across sessions to ensure that the FOVs were directly in A1 and covered a range of frequencies. If we could not identify more than 80 sound-responsive neurons in the processed suite2p data, we searched for another FOV.

      We have now included an example FOV of the widefield imaging we first conducted to identify A1, and another example FOV of the 2-photon imaging where we conducted a short sound presentation session to identify the sensory responsive neurons, as an inset of the ‘Cell selection’ part in Figure 1.

      Reviewer #2 (Recommendations for the authors):

      Minor points:

      - p.4, last line: "of" probably missing "the processing the target..."

      Fixed.

      - p.5, top, end of the first paragraph of this page: Figure 3B and 3E don't show exemplar traces.

      Corrected as Figure 2A and 2D.

      - P.5, first sentence of the paragraph "Optogenetic holographic stimulation increases activity in targeted ensembles": reference to Figure 3A and 3D should rather be Figure 2A and 2D.

      Corrected.

      - P.9, 2nd paragraph: sentence with a strange syntax: "since their response amplitude..."

      Corrected.

      - Figure 2: panels C and F are missing.

      Corrected.

      - p.11, methods: "wasthen" should be "was then".

      Corrected.

      - p.12, analysis: it is not clearly explained why the sound evoked activity is computed based on the 160ms to 660ms after sound onset instead of 0ms to 660 ms. It is likely related to some potential contamination but it should be explicitly explained.

      We used the 160 ms to 660 ms window because the calcium transient is relatively slow, so this window captures the sound-evoked responses more accurately. We added this detail.

      - Methods, analysis: the authors should better explain how they conducted the random permutation described in the Figures 1D, 2B and 2E. Which signals were permutated?

      The random permutation shuffled the target cell IDs.
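
      One way such a shuffle of target cell IDs could be set up is sketched below. This is an assumption-laden illustration, not the analysis code: the function name, the use of the mean response as the test statistic, the one-sided comparison, and the toy data are all hypothetical.

```python
import numpy as np

def permutation_p_value(responses, target_idx, n_perm=1000, seed=0):
    """One-sided permutation test on the mean response of the target cells.

    The null distribution is built by randomly reassigning the "target"
    label to the same number of cells drawn from the whole population.
    """
    rng = np.random.default_rng(seed)
    responses = np.asarray(responses, dtype=float)
    observed = responses[target_idx].mean()
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled_idx = rng.choice(responses.size, size=len(target_idx), replace=False)
        null[i] = responses[shuffled_idx].mean()
    # Add-one correction so the estimated p-value is never exactly zero.
    return (np.sum(null >= observed) + 1) / (n_perm + 1)

# Toy population: 100 cells, of which the 5 targets respond strongly.
toy_responses = np.zeros(100)
toy_responses[:5] = 1.0
toy_p = permutation_p_value(toy_responses, target_idx=np.arange(5))
```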

      - References 55 and 56 don't explicitly state that excitatory neurons generally have stronger responses to sound than inhibitory neurons.

      Thank you for pointing out this error. We replaced those references with Maor et al. 2016 and Kerlin et al. 2010, which show that excitatory neurons have more selective tuning, and revised the wording accordingly.

      - It is not explained whether the imaging sessions are performed on awake or anaesthetized animals. It is probably done on awake animals, but then it is not clear what procedure is used to get the animals used to the head restraint. It usually takes a few days for the mice to get used to it, and the stress level is often different at the beginning and end of an experiment. Given the experimental protocol used in the study, in which sessions are performed sequentially and compared to each other, this aspect could play a role. However, the main comparison made is probably safe as it compares a control condition (laser at 0mW) and conditions with optogenetic stimulation, all done with similar sequences of sessions.

      The experiment was conducted on awake animals. Although we did not have a control comparing their state at the beginning and the end of the experiment, all animals first underwent a widefield imaging session to identify the A1 region, which uses the same head-fixation setup; they were thus more accustomed to the setup by the time we conducted 2-photon imaging and stimulation. Regardless of the session, if animals showed any sign of extra discomfort due to the unfamiliar setup, we kept them there for 10-15 minutes until they were accustomed to the setup and showed no movement. If they still showed signs of discomfort, we took them out and tried again on another day. We have now included this detail in the manuscript.

      Reviewer #3 (Recommendations for the authors):

      - Evaluate the global effect of stimulation on the population activity averaged across all neurons (activated and non-activated).

      Thank you for your suggestions. We have now included a new Figure 3A that presents the population activity across all responsive cells. The average activity level did not differ among stimulation conditions (control, 16kHz stim, and 54kHz stim).

      - Evaluate with a simple model if a population of neurons with different sound tuning receiving non-specific inhibition would not produce the observed effect.

      Thank you for the suggestion. We generated a simple model in which a suppression term was applied either to all neurons or specifically to non-target co-tuned cells, to test the results obtained from the data. We used a similar range of neuron and FOV numbers so that the simulation closely matched the structure of the real dataset. For 50 simulated calcium traces of neurons (n),

      Trace<sub>n(t)</sub> = R<sub>n(t)</sub> – theta<sub>n</sub> + epsilon<sub>n(t)</sub>

      where R<sub>n(t)</sub> is a response amplitude from either the baseline or the stimulation session, theta<sub>n</sub> is a suppression term applied either to all neurons or only to non-target co-tuned neurons, only during the stimulation session, and epsilon<sub>n(t)</sub> is additive noise. Theta was defined based on the average increase in activity amplitude generated in target neurons by the stimulation, taken from the real dataset with extra neuron-level jitter. As in the real data analyses, we compared the trace amplitudes between the stimulation and baseline sessions. Comparing the two model outcomes and the real data, we observed a significant effect of model type (F(2, 2535) = 34.943, p < 0.0001) and a significant interaction between model type and cell group (F(2, 2535) = 36.348, p < 0.0001). Applying suppression only to non-target co-tuned cells during the stimulation session yielded a significant decrease in response amplitude for co-tuned cells compared to non co-tuned cells (F(1, 2535) = 45.62, p < 0.0001), which resembles the real data. In contrast, applying suppression to all non-target cells led to similar amplitude changes in both co-tuned and non co-tuned neurons (F(1, 2535) = 0.87, p = 0.35), which was not observed in either the real data or the simulated data restricted to co-tuned cell suppression. Therefore, the model correctly predicts that suppression applied specifically to co-tuned neurons drove the outcome in the real data. All of this information has been added to the Methods and Results sections, and the figure has been added as Figure 3C.
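
      The structure of this model can be sketched as follows. The sketch keeps only the form Trace = R - theta + epsilon and the two suppression variants; every numerical value (amplitude range, theta of 0.2, jitter and noise levels, an even split of co-tuned and non co-tuned cells), the use of a single amplitude per neuron for both sessions, and all names are hypothetical simplifications rather than the parameters used for Figure 3C.

```python
import numpy as np

def simulate_change(mode, n_neurons=50, theta_mean=0.2, jitter_sd=0.02,
                    noise_sd=0.05, seed=0):
    """Simulate the stimulation-minus-baseline amplitude change per neuron.

    mode="co_tuned": suppression applied only to non-target co-tuned cells.
    mode="all":      suppression applied to every non-target cell.
    """
    rng = np.random.default_rng(seed)
    co_tuned = np.arange(n_neurons) < n_neurons // 2     # assumed half co-tuned
    r = rng.uniform(0.5, 1.5, n_neurons)                 # response amplitude R_n
    theta = theta_mean + rng.normal(0.0, jitter_sd, n_neurons)  # neuron-level jitter
    if mode == "co_tuned":
        theta = np.where(co_tuned, theta, 0.0)
    baseline = r + rng.normal(0.0, noise_sd, n_neurons)          # no suppression
    stim = r - theta + rng.normal(0.0, noise_sd, n_neurons)      # R - theta + epsilon
    return stim - baseline, co_tuned

def group_difference(mode):
    """Mean change of co-tuned cells minus mean change of non co-tuned cells."""
    change, co_tuned = simulate_change(mode)
    return change[co_tuned].mean() - change[~co_tuned].mean()

diff_specific = group_difference("co_tuned")  # clearly negative in this toy setup
diff_global = group_difference("all")         # close to zero in this toy setup
```

      In this toy version, only the co-tuned-specific variant produces a group difference between co-tuned and non co-tuned cells, which is the qualitative contrast the response describes.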

    1. Author response:

      The following is the authors’ response to the current reviews.

      We have significant concerns about the eLife assessment and the reviews. The reviewers acknowledged substantial strengths in our work:

      • Reviewer 3 noted that “the single-unit analyses of tuning direction are robustly characterized”, “the differences in neural correlations across behaviors, regions and perturbations are robust”, and “The evidence for these claims is solid.”

      • Reviewer 2 stated that “the manuscript has been improved” with “new analyses [that] provide improved rigor”.

      Despite these, the final eLife assessment inexplicably downplayed the significance of the findings and strength of evidence.

      Broader Impact and Significance. The findings, not only the data, have theoretical and/or practical implications extending well beyond a single subfield relevant to:

      1. behavioral neuroscientists studying sensorimotor integration

      2. systems and theoretical neuroscientists

      3. neural and biomechanical engineers working on brain-computer interfaces for speech or oral or limb prosthetics

      4. soft robotics researchers

      5. comparative motor control researchers

      6. clinicians involved in the evaluation and rehabilitation of orolingual function (e.g., after stroke or glossectomy, dysphagia)

      Given this broad relevance, we question why the significance was characterized as merely "useful" rather than "important."

      Dismissive Tone Toward Descriptive Research. Some reviews displayed a dismissive or skeptical tone toward the findings and their significance, even when the methods were solid and support for the claims was strong. They critiqued the “descriptive nature” of our study, faulting the lack of mechanistic explanation. However, in poorly understood fields such as orofacial sensorimotor control, descriptive studies provide the empirical foundation for mechanistic studies. Rich descriptive data generate testable hypotheses that drive mechanistic discoveries forward, while mechanistic studies conducted without this groundwork often pursue precise answers to poorly formulated questions.

      Specific Issues with Reviews:

      1. Significant omission in study description:

      The eLife Assessment’s second sentence states: “The data, which include both electrophysiology and nerve block manipulations, will be of value to neuroscientists and neural engineers interested in tongue use.”

      This description omits our simultaneously recorded high-resolution 3D kinematics data—a significant oversight given that combining high-density electrophysiological recording from multiple cortical regions with high-resolution 3D tongue kinematics during naturalistic behaviors in non-human primates represents one of our study's key strengths. Currently, only two research labs in the US possess this capability.

      2. Overemphasis on the “smaller” and “inconsistent” findings

      While we acknowledge some inconsistent findings between animals, the reviews overemphasized these inconsistencies in ways that cast unwarranted doubt on our more significant and consistent results.

      a. Reviewer 1: “[...] the discrepancies in tuning changes across the two NHPs, coupled with the overall exploratory nature of the study, render the interpretation of these subtle differences somewhat speculative.” “[...] in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which seemed to result in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.”

      The skeptical tone of this critique stands in opposition to Reviewer 3’s statement that “the evidence for these claims were solid”. Reviewer 1 characterized our findings as “somewhat speculative”, seemingly overlooking the robust and consistent changes we documented:

      • “Following nerve block, MIo and SIo showed significant decreases in the proportion of directionally modulated neurons across both tasks (Fig. 10A; Chi-square, MIo: p <0.001, SIo: p < 0.05).”

      • “Nerve block significantly altered PD distributions during both tasks. During feeding, MIo neurons in both subjects exhibited a significant clockwise shift in mean PD toward the center (0°), resulting in more uniform distributions (Fig. 11A; circular k-test, p < 0.01).”

      These results were obtained through careful subsampling of trials with similar kinematics for both feeding and drinking tasks, ensuring that the tuning changes in the nerve block experiments could not be attributed to differing kinematics.

      b. Reviewer 2: “One weakness of the current study is that there is substantial variability in results between monkeys.”

      This vague critique, without specifying which results showed “substantial variability”, reads as though most findings were inconsistent, unfairly casting doubt on our study’s validity.

      3. Inaccurate statements in the Reviewers’ summaries

      Several reviewer statements contain factual inaccuracies:

      a. Reviewer 2: “A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning).”

      Reviewer 2's characterization of directional tuning misrepresents our findings. We reported substantial differences in the proportion of directionally tuned neurons between MIo and SIo during the feeding task but a smaller difference in the drinking task:

      • “The proportion of directionally tuned neurons [...] differed significantly between MIo and SIo during the feeding task in both subjects (Chi-square, p < 0.001). In rostral and caudal MIo, 80% of neurons were modulated to 3D direction (bootstrap, p < 0.05, Fig. 3B, left), compared to 52% in areas 1/2 and 3a/3b.”

      • “During drinking, the proportion of directionally modulated neurons was more similar between regions (69% in MIo vs. 60% in SIo: Chi-square, p > 0.05, Fig. 3B right).”

      b. Reviewer 2: “There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking.”

      Reviewer 2's claim about task differences directly contradicts our findings. We consistently reported stronger tuning in feeding compared to drinking across multiple measures:

      • “The proportion of directionally tuned neurons was higher in the feeding vs. drinking task (Chi-square, p < 0.05, feeding: 72%, drinking: 66%)”;

      • “Cumulative explained variance for the first three factors was higher in feeding (MIo: 82%, SIo: 81%) than in drinking (MIo: 74%, SIo: 63%)”;

      • “Decoding using LSTM showed consistently higher accuracies in feeding compared to drinking regardless of the length of intervals used ..., behavioral window .., and directional angles ...”

      These results were also summarized in the Discussion.

      c. Reviewer 1: In Figure 12, factor 2 and 3 are plotted against each other? and factor 1 is left out?

      Reviewer 1’s observation about Figure 12 is incorrect. Factor 1 was included: Top subplots (feeding) show Factor 1 vs 3 (MIo) and Factor 1 vs 2 (SIo) while the bottom subplots (drinking) show Factor 2 vs 3 (MIo) and Factor 1 vs 2 (SIo). We plotted the two latent factors with highest explained variance for clarity, though all 20 factors were included in intertrajectory distance calculations.

      4. Framing and interpretive over-scrutiny

      Several critiques targeted framing rather than methodological rigor and emphasized that interpretations were speculative even when appropriately hedged:

      a. Reviewer 2: “A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.”

      Reviewer 2 mentioned "inconsistent use of quantifications/statistics" without specifying which analyses were problematic or updating their summary to include our additional population-level findings.

      b. Reviewer 2: “The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results”

      Despite our addressing kinematic concerns through subsampled data analysis, Reviewer 2 remained unsatisfied, contrasting sharply with Reviewer 3's assessment that our arguments were "convincing" with "solid" evidence.

      c. Reviewer 2: “I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw”

      Reviewer 2 expressed skepticism about fundamental encoding differences between tasks, despite our comprehensive controls including subsampled data with similar kinematics and multiple verification analyses (equal neuron numbers, stable neurons, various interval lengths, behavioral windows, and directional angles).

      Without describing why these analyses were insufficient, this criticism goes beyond methods or statistics. It casts doubt and challenges whether the conclusions are even worth drawing despite careful experimental controls.

      d. Reviewer 2: “The manuscript states that "An alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensation afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer".

      By not updating this section, Reviewer 2 failed to acknowledge our responsive revisions, including Fano factor analysis showing higher variability in SIo during feeding versus drinking, and our updated discussion addressing their concerns about trial-to-trial variability: “Varying tongue shape, tongue’s contact with varying bolus properties (size and texture) and other oral structures (palate, teeth) may weaken the directional signal contained in SIo activity. Thus, small differences in tongue kinematics might create large differences in sensory signals across trials. When looking at trial-averaged signals, this natural variability could make the neural response patterns appear less precise or specific than they are. These are consistent with our findings that for both tasks, spiking variability was higher in SIo.”

      Authors’ Response to Recommendations for the authors:

      We thank the editors and the reviewers for their helpful comments. We have provided a response to the reviewers’ recommendations and made some revisions to the manuscript.

      Reviewer #1 (Recommendations for the authors): 

      In the newly added population factor analysis, several methodological decisions remain unclear to me:

      In Figure 7, why do the authors compare the mean distance between conditions in the latent spaces of MIo and SIo? Since these latent spaces are derived separately, they exist on different scales (with MIo appearing roughly four times larger than SIo), and this discrepancy is reflected in the reported mean distances (Figure 7, inset plots). Wouldn't this undermine a direct comparison?

      Thank you for this helpful feedback. The reviewer is correct that the latent spaces are derived separately for MIo and SIo, thus they exist on different scales as we have noted in the caption of Figure 7: “Axes for SIo are 1/4 scale of MIo.” 

      To allow for a direct comparison between MIo and SIo, we corrected the analysis by comparing their normalized mean inter-trajectory distances obtained by first calculating the geometric index (GI) of the inter-trajectory distances, d, between each pair of population trajectories per region as: GI= (d<sub>1</sub>-d<sub>2</sub>)/ (d<sub>1</sub>+d<sub>2</sub>). We then performed the statistics on the GIs and found a significant difference between mean inter-trajectory distances in MIo vs. SIo. We performed the same analysis comparing the distance travelled between MIo and SIo trajectories by getting the normalized difference in distances travelled and still found a significant difference in both tasks. We have updated the results and figure inset to reflect these changes.
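
      The normalisation can be illustrated with a short sketch. The function name and the toy distances are hypothetical; the point shown is only that GI is unchanged when both distances are multiplied by a common scale factor, which is what makes it usable across latent spaces of different scale.

```python
def geometric_index(d1, d2):
    """Normalised difference between two inter-trajectory distances."""
    return (d1 - d2) / (d1 + d2)

# Toy distances in one latent space, and the same pair in a space 4x larger.
gi_small = geometric_index(3.0, 1.0)
gi_large = geometric_index(4 * 3.0, 4 * 1.0)
```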

      In Figure 12, unlike Figure 7 which shows three latent dimensions, only two factors are plotted. While the methods section describes a procedure for selecting the optimal number of latent factors, Figure 7 - figure supplement 3 shows that variance explained continues to increase up to about five latent dimensions across all areas. Why, then, are fewer dimensions shown?

      Thank you for the opportunity to clarify the figure. The m obtained from the 3-fold cross-validation varied for the full sample and was 20 factors for the subsample. We clarify that all statistical analyses were done using 20 latent factors. Using the full sample of neurons, the first 3 factors explained 81% of variance in feeding data compared to 71% in drinking data. When extended to 5 factors, feeding maintained its advantage with 91% variance explained versus 82% for drinking. Because feeding showed higher variance explained than drinking across 3 or 5 factors, only three factors were shown in Figure 7 for better visualization. We added this clarification to the Methods and Results.

      Figure 12 shows the differences in the neural trajectories between the control and nerve block conditions. The control vs. nerve block comparison complicated the visualization of the results. Thus, we plotted only the two latent factors with the highest separation between population trajectories. This was clarified in the Methods and caption of Figure 12.

      In Figure 12, factor 2 and 3 are plotted against each other? and factor 1 is left out?

      This observation is incorrect; Factor 1 was included: Top subplots (feeding) show Factor 1 vs 3 (MIo) and Factor 1 vs 2 (SIo) while the bottom subplots (drinking) show Factor 2 vs 3 (MIo) and Factor 1 vs 2 (SIo).  We have clarified this in the Methods and caption of Figure 12.

      Finally, why are factor analysis results shown only for monkey R? 

      Factor analysis was performed on both animals, but the results were shown only for monkey R to decrease the number of figures in the manuscript. Figure 7- figure supplement 1 shows the data for both monkeys. Here are the equivalent Figure 7 plots for monkey Y.

      Author response image 1.

      Reviewer #2 (Recommendations for the authors): 

      Overall, the manuscript has been improved. 

      New analyses provide improved rigor (as just one example, organizing the feeding data into three-category split to better match the three-direction drinking data decoding analysis and also matching the neuron counts).

      The updated nerve block change method (using an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory) somewhat reduces my concern that kinematic differences could account for the neural changes, but on the other hand the neural analyses use 250 ms (meaning that the neural differences could be related to behavioral differences earlier in the trial). Why not subselect to trials with similar trajectories throughout the whole movement (or at least show that as an additional analysis, albeit one with lower trial counts)?

      As the reviewer pointed out, selecting similar trajectories throughout the whole movement would result in lower trial counts that lead to poor statistical power. We think that the 100 ms prior to maximum tongue protrusion is a more important movement segment to control for similar kinematics between the control and nerve block conditions since this represents the subject’s intended movement endpoint. 

      A lot of the Results seemed like a list of measurements without sufficient hand-holding or guide-posting to explain what the take-away for the reader should be. Just one example to make concrete this broadly-applicable feedback: "Cumulative explained variance for the first three factors was higher in feeding (MIo: 82%, SIo: 81%) than in drinking (MIo: 74%, SIo: 63%) when all neurons were used for the factor analysis (Fig. 7)": why should we care about 3 factors specifically? Does this mean that in feeding, the neural dimensionality is lower (since 3 factors explain more of it)? Does that mean feeding is a "simpler" behavior (which is counter-intuitive and does not conform to the authors' comments about the higher complexity of feeding). And from later in that paragraph: what are we to make of the differences in neural trajectory distances (aside from quantifying using a different metric the same larger changes in firing rates that could just as well be quantified as statistics across single-neuron PETHs)?

      Thank you for the feedback on the writing style. We have made some revisions to describe the takeaway for the reader. That fewer latent factors explain 80% of the variance in the feeding data means that the underlying network activity is relatively simple despite apparent complexity. When neural population trajectories are farther away from each other in state space, it means that the patterns of activity across tongue directions are more distinct and separable, thus, less likely to be confused with each other. This signifies that neural representations of 3D tongue directions are more robust. When there is better neural discrimination and more reliable information processing, it is easier for downstream brain regions to distinguish between different tongue directions.  

      The addition of more population-level analyses is nice as it provides a more efficient summary of the neural measurements. However, it's a surface-level dive into these methods; ultimately the goal of ensemble "computation through dynamics" analyses is to discover simpler structure / organizational principles at the ensemble level (i.e., show things not evident from single neurons), rather than just using them as a way to summarize data. For instance, here neural rotations are remarked upon in the Results, without referencing influential prior work describing such rotations and why neural circuits may use this computational motif to separate out conditions and shape muscle activity-generating readouts (Churchland et al. Nature 2012 and subsequent theoretical iterations including the Russo et al.). That said, the Russo et al tangling study was well-referenced and the present tangling results were effectively contextualized with respect to that paper in terms of the interpretation. I wish more of the results were interpreted with comparable depth.

      Speaking of Russo et al: the authors note qualitative differences in tangling between brain areas, but do not actually quantify tangling in either. These observations would be stronger if quantified and accompanied with statistics.

      Contrary to the reviewer’s critique, we did frame these results in the context of structure/organizational principles at the ensemble level. We had already cited prior work of Churchland et al., 2012; Michaels et al., 2016 and Russo et al., 2018. In the Discussion, Differences across behaviors, we wrote: “In contrast, MIo trajectories in drinking exhibited a consistent rotational direction regardless of spout location (Fig. 7). This may reflect a predominant non-directional information such as condition-independent time-varying spiking activity during drinking (Kaufman et al., 2016; Kobak et al., 2016; Arce-McShane et al., 2023).”

      Minor suggestions: 

      Some typos, e.g. 

      • no opening parenthesis in "We quantified directional differences in population activity by calculating the Euclidean distance over m latent factors)"

      • missing space in "independent neurons(Santhanam et al., 2009;..."); 

      • missing closing parentheses in "followed by the Posterior Inferior (Figure 3 - figure supplement 1."

      There is a one-page long paragraph in the Discussion. Please consider breaking up the text into more paragraphs each organized around one key idea to aid readability.

      Thank you, we have corrected these typos.

      Could it be that the Kaufman et al 2013 reference was intended to be Kaufman et al 2015 eNeuro (the condition-invariant signal paper)?

      Thank you, we have corrected this reference.

      At the end of the Clinical Implications subsection of the Discussion, the authors note the growing field of brain-computer interfaces with references for motor read-out or sensory write-in of hand motor/sensory cortices, respectively. Given that this study looks at orofacial cortices, an even more clinically relevant development is the more recent progress in speech BCIs (two recent reviews: https://www.nature.com/articles/s41583-024-00819-9, https://www.annualreviews.org/content/journals/10.1146/annurev-bioeng-110122012818) many of which record from human ventral motor cortex and aspirations towards FES-like approaches for orofacial movements (e.g., https://link.springer.com/article/10.1186/s12984-023-01272-y).

      Thank you, we have included these references.

      Reviewer #3 (Recommendations for the authors): 

      Major Suggestions 

      (1) For the factor analysis of feeding vs licking, it appears that the factors were calculated separately for the two behaviors. It could be informative to calculate the factors under both conditions and project the neural data for the two behaviors into that space. The overlap/separations of the subspace could be informative. 

      We clarify that we performed a factor analysis that included both feeding and licking for MIo, as stated in the Results: “To control for factors such as different neurons and kinematics that might influence the results, we performed factor analysis on stable neurons across both tasks using all trials (Fig. 7- figure supplement 2A) and using trials with similar kinematics (Fig. 7- figure supplement 2B).” We have revised the manuscript to reflect this more clearly.

      (2) For the LSTM, the Factor analyses and the decoding it is unclear if the firing rates are mean subtracted and being normalized (the methods section was a little unclear). Typically, papers in the field either z-score the data or do a softmax.

      The firing rates were z-scored for the LSTM and KNN. For the factor analysis, the spike counts were not z-scored, but the results were normalized. We clarified this in the Methods section.
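
      For reference, per-neuron z-scoring of firing rates can be sketched as below. The array layout (time bins by neurons), the handling of constant-rate neurons, and the names are assumptions made for illustration; the response states only that the rates were z-scored.

```python
import numpy as np

def zscore_rates(rates):
    """Z-score each neuron (column) across time bins (rows)."""
    rates = np.asarray(rates, dtype=float)
    mu = rates.mean(axis=0, keepdims=True)
    sd = rates.std(axis=0, keepdims=True)
    sd = np.where(sd > 0, sd, 1.0)  # leave constant-rate neurons at zero
    return (rates - mu) / sd

# Toy data: 3 time bins x 3 neurons; the last neuron has a constant rate.
toy_rates = np.array([[1.0, 10.0, 5.0],
                      [3.0, 30.0, 5.0],
                      [5.0, 50.0, 5.0]])
toy_z = zscore_rates(toy_rates)
```

      After this step, neurons with very different mean rates contribute on a comparable scale to the decoder input.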

      Minor: 

      Page 1: Abstract- '... how OSMCx contributes to...' 

      Since there are no direct causal manipulations of OSMCx in this manuscript, this study doesn't directly study the OSMCx's contribution to movement - I would recommend rewording this sentence.

      Similarly, Page 2: 'OSMCx plays an important role in coordination...' the citations in this paragraph are correlative, and do not demonstrate a causal role.

      There are similar usages of 'OSMCx coordinates...' in other places e.g. Page 8. 

      Thank you, we revised these sentences.

      Page 7: the LSTM here has 400 units, which is a very large network and contains >12000 parameters. Networks of this size are prone to memorization, so it would be wise to test the R-squared of the validation set against a shuffled dataset to see if the network is actually working as intended.

      Thank you for bringing up this important point of verifying that the network is learning meaningful patterns versus memorizing. Considering the size of our training samples, the ratio of samples to parameters is appropriate and thus the risk of memorization is low. Indeed, validation tests and cross-validation performed indicated expected network behavior and the R squared values obtained here were similar to those reported in our previous paper (Laurence-Chasen et al., 2023).
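
The shuffle control suggested by the reviewer can be sketched as follows. To keep the example short and runnable, an ordinary least-squares decoder stands in for the LSTM (an assumption made purely for illustration), and the data are synthetic. The idea is to compare the held-out R-squared of the decoder trained on the true neural-kinematic pairing with a null distribution obtained by training on shuffled pairings.

```python
import numpy as np


def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_predict(x_train, y_train, x_test):
    # Stand-in decoder (assumption): least squares with an intercept term.
    a_train = np.column_stack([x_train, np.ones(len(x_train))])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    a_test = np.column_stack([x_test, np.ones(len(x_test))])
    return a_test @ coef


def shuffle_control(x_train, y_train, x_test, y_test, n_shuffles=200, seed=0):
    """Held-out R^2 for the true pairing versus shuffled training targets."""
    rng = np.random.default_rng(seed)
    observed = r_squared(y_test, fit_predict(x_train, y_train, x_test))
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        perm = rng.permutation(len(y_train))  # break the neural-kinematic pairing
        null[i] = r_squared(y_test, fit_predict(x_train, y_train[perm], x_test))
    p_value = (1 + np.sum(null >= observed)) / (1 + n_shuffles)
    return observed, null, p_value


# Synthetic data: 20 hypothetical neurons linearly related to one kinematic variable.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 20))
w = rng.normal(size=20)
y = x @ w + rng.normal(scale=0.5, size=300)
observed, null, p_value = shuffle_control(x[:200], y[:200], x[200:], y[200:])
```

A decoder that has learned a genuine relationship should score well above the null distribution on held-out data. For time-series decoders, shuffling whole trials rather than individual samples would preserve within-trial autocorrelation.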


      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their paper, Hosack and Arce-McShane investigate how the 3D movement direction of the tongue is represented in the orofacial part of the sensory-motor cortex and how this representation changes with the loss of oral sensation. They examine the firing patterns of neurons in the orofacial parts of the primary motor cortex (MIo) and somatosensory cortex (SIo) in non-human primates (NHPs) during drinking and feeding tasks. While recording neural activity, they also tracked the kinematics of tongue movement using biplanar videoradiography of markers implanted in the tongue. Their findings indicate that most units in both MIo and SIo are directionally tuned during the drinking task. However, during the feeding task, directional tuning was more frequent in MIo units and less prominent in SIo units. Additionally, in some recording sessions, they blocked sensory feedback using bilateral nerve block injections, which resulted in fewer directionally tuned units and changes in the overall distribution of the preferred direction of the units.

      Strengths:

      The most significant strength of this paper lies in its unique combination of experimental tools. The authors utilized a video-radiography method to capture 3D kinematics of the tongue movement during two behavioral tasks while simultaneously recording activity from two brain areas. Moreover, they employed a nerve-blocking procedure to halt sensory feedback. This specific dataset and experimental setup hold great potential for future research on the understudied orofacial segment of the sensory-motor area.

      Weaknesses:

      Aside from the last part of the result section, the majority of the analyses in this paper are focused on single units. I understand the need to characterize the number of single units that directly code for external variables like movement direction, especially for less-studied areas like the orofacial part of the sensory-motor cortex. However, as a field, our decade-long experience in the arm region of sensory-motor cortices suggests that many of the idiosyncratic behaviors of single units can be better understood when the neural activity is studied at the level of the state space of the population. By doing so, for the arm region, we were able to explain why units have "mixed selectivity" for external variables, why the tuning of units changes in the planning and execution phase of the movement, why activity in the planning phase does not lead to undesired muscle activity, etc. See (Gallego et al. 2017; Vyas et al. 2020; Churchland and Shenoy 2024) for a review. Therefore, I believe investigating the dynamics of the population activity in orofacial regions can similarly help the reader go beyond the peculiarities of single units and in a broader view, inform us if the same principles found in the arm region can be generalized to other segments of sensorimotor cortex.

      We thank and agree with the reviewer on the value of information gained from studying population activity. We also appreciate that population analyses have led to the understanding that individual neurons have “mixed selectivity”. We have shown previously that OSMCx neurons exhibit mixed selectivity in their population activity and clear separation between latent factors associated with gape and bite force levels (Arce-McShane FI, Sessle BJ, Ram Y, Ross CF, Hatsopoulos NG (2023) Multiple regions of primate orofacial sensorimotor cortex encode bite force and gape. Front Systems Neurosci. doi: 10.3389/fnsys.2023.1213279. PMID: 37808467 PMCID: 10556252), and chew-side and food types (Li Z & Arce-McShane FI (2023). Cortical representation of mastication in the primate orofacial sensorimotor cortex. Program No. NANO06.05. 2023 Neuroscience Meeting Planner. Washington, D.C.: Society for Neuroscience, 2023. Online.). 

      The primary goal of this paper was to characterize single units in the orofacial region and to do a follow-up paper on population activity. In the revised manuscript, we have now incorporated the results of population-level analyses. The combined results of the single unit and population analyses provide a deeper understanding of the cortical representation of 3D direction of tongue movements during natural feeding and drinking behaviors. 

      Further, for the nerve-blocking experiments, the authors demonstrate that the lack of sensory feedback severely alters how the movement is executed at the level of behavior and neural activity. However, I had a hard time interpreting these results since any change in neural activity after blocking the orofacial nerves could be due to either the lack of the sensory signal or, as the authors suggest, due to the NHPs executing a different movement to compensate for the lack of sensory information or the combination of both of these factors. Hence, it would be helpful to know if the authors have any hint in the data that can tease apart these factors. For example, analyzing a subset of nerve-blocked trials that have similar kinematics to the control.

      Thank you for bringing up this important point. We agree with the reviewer that any change in the neural activity may be attributed to lack of sensory signal or to compensatory changes or a combination of these factors. To tease apart these factors, we sampled an equal number of trials with similar kinematics for both control and nerve block feeding sessions. We added a clarifying description of this approach in the Results section of the revised manuscript: “To confirm this effect was not merely due to altered kinematics, we conducted parallel analyses using carefully subsampled trials with matched kinematic profiles from both control and nerve-blocked conditions.”

      Furthermore, we ran an additional analysis for the drinking datasets by subsampling a similar distribution of drinking movements from each condition. We compared directional tuning in the neural data from an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. These analyses, which control for similar kinematics, showed that there was still a decrease in the proportion of directionally modulated neurons with nerve block compared to the control. This confirms that the results may be attributed to the lack of tactile information. These analyses are now integrated in the revised paper under the Methods section: Directional tuning of single neurons, as well as the Results section: Effects of nerve block: Decreased directional tuning of MIo and SIo neurons, and Figure 10 – figure supplement 1.
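
One way to subsample trials so that two conditions share the same distribution of a kinematic variable is histogram matching. The sketch below makes several assumptions: the function name `match_by_angle`, the 10-degree bins, and the synthetic left-right angles are hypothetical, and the matching procedure actually used in the study may differ. It keeps, within each angle bin, the same number of trials from the control and nerve block conditions.

```python
import numpy as np


def match_by_angle(angles_a, angles_b, bin_edges, seed=0):
    """Return sorted index arrays into a and b with equal trial counts per angle bin."""
    rng = np.random.default_rng(seed)
    bins_a = np.digitize(np.asarray(angles_a), bin_edges)
    bins_b = np.digitize(np.asarray(angles_b), bin_edges)
    keep_a, keep_b = [], []
    for b in np.union1d(bins_a, bins_b):
        idx_a = np.flatnonzero(bins_a == b)
        idx_b = np.flatnonzero(bins_b == b)
        n = min(len(idx_a), len(idx_b))  # largest count both conditions can supply
        if n == 0:
            continue
        keep_a.extend(rng.choice(idx_a, size=n, replace=False))
        keep_b.extend(rng.choice(idx_b, size=n, replace=False))
    return np.sort(np.array(keep_a, dtype=int)), np.sort(np.array(keep_b, dtype=int))


# Synthetic left-right angles (degrees) with different spreads per condition.
rng = np.random.default_rng(2)
control_angles = rng.normal(0.0, 20.0, size=300)
block_angles = rng.normal(5.0, 12.0, size=250)
edges = np.arange(-60, 61, 10)
idx_control, idx_block = match_by_angle(control_angles, block_angles, edges)
```

Tuning analyses run on `idx_control` and `idx_block` then compare conditions with equal trial counts and matched angle histograms, so a difference in tuning is less likely to reflect a difference in movement variability.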

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulations depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during licking. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods, especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in results between monkeys, and that only one session of data per monkey/condition is analyzed (8 sessions total). This raises the concern that the results could be idiosyncratic. The Methods mention that other datasets were collected, but not analyzed because the imaging pre-processing is very labor-intensive. While I recognize that time is precious, I do think in this case the manuscript would be substantially strengthened by showing that the results are similar on other sessions.

      We acknowledge the reviewer’s concern about inter-subject variability. Animal feeding and drinking behaviors are quite stable across sessions; thus, we do not think that additional sessions will address the concern that the results could be idiosyncratic. Each of the eight datasets analyzed here has sufficient neural and kinematic data to capture neural and behavioral patterns. Nevertheless, we performed some of the analyses on a second feeding dataset from Monkey R. The results from analyses on a subset of this data were consistent across datasets; for example, (1) similar proportions of directionally tuned neurons, (2) similar distances between population trajectories (t-test p > 0.9), and (3) a consistently smaller distance between Anterior-Posterior pairs than others in MIo (t-test p < 0.05) but not SIo (p > 0.1).

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first-order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways, an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75.

      Thank you for highlighting this important point. Research on orofacial movements hasn't progressed at the same pace as limb movement studies. Our manuscript focused specifically on characterizing the 3D directional tuning properties of individual neurons in the orofacial area—an analysis that has not been conducted previously for orofacial sensorimotor control. While we initially prioritized this individual neuron analysis, we recognize the value of broader population-level insights.

      Based on your helpful feedback, we have incorporated additional population analyses to provide a more comprehensive picture of orofacial sensorimotor control and expanded our discussion section. We appreciate your expertise in pushing our work to be more thorough and aligned with current neuroscience approaches.

      Can the authors explain (or at least speculate) why there was such a large difference in behavioral effect due to nerve block between the two monkeys (Figure 7)?

      We acknowledge this as a variable inherent to this type of experimentation. Previous studies have found large kinematic variation in the effect of oral nerve block as well as in the following compensatory strategies between subjects. Each animal’s biology and response to perturbation vary naturally. Indeed, our subjects exhibited different feeding behavior even in the absence of nerve block perturbation (see Figure 2 in Laurence-Chasen et al., 2022). This is why each individual serves as its own control.

      Do the analyses showing a decrease in tuning after nerve block take into account the changes (and sometimes reduction in variability) of the kinematics between these conditions? In other words, if you subsampled trials to have similar distributions of kinematics between Control and Block conditions, does the effect hold true? The extreme scenario to illustrate my concern is that if Block conditions resulted in all identical movements (which of course they don't), the tuning analysis would find no tuned neurons. The lack of change in decoding accuracy is another yellow flag that there may be a methodological explanation for the decreased tuning result.

      Thank you for bringing up this point. We accounted for the changes in the variability of the kinematics between the control and nerve block conditions in the feeding dataset where we sampled an equal number of trials with similar kinematics for both control and nerve block. However, we did not control for similar kinematics in the drinking task. In the revised manuscript, we have clarified this and performed similar analysis for the drinking task. We sampled a similar distribution of drinking movements from each condition. We compared the neural data from an equal number of trials with a similar left-right angle of movement in the last 100 ms of the tongue trajectory, nearest the spout. There was a decrease in the percentage of neurons that were directionally modulated (between 30 and 80%) with nerve block compared to the control. These results have been included in the revised paper under Methods section: Directional tuning of single neurons, as well as Results section: Effects of nerve block: Decreased directionality of MIo and SIo neurons.

      While the results from decoding using KNN did not show significant differences between decoding accuracies in control vs. nerve block conditions, the results from the additional factor analysis and decoding using LSTM were consistent with the decrease in directional tuning at the level of individual neurons.  

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". Could an alternative explanation be more statistical/technical in nature: that during feeding, there will be more variability in exactly what somatosensory afferent signals are being received from trial to trial (because slight differences in kinematics can have large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity)? This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feedback is very counter-intuitive to this reviewer.

      Thank you for bringing up this point. We have now incorporated this in our revised Discussion (see Comparison between MIo and SIo). We agree with the reviewer that trial-by-trial variability in the afferent signals may account for the lower directional signal in SIo during feeding than in drinking. Indeed, SIo’s mean-matched Fano factor in feeding was significantly higher than that in drinking (Author response image 1). Moreover, the results of the additional population and decoding analyses also support this.

      Author response image 1.

      Comparison of mean-matched Fano Factor between SIo neurons during feeding and drinking control tasks across both subjects (Wilcoxon rank sum test, p < 0.001).
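
For readers unfamiliar with the metric, the Fano factor of a neuron is the across-trial variance of its spike count divided by the mean spike count. The sketch below computes the plain (not mean-matched) Fano factor on synthetic Poisson counts. The function name and array layout are assumptions, and the mean-matching step used for the comparison above is deliberately omitted.

```python
import numpy as np


def fano_factor(spike_counts):
    """Per-neuron Fano factor (variance / mean of spike counts across trials).

    spike_counts: array of shape (n_trials, n_neurons); this layout is an assumption.
    Neurons that never fire get NaN, because the ratio is undefined for a zero mean.
    """
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)  # unbiased sample variance
    safe_mean = np.where(mean > 0, mean, 1.0)
    return np.where(mean > 0, var / safe_mean, np.nan)


# Synthetic Poisson neurons: the expected Fano factor is 1 regardless of rate.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[2.0, 5.0, 20.0], size=(2000, 3))
ff = fano_factor(counts)
```

Values above 1 indicate more trial-to-trial variability than a Poisson process with the same mean, which is the sense in which a higher Fano factor during feeding is read as noisier responses.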

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors aim to uncover how 3D tongue direction is represented in the Motor (M1o) and Somatosensory (S1o) cortex. In non-human primates implanted with chronic electrode arrays, they use X-ray-based imaging to track the kinematics of the tongue and jaw as the animal is either chewing food or licking from a spout. They then correlate the tongue kinematics with the recorded neural activity. Using linear regressions, they characterize the tuning properties and distributions of the recorded population during feeding and licking. Then, they recharacterize the tuning properties after bilateral lidocaine injections in the two sensory branches of the trigeminal nerve. They report that their nerve block causes a reorganization of the tuning properties. Overall, this paper concludes that M1o and S1o both contain representations of the tongue direction, but their numbers, their tuning properties, and susceptibility to perturbed sensory input are different.

      Strengths:

      The major strengths of this paper are in the state-of-the-art experimental methods employed to collect the electrophysiological and kinematic data.

      Weaknesses:

      However, this paper has a number of weaknesses in the analysis of this data.

      It is unclear how reliable the neural responses are to the stimuli. The trial-by-trial variability of the neural firing rates is not reported. Thus, it is unclear if the methods used for establishing that a neuron is modulated and tuned to a direction are susceptible to spurious correlations. The authors do not use shuffling or bootstrapping tests to determine the robustness of their fits or determining the 'preferred direction' of the neurons. This weakness colors the rest of the paper.

      Thank you for raising these points. We have performed the following additional analyses: (1) We have added analyses to ensure that the results could not be explained by neural variability. To show the trial-by-trial variability of the neural firing rates, we have calculated the Fano factor (mean overall = 1.34747; control = 1.46471; nerve block = 1.23023). The distribution was similar across directions, suggesting that responses of MIo and SIo neurons to varying 3D directions were reliable. (2) We have used a bootstrap procedure to ensure that directional tuning cannot be explained by mere chance. (3) To test the robustness of our PDs we also performed a bootstrap test, which yielded the same results for >90% of neurons, and a multiple linear regression test for fit to a cosine-tuning function. In the revised manuscript, the Methods and Results sections have been updated to include these analyses.  

      Author response image 2.

      Comparison of Fano Factor across directions for MIo and SIo Feeding Control (Kruskal-Wallis, p > 0.7).
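
The robustness checks described above can be illustrated with a small sketch. It fits a cosine-tuning model by multiple linear regression, then uses a permutation test for the significance of tuning and a bootstrap for the stability of the preferred direction (PD). The permutation test is a substitution made here for illustration; the function names, trial counts, and synthetic neuron are all assumptions, and the authors' exact bootstrap procedure may differ.

```python
import numpy as np


def fit_cosine_tuning(directions, rates):
    """Least-squares fit of rate = b0 + b . d, with d a unit 3D direction per trial."""
    design = np.column_stack([np.ones(len(rates)), directions])
    coef, *_ = np.linalg.lstsq(design, rates, rcond=None)
    pred = design @ coef
    r2 = 1.0 - np.sum((rates - pred) ** 2) / np.sum((rates - rates.mean()) ** 2)
    b = coef[1:]
    return b / np.linalg.norm(b), r2  # preferred direction (unit vector), fit quality


def permutation_p(directions, rates, n_perm=500, seed=0):
    """P-value for tuning: how often shuffled rates fit as well as the real ones."""
    rng = np.random.default_rng(seed)
    r2_obs = fit_cosine_tuning(directions, rates)[1]
    null = np.array([fit_cosine_tuning(directions, rng.permutation(rates))[1]
                     for _ in range(n_perm)])
    return (1 + np.sum(null >= r2_obs)) / (1 + n_perm)


def bootstrap_pd_spread(directions, rates, n_boot=500, seed=0):
    """Median angle (degrees) between bootstrap PDs and the full-data PD."""
    rng = np.random.default_rng(seed)
    pd_full = fit_cosine_tuning(directions, rates)[0]
    angles = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, len(rates), size=len(rates))  # resample trials
        pd_i = fit_cosine_tuning(directions[idx], rates[idx])[0]
        angles[i] = np.degrees(np.arccos(np.clip(pd_i @ pd_full, -1.0, 1.0)))
    return np.median(angles)


# Synthetic neuron tuned to the +x direction.
rng = np.random.default_rng(3)
d = rng.normal(size=(200, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)
true_pd = np.array([1.0, 0.0, 0.0])
rates = 10.0 + 6.0 * (d @ true_pd) + rng.normal(scale=2.0, size=200)
pd_hat, r2 = fit_cosine_tuning(d, rates)
p_value = permutation_p(d, rates)
spread_deg = bootstrap_pd_spread(d, rates)
```

A neuron would be called directionally tuned when `p_value` falls below the chosen threshold, and its PD would be considered stable when `spread_deg` is small.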

      The authors compare the tuning properties during feeding to those during licking but only focus on the tongue-tip. However, the two behaviors are different also in their engagement of the jaw muscles. Thus many of the differences observed between the two 'tasks' might have very little to do with an alteration in the properties of the neural code - and more to do with the differences in the movements involved.

      Using the tongue tip for the kinematic analysis of tongue directional movements was a deliberate choice as the anterior region of the tongue is highly mobile and sensitive due to a higher density of mechanoreceptors. The tongue tip is the first region that touches the spout in the drinking task and moves the food into the oral cavity for chewing and subsequent swallowing. 

      We agree with the reviewer that the jaw muscles are engaged differently in feeding vs. drinking (see Fig. 2). For example, a wider variety of jaw movements along the three axes are observed in feeding compared to the smaller amplitude and mostly vertical jaw movements in drinking. Also, the tongue movements are very different between the two behaviors. In feeding, the tongue moves in varied directions to position the food between left-right tooth rows during chewing, whereas in the drinking task, the tongue moves to discrete locations to receive the juice reward. Moreover, the tongue-jaw coordination differs between tasks; maximum tongue protrusion coincides with maximum gape in drinking but with minimum gape in the feeding behavior. Thus, the different tongue and jaw movements required in each behavior may account for some of the differences observed in the directional tuning properties of individual neurons and population activity. These points have been included in the revised Discussion.

      Author response image 3.

      Tongue tip position (mm) and jaw pitch (degrees) during feeding (left) and drinking (right) behaviors. Most protruded tongue position coincides with minimum gape (jaw pitch at 0°) during feeding but with maximum gape during drinking.

      Many of the neurons are likely correlated with both Jaw movements and tongue movements - this complicates the interpretations and raises the possibility that the differences in tuning properties across tasks are trivial.

      We thank the reviewer for raising this important point. In fact, we verified in a previous study whether the correlation between the tongue and jaw kinematics might explain differences in the encoding of tongue kinematics and shape in MIo (see Supplementary Fig. 4 in Laurence-Chasen et al., 2023): “Through iterative sampling of sub-regions of the test trials, we found that correlation of tongue kinematic variables with mandibular motion does not account for decoding accuracy. Even at times where tongue motion was completely un-correlated with the jaw, decoding accuracy could be quite high.” 

      The results obtained from population analyses showing distinct properties of population trajectories in feeding vs. drinking behaviors provide strong support to the interpretation that directional information varies between these behaviors.

      The population analyses for decoding are rudimentary and provide very coarse estimates (left, center, or right), it is also unclear what the major takeaways from the population decoding analyses are. The reduced classification accuracy could very well be a consequence of linear models being unable to account for the complexity of feeding movements, while the licking movements are 'simpler' and thus are better accounted for.

      We thank the reviewer for raising this point. The population decoding analyses provide additional insight on the directional information in population activity, as well as a point of comparison with the results of numerous decoding studies on the arm region of the sensorimotor cortex. In the revised version, we have included the results from decoding tongue direction using a long short-term memory (LSTM) network for sequence-to-sequence decoding. These results differed from the KNN results, indicating that a linear model such as KNN was better for drinking and that a non-linear and continuous decoder was better suited for feeding. These results have been included in the revised manuscript.

      The nature of the nerve block and what sensory pathways are being affected is unclear - the trigeminal nerve contains many different sensory afferents - is there a characterization of how effectively the nerve impulses are being blocked? Have the authors confirmed or characterized the strength of their inactivation or block? I was unable to find any electrophysiological evidence characterizing the perturbation.

      The strength of the nerve block is characterized by a decrease in the baseline firing rate of SIo neurons, as shown in Supplementary Figure 6 of “Loss of oral sensation impairs feeding performance and consistency of tongue–jaw coordination” (Laurence-Chasen et al., 2022).

      Overall, while this paper provides a descriptive account of the observed neural correlations and their alteration by perturbation, a synthesis of the observed changes and some insight into neural processing of tongue kinematics would strengthen this paper.

      We thank the reviewer for this suggestion. We have revised the Discussion to provide a synthesis of the results and insights into the neural processing of tongue kinematics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The procedure for anesthesia explained in the method section was not clear to me. The following information was missing: What drug/dose was used? How long was the animal under anesthesia? How long after recovery were the experiments done?

      The animals were fully sedated with ketamine (100 mg/ml, 10 mg/kg) for less than 30 minutes, and all of the data was collected within 90 minutes after the nerve block was administered.

      (2) In Figure 10, panels A and B are very close together, it was not at first clear whether the text "Monkey R, Monkey Y" belongs to panel A or B.

      We have separated the two panels further in the revised figure.

      (3) I found Figure 11 very busy and hard to interpret. Separating monkeys, fitting the line for each condition, or using a bar plot can help with the readability of the figure.

      Thank you for the suggestion. We agree with you and have reworked this figure. To simplify it we have shown the mean accuracy across iterations.

      (4) I found the laterality discussions like "This signifies that there are more neurons in the left hemisphere contributes toward one direction of tongue movement, suggesting that there is some laterality in the PDs of OSMCx neurons that varies between individuals" a bit of an over-interpretation of the data, given the low n value and the dissimilarity in how strongly the nerve blocking altered the monkeys' behavior.

      Thank you for sharing this viewpoint. We do think that laterality is a good point of comparison with studies on M1 neurons in the arm/hand region. In our study, we found that the peak of the PD distribution coincides with leftward tongue movements in feeding. The distribution of PDs provides insight into how tongue muscles are coordinated during movement. Intrinsic and extrinsic tongue muscles are involved in shaping the tongue (e.g., elongation, broadening) and positioning the tongue (e.g., protrusion/retraction, elevation/depression), respectively. These muscles receive bilateral motor innervation except for genioglossus. Straight tongue protrusion requires the balanced action of the right and left genioglossi while the lateral protrusion involves primarily the contralateral genioglossus. Given this unilateral innervation pattern, we hypothesized that left MIo/SIo neurons would preferentially respond to leftward tongue movements, corresponding to right genioglossus activation. 

      Reviewer #2 (Recommendations for the authors):

      Is the observation that tuning peaks occur most frequently toward the anterior and superior directions consistent with the statistics of the movements the tongue typically makes? This could be analogous to anisotropies previously reported in the arm literature, e.g., Lillicrap TP, Scott SH. 2013. Preference Distributions of Primary Motor Cortex Neurons Reflect Control Solutions Optimized for Limb Biomechanics. Neuron. 77(1):168-79.

      Thank you for bringing our attention to analogous findings by Lillicrap & Scott, 2013. Indeed, we do observe the highest number of movements in the Anterior Superior directions, followed by the Posterior Inferior. This does align with the distribution of tuning peaks that we observed. Author response image 4 shows the proportions of observed movements in each group of directions across all feeding datasets. We have incorporated this data in the Results section: Neuronal modulation patterns differ between MIo and SIo, as well as added this point in the Discussion.

      Author response image 4.

      Proportion of feeding trials in each group of directions. Error bars represent ±1 standard deviation across datasets (n = 4).

      "The Euclidean distance was used to identify nearest neighbors, and the number of nearest neighbors used was K = 7. This K value was determined after testing different Ks which yielded comparable results." In general, it's a decoding best practice to tune hyperparameters (like K) on fully held-out data from the data used for evaluation. Otherwise, this tends to slightly inflate performance because one picks the hyperparameter that happened to give the best result. It sounds like that held-out validation set wasn't used here. I don't think that's going to change the results much at all (especially given the "comparable results" comment), but providing this suggestion for the future. If the authors replicate results on other datasets, I suggest they keep K = 7 to lock in the method.

      K = 7 was chosen based on the size of our smallest training dataset (n = 55). The purpose of testing different K values was not to select which value gave the best result, but to demonstrate that similar K values did not affect the results significantly. We tested the different K values on a subset of the feeding data, but that data was not fully held-out from the training set. We will keep your suggestion in mind for future analysis.
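
The reviewer's suggestion of tuning K on data held out from the final evaluation can be sketched as follows. The example uses a small KNN classifier with Euclidean distance on synthetic three-class data (standing in for left, center, and right). The split sizes, candidate K values, and function names are assumptions for illustration only.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k=7):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # shape (n_query, k), integer class labels
    return np.array([np.bincount(row).argmax() for row in votes])


def choose_k(x_train, y_train, x_val, y_val, candidates=(3, 5, 7, 9)):
    """Pick K by accuracy on a validation split that is never used for testing."""
    scores = {k: float(np.mean(knn_predict(x_train, y_train, x_val, k) == y_val))
              for k in candidates}
    return max(scores, key=scores.get), scores


# Synthetic data: three well-separated classes in a 10-dimensional feature space.
rng = np.random.default_rng(4)
centers = 5.0 * np.eye(3, 10)
labels = np.repeat(np.arange(3), 60)
features = centers[labels] + rng.normal(size=(180, 10))
order = rng.permutation(180)
train, val, test = order[:100], order[100:140], order[140:]

best_k, val_scores = choose_k(features[train], labels[train],
                              features[val], labels[val])
test_accuracy = np.mean(
    knn_predict(features[train], labels[train], features[test], best_k) == labels[test]
)
```

Because the test trials play no part in choosing K, `test_accuracy` is not inflated by hyperparameter selection.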

      The smoothing applied to Figure 2 PSTHs appears perhaps excessive (i.e., it may be obscuring interesting finer-grained details of these fast movements). Can the authors reduce the 50 ms Gaussian smoothing (I assume this is the s.d.)? ~25 ms is often used in studying arm kinematics. It also looks like the movement-related modulation may not be finished in these 200 ms / 500 ms windows. I suggest extending the shown time window. It would also be helpful to show some trial-averaged behavior (e.g. speed or % displacement from start) under or behind the PSTHs, to give a sense of what phase of the movement the neural activity corresponds to.

      Thank you for the suggestion. We have taken your suggestions into consideration and modified Figure 2 accordingly. We decreased the Gaussian kernel to 25 ms and extended the time window shown. The trial-averaged anterior/posterior displacement was also added to the drinking PSTHs.
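
      For readers unfamiliar with this step, the sketch below shows Gaussian smoothing of a binned firing-rate trace. It is an illustration under stated assumptions, not the authors' pipeline: the 1 ms bin width, the truncation at 3 s.d., and the reading of the kernel width as a standard deviation are all assumptions.

```python
import math


def gaussian_kernel(sd_ms, bin_ms=1.0, n_sd=3):
    """Discrete Gaussian kernel (unit sum) with standard deviation sd_ms."""
    half = int(round(n_sd * sd_ms / bin_ms))
    weights = [math.exp(-0.5 * (i * bin_ms / sd_ms) ** 2)
               for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(rate, kernel):
    """Convolve a binned firing-rate trace with the kernel (same length out).

    Samples beyond the edges are treated as zero, so values near the
    edges of the trace are attenuated.
    """
    half = len(kernel) // 2
    out = []
    for t in range(len(rate)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(rate):
                acc += w * rate[idx]
        out.append(acc)
    return out
```

      A 25 ms kernel is half as wide as a 50 ms kernel, so brief rate transients are attenuated less, which is the motivation behind the reviewer's request.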

      Reviewer #3 (Recommendations for the authors):

      The major consideration here is that the data reported for feeding appears to be very similar to that reported in a previous study:

      "Robust cortical encoding of 3D tongue shape during feeding in macaques"

      Are the neurons reported here the same as the ones used in this previous paper? It is deeply concerning that this is not reported anywhere in the methods section.

      These are the same neurons as in our previous paper, though here we include several additional datasets of the nerve block and drinking sessions. We have now included this in the methods section.

      Second, I strongly recommend that the authors consider a thorough rewrite of this manuscript and improve the presentation of the figures. As written, it was not easy to follow the paper, the logic of the experiments, or the specific data being presented in the figures.

      Thank you for this suggestion. We have done an extensive rewrite of the manuscript and revision of the figures.

      A few recommendations:

      (1) Please structure your results sections and use descriptive topic sentences to focus the reader. In the current version, it is unclear what the major point being conveyed for each analysis is.

      Thank you for this suggestion. We have added topic sentences to begin each section of the Results.

      (2) Please show raster plots for at least a few example neurons so that the readers have a sense of what the neural responses look like across trials. Is all of Figure 2 one example neuron or are they different neurons? Error bars for PETH would be useful to show the reliability and robustness of the tuning.

      Figure 2 shows different neurons, one from MIo and one from SIo for each task. There is shading showing ±1 standard error around the line for each direction; however, this was a bit difficult to see. In addition to the other changes we have made to these figures, we made the lines smaller and darkened the error bar shading to accentuate this. We also added raster plots corresponding to the same neurons represented in Figure 2 as a supplement.

      (3) Since there are only two data points, I am not sure I understand why the authors have bar graphs and error bars for graphs such as Figure 3B, Figure 5B, etc. How can one have an error bar and means with just 2 data points?

      Those bars represent the standard error of the proportion. We have changed the y-axis label on these figures to make this clearer.
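
      The response does not state which formula was used, so the following is only a sketch of the usual binomial standard error of a proportion, with hypothetical counts.

```python
import math


def se_proportion(k, n):
    """Standard error of a sample proportion p = k / n (binomial approximation)."""
    p = k / n
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical example: 50 of 100 neurons are directionally tuned
print(se_proportion(50, 100))
```

      Under this formula an error bar can be drawn for a single proportion, because the uncertainty comes from the number of neurons counted rather than from the number of animals.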

      (4) Results in Figure 6 could be due to differential placement of the electrodes across the animals. How is this being accounted for?

      Yes, this is a possibility which we have mentioned in the discussion. Even with careful placement there is no guarantee to capture a set of neurons with the exact same function in two subjects, as every individual is different. Rather we focus on analyses of data within the same animal. The purpose of Figure 6 is to show the difference between MIo and SIo, and between the two tasks, within the same subject. The more salient result from calculating the preferred direction is that there is a change in the distribution between control and nerve block within the same exact population. Discussions relating to the comparison between individuals are speculative and cannot be confirmed without the inclusion of many more subjects.

      (5) For Figure 7, I would recommend showing the results of the Sham injection in the same figure instead of a supplement.

      Thank you for the suggestion, we have added these results to the figure.

      (6) I think the effects of the sensory block on the tongue kinematics are underexplored in Figure 7 and Figure 8. The authors could explore the deficits in tongue shape, and the temporal components of the trajectory.

      Some of these effects on feeding have been explored in a previous paper, Laurence-Chasen et al., 2022. We performed some additional analyses on changes to kinematics during drinking, including the number of licks per 10-second trial and the length of individual licks. The results of these are included below. We also calculated the difference in the speed of tongue movement during drinking, which generally decreased and exhibited an increase in variance with nerve block (f-test, p < 0.001). However, we have not included these figures in the main paper as they do not inform us about directionality.

      Author response image 5.

      Left halves of hemi-violins (black) are control and right halves (red) are nerve block for an individual. Horizontal black lines represent the mean and horizontal red lines the median. Results of two-tailed t-test and f-test are indicated by asterisks and crosses, respectively: *,† p < 0.05; **,†† p < 0.01; ***,††† p < 0.001.

      (9) In Figures 9 and 10, are the same neurons being recorded before and after the nerve block? It is unclear if the overall "population" properties are different, or if the properties of individual neurons are changing due to the nerve block.

      Yes, the same neurons are being recorded before and after nerve block. Specifically, Figure 9B shows that the properties of many individual neurons do change due to the nerve block. Differences in the overall population response may be attributed to some of the units having reduced/no activity during the nerve block session.

      Additionally, I recommend that the authors improve their introduction and provide more context to their discussion. Please elaborate on what you think are the main conceptual advances in your study, and place them in the context of the existing literature. By my count, there are 26 citations in this paper, 4 of which are self-citations - clearly, this can be improved upon.

      Thank you for this suggestion. We have done an extensive rewrite of the Introduction and Discussion. We now discuss the main conceptual advances in our study and place them in the context of the existing literature.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript by Hosack and Arce-McShane examines the directional tuning of neurons in macaque primary motor (MIo) and somatosensory (SIo) cortex. The neural basis of tongue control is far less studied than, for example, forelimb movements, partly because the tongue's kinematics and kinetics are difficult to measure. A major technical advantage of this study is using biplanar video-radiography, processed with modern motion tracking analysis software, to track the movement of the tongue inside the oral cavity. Compared to prior work, the behaviors are more naturalistic (feeding and licking water from one of three spouts), although the animals were still head-fixed.

      The study's main findings are that:

      • A majority of neurons in MIo and a (somewhat smaller) percentage of SIo modulated their firing rates during tongue movements, with different modulation depending on the direction of movement (i.e., exhibited directional tuning). Examining the statistics of tuning across neurons, there was anisotropy (e.g., more neurons preferring anterior movement) and a lateral bias in which tongue direction neurons preferred that was consistent with the innervation patterns of tongue control muscles (although with some inconsistency between monkeys).

      • Consistent with this encoding, tongue position could be decoded with moderate accuracy even from small ensembles of ~28 neurons.

      • There were differences observed in the proportion and extent of directional tuning between the feeding and licking behaviors, with stronger tuning overall during feeding. This potentially suggests behavioral context-dependent encoding.

      • The authors then went one step further and used a bilateral nerve block to the sensory inputs (trigeminal nerve) from the tongue. This impaired the precision of tongue movements and resulted in an apparent reduction and change in neural tuning in MIo and SIo.

      Strengths:

      The data are difficult to obtain and appear to have been rigorously measured, and provide a valuable contribution to this under-explored subfield of sensorimotor neuroscience. The analyses adopt well-established methods especially from the arm motor control literature, and represent a natural starting point for characterizing tongue 3D direction tuning.

      Weaknesses:

      There are alternative explanations for some of the interpretations, but those interpretations are described in a way that clearly distinguishes results from interpretations, and readers can make their own assessments. Some of these limitations are described in more detail below.

      One weakness of the current study is that there is substantial variability in some of the results between monkeys, including the tuning characteristics of primary somatosensory cortex neurons during drinking, and the effect of nerve block on tongue movements and the associated changes in single neuron tuning.

      This study focuses on describing directional tuning using the preferred direction (PD) / cosine tuning model popularized by Georgopoulos and colleagues for understanding neural control of arm reaching in the 1980s. This is a reasonable starting point and a decent first order description of neural tuning. However, the arm motor control field has moved far past that viewpoint, and in some ways an over-fixation on static representational encoding models and PDs held that field back for many years. The manuscript would benefit from drawing the readers' attention (perhaps in the Discussion) to the fact that PDs are a very simple starting point for characterizing how cortical activity relates to kinematics, but that there is likely much richer population-level dynamical structure and that a more mechanistic, control-focused analytical framework may be fruitful. A good review of this evolution in the arm field can be found in Vyas S, Golub MD, Sussillo D, Shenoy K. 2020. Computation Through Neural Population Dynamics. Annual Review of Neuroscience. 43(1):249-75. A revised version of the manuscript incorporates more population-level analyses, but with inconsistent use of quantifications/statistics and without sufficient contextualization of what the reader is to make of these results.
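
      For orientation, the cosine tuning model referred to here can be sketched in a deliberately simplified 2D form. This is not the manuscript's 3D analysis: the function name and the assumption of evenly spaced movement directions are illustrative only. Under the model rate = b0 + m · cos(angle − PD), the rate-weighted vector sum over evenly spaced directions points along the PD.

```python
import math


def preferred_direction(angles, rates):
    """Estimate the preferred direction (radians) of a cosine-tuned unit.

    Assumes rate = b0 + m * cos(angle - pd) sampled at evenly spaced
    angles, in which case the rate-weighted vector sum points along pd.
    """
    x = sum(r * math.cos(a) for a, r in zip(angles, rates))
    y = sum(r * math.sin(a) for a, r in zip(angles, rates))
    return math.atan2(y, x)


# Hypothetical unit: baseline 10 Hz, modulation depth 5 Hz, PD = 60 degrees
angles = [2 * math.pi * i / 8 for i in range(8)]
rates = [10 + 5 * math.cos(a - math.pi / 3) for a in angles]
pd_estimate = preferred_direction(angles, rates)
```

      A single angle per neuron is all this description retains, which is why it says nothing about how population activity evolves over time.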

      The described changes in tuning after nerve block could also be explained by changes in kinematics between these conditions, which temper the interpretation of these interesting results.

      I am not convinced of the claim that tongue directional encoding fundamentally changes between drinking and feeding given the dramatically different kinematics and the involvement of other body parts like the jaw (e.g., the reference to Laurence-Chasen et al. 2023 just shows that there is tongue information independent of jaw kinematics, not that jaw movements don't affect these neurons' activities). I also find the nerve block results inconsistent (more tuning in one monkey, less in the other?) and difficult to really learn something fundamental from, besides that neural activity and behavior both change - in various ways - after nerve block (not at all surprising but still good to see measurements of).

      The manuscript states that "Our results suggest that the somatosensory cortex may be less involved than the motor areas during feeding, possibly because it is a more ingrained and stereotyped behavior as opposed to tongue protrusion or drinking tasks". An alternative explanation may be more statistical/technical in nature: during feeding, there will be more variability in exactly which somatosensory afferent signals are received from trial to trial (because slight differences in kinematics can produce large differences in exactly where the tongue is and the where/when/how of what parts of it are touching other parts of the oral cavity). This variability could "smear out" the apparent tuning using these types of trial-averaged analyses. Given how important proprioception and somatosensation are for not biting the tongue or choking, the speculation that somatosensory cortical activity is suppressed during feeding is very counter-intuitive to this reviewer. In the revised manuscript the authors note these potential confounds and other limitations in the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for performance improvement, would strengthen the manuscript.

      We appreciate the Editorial assessment of our paper’s strengths and novelty. We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning. Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning.

      Strengths:

      The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%.

      We have previously shown that neural replay of MEG activity representing the practiced skill was prominent during rest intervals of early learning, and that the replay density correlated with micro-offline gains (Buch et al., 2021). These findings are consistent with recent reports (from two different research groups) that hippocampal ripple density increases during these inter-practice rest periods, and predicts offline learning gains (Chen et al., 2024; Sjøgård et al., 2024). However, decoder performance in our earlier work (Buch et al., 2021) left room for improvement. Here, we reported a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses:

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions.

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while online monitoring of head position was not performed for this study, it was assessed at the beginning and at the end of each recording. The head was restrained with an inflatable air bladder, and head movement between the beginning and end of each scan did not exceed 5 mm for all participants included in the study.

      The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. We agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. However, such correlations between small head movements and finger movements could only meaningfully contribute to decoding performance if: (A) they were consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) they systematically varied between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). The possibility of any head movement artefacts meeting all these conditions is unlikely. Alternatively, for this task design a much more likely confound could be the contribution of eye movement artefacts to the decoder performance (an issue raised by Reviewer #3 in the comments below).

      Recall from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, it is possible that if participants are attending to the visual feedback provided on the display, they may generate eye movements that are systematically related to the task. Since we did record eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point that the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of asterisk position update (triggered by a KeyDown event), 2) the gaze position 150 ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e. – behavioral artefacts) (end of figure legend).
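
      As a quick arithmetic check on the reported numbers, the sketch below averages the five fold accuracies from the legend and compares the result with the chance level for a five-class problem. The chance level of 1/5 assumes balanced classes, and the unweighted mean equals the overall accuracy only if the folds are of equal size; both are assumptions, not statements from the response.

```python
# Fold-wise test accuracies as reported in the legend of Author response image 1
fold_acc = [0.21718, 0.22023, 0.21859, 0.22113, 0.21373]

n_classes = 5                      # ordinal asterisk positions 1-5
chance = 1.0 / n_classes           # 0.2, assuming balanced classes
mean_acc = sum(fold_acc) / len(fold_acc)

# prints: mean fold accuracy = 0.21817, chance = 0.20
print(f"mean fold accuracy = {mean_acc:.5f}, chance = {chance:.2f}")
```

      The mean reproduces the reported overall value and sits within about two percentage points of the assumed chance level.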

      Recall that the task display does not provide explicit feedback related to performance, only information about the present position in the sequence. Thus, it is possible that participants did not actively attend to the feedback. In fact, inspection of the eye position data revealed that on the majority of trials, participants displayed random-walk-like gaze patterns around a central fixation point located near the center of the screen. Thus, participants did not attend to the asterisk position on the display, but instead intrinsically generated the action sequence. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks) as provided in the study task – feedback which is typically ignored by the user.

      The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals (Buch et al., 2021; Classen et al., 1998; Karni et al., 1995; Kleim et al., 1998). Similarly, the increased involvement of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported (Doyon et al., 2002; Grafton et al., 1992; Hardwick et al., 2013; Kennerley et al., 2004; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001), and appears to be even more prominent during early fine motor skill learning in the non-dominant hand (Lee et al., 2019; Sawamura et al., 2019). The frontal regions identified in these studies are known to play crucial roles in executive control (Battaglia-Mayer & Caminiti, 2019), motor planning (Toni, Thoenissen, et al., 2001), and working memory (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998) processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations (Andersen & Buneo, 2002; Buneo & Andersen, 2006; Shadmehr & Holcomb, 1997; Toni, Ramnani, et al., 2001; Wolpert et al., 1998), in addition to working memory (Grover et al., 2022). Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular for the following reasons. First, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, the tests of these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are completely independent of one another and of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities to the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications (Srinivas et al., 2016). One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (θ/γ PAC), which assess interactions between two or more narrow-band spectral features derived from the same time-series data (Lisman & Jensen, 2013).

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features portray different information – by constructing an alternative hybrid-space decoder (Hybrid<sub>Alt</sub>) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing the performance to the decoder used in the manuscript (Hybrid<sub>Orig</sub>). The prediction was that if the overlapping parcel contained similar information to the more spatially resolved voxel patterns, then removing the parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance greater than 2% (78.15% ± 7.03% SD for Hybrid<sub>Orig</sub> vs. 75.49% ± 7.17% for Hybrid<sub>Alt</sub>; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04; Author response image 2).

      Author response image 2.

      Comparison of decoding performances with two different hybrid approaches. Hybrid<sub>Alt</sub>: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of remaining parcels. Hybrid<sub>Orig</sub>: Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e. – the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note that Hybrid<sub>Orig</sub> (the approach used in our manuscript) significantly outperforms the Hybrid<sub>Alt</sub> approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns (end of figure legend).
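
      The two feature layouts compared in this response can be sketched as follows. This is a schematic reading of the description, not the authors' implementation: the function names, the flat list of voxel values, and the `drop_overlap` flag are hypothetical.

```python
def parcel_means(voxels, labels):
    """Average voxel activity within each parcel (inter-parcel features)."""
    sums, counts = {}, {}
    for value, parcel in zip(voxels, labels):
        sums[parcel] = sums.get(parcel, 0.0) + value
        counts[parcel] = counts.get(parcel, 0) + 1
    return {p: sums[p] / counts[p] for p in sorted(sums)}


def hybrid_features(voxels, labels, top_parcels, drop_overlap=False):
    """Concatenate parcel averages with the voxels of the top-ranked parcels.

    drop_overlap=False mimics the original hybrid layout (all parcel means
    kept); drop_overlap=True mimics the alternative layout, in which the
    means of the top-ranked parcels are removed.
    """
    means = parcel_means(voxels, labels)
    inter = [m for p, m in means.items()
             if not (drop_overlap and p in top_parcels)]
    intra = [v for v, p in zip(voxels, labels) if p in top_parcels]
    return inter + intra
```

      With 148 parcel features and roughly 1150 voxel features, the 8 removed parcel means amount to 8 / 1298 ≈ 0.6% of the inputs, consistent with the "less than 1%" figure quoted in the response.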

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen.

      We agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated, an important confound in connectivity analyses (Colclough et al., 2015; Colclough et al., 2016), not performed in our investigation.

      In our study, correlations between adjacent voxels effectively reduce the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. – the rank is greater than 1), the intra-parcel spatial patterns could meaningfully contribute to the decoder performance, as shown by the following results:

      First, we obtained higher decoding accuracy with voxel-space features (74.51% ± 7.34% SD) compared to parcel space features (68.77% ± 7.6%; Figure 3B), indicating individual voxels carry more information in decoding the keypresses than the averaged voxel-space features or parcel space features. Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding shows that correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside within.

      Author response image 3.:

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel-space features in decoding keypresses and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contributions to decoding (end of figure legend).

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment.

      We have already addressed the issue of movement related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics (Bansal et al., 2011; Mollazadeh et al., 2011), muscle activation patterns (Flint et al., 2012) and temporal sequencing (Churchland et al., 2012) during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies) (Heusser et al., 2016). Again, this interpretation is supported by our data as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions".

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans assessed changes in functional connectivity patterns while participants performed a similar sequence learning task to our present study (Bassett et al., 2011). Using a dynamic network analysis approach, Bassett et al. showed that flexibility in the composition of individual network modules (i.e. – changes in functional brain region membership of orthogonal brain networks) is up-regulated in novel learning environments and explains differences in learning rates across individuals. Thus, consistent with our findings, it is likely that functional brain networks rapidly reconfigure during early learning of novel sequential motor skills.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning (Albouy et al., 2013; Albouy et al., 2012). For example, reactivation events in the posterior parietal (Qin et al., 1997) and medial prefrontal (Euston et al., 2007; Molle & Born, 2009) cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains (Frankland & Bontempi, 2005), including motor sequence learning (Albouy et al., 2015; Buch et al., 2021; F. Jacobacci et al., 2020). Further, synchronized interactions between MPFC and hippocampus are more prominent during early as opposed to later learning stages (Albouy et al., 2013; Gais et al., 2007; Sterpenich et al., 2009), perhaps reflecting “redistribution of hippocampal memories to MPFC” (Albouy et al., 2013). MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning (Euston et al., 2012). Consistently, coupling between hippocampus and MPFC has been shown during initial memory encoding and during subsequent rest (van Kesteren et al., 2010; van Kesteren et al., 2012). Importantly, MPFC activity during initial memory encoding predicts subsequent recall (Wagner et al., 1998). Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” (Albouy et al., 2012), which is also engaged in the development of an abstract representation of the sequence (Ashe et al., 2006). In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” required during early learning (Doyon et al., 2009; Hikosaka et al., 2002; Penhune & Steele, 2012). The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice (Schendan et al., 2003), all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding (Morris, 2006; Tse et al., 2007). Thus, several prefrontal and frontoparietal regions contributing to long term learning (Berlot et al., 2020) are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions in decoding during early skill learning. We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here.

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e. – 16-22 Hz beta power (Bonstrup et al., 2019) and neural replay density (Buch et al., 2021) during inter-practice rest periods) to observed micro-offline gains.

      Reviewer #2 (Public review):

      Summary

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea.

      Weaknesses

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in almost more than 60% of training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured these future press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for the dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained in similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation.

      We now include a new control analysis that addresses this issue as well as additional re-examination of previously reported results with respect to this issue – all of which are inconsistent with this alternative explanation that “contextualization” reflects a change in mixing of keypress related MEG features as opposed to a change in the underlying representations themselves. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged. One must also keep in mind that since participants repeat the sequence multiple times within the same trial, a majority of the index finger keypresses are performed adjacent to one another (i.e. - the “4-4” transition marking the end of one sequence and the beginning of the next). Thus, increased overlap between consecutive index finger keypresses as typing speed increased should increase their similarity and mask contextualization related changes to the underlying neural representations.

      We addressed this question by conducting a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis confirmed that the alternative explanation, namely that contextualization effects simply reflect increased mixing, is not supported by the data (Adjusted R<sup>2</sup> = 0.00431; F = 5.62). We now include this new negative control analysis in the revised manuscript.
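
      A minimal sketch of this kind of control regression is given below, using synthetic data in which the distance score is unrelated to the transition times by construction. The function names (`zscore`, `solve`, `ols_adjusted_r2`), the number of hypothetical participants and sequences, and the simulated timing values are assumptions made for illustration only and do not reproduce the analysis code or data of the study.

```python
import math
import random


def zscore(values):
    """Z-score a list of values (sample standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_adjusted_r2(predictors, response):
    """Fit response ~ intercept + predictors; return (coefficients, adjusted R^2)."""
    n, p = len(response), len(predictors[0])
    design = [[1.0] + list(row) for row in predictors]
    k = p + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, response)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(response) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    r2 = 1.0 - ss_res / ss_tot
    return beta, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Synthetic example: 10 hypothetical participants, 100 correct sequences each.
rng = random.Random(0)
rows, scores = [], []
for _ in range(10):
    n_seq = 100
    t41 = [rng.gauss(250, 40) for _ in range(n_seq)]   # 4-1 transition times (ms)
    t24 = [rng.gauss(260, 40) for _ in range(n_seq)]   # 2-4 transition times (ms)
    t44 = [rng.gauss(240, 40) for _ in range(n_seq)]   # 4-4 transition times (ms)
    dist = [rng.gauss(1.0, 0.2) for _ in range(n_seq)]  # distance score, unrelated
    # Normalize predictors and response within each participant before pooling.
    rows.extend(zip(zscore(t41), zscore(t24), zscore(t44)))
    scores.extend(zscore(dist))

beta, adj_r2 = ols_adjusted_r2(rows, scores)
print(f"adjusted R^2 = {adj_r2:.5f}")
```

      In this toy setting the adjusted R<sup>2</sup> stays close to zero, which is the pattern expected when transition times carry essentially no information about the distance score.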

      We also re-examined our previously reported classification results with respect to this issue. We reasoned that if mixing effects reflecting the ordinal sequence structure are an important driver of the contextualization finding, these effects should be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A display a distribution of misclassifications that is inconsistent with an alternative mixing effect explanation of contextualization.

      Based upon the increased overlap between adjacent index finger keypresses (i.e. – “4-4” transition), we also reasoned that the decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position, should show decreased performance as typing speed increases. However, Figure 4C in our manuscript shows that this is not the case. The 2-class hybrid classifier actually displays improved classification performance over early practice trials despite greater temporal overlap. Again, this is inconsistent with the idea that the contextualization effect simply reflects increased mixing of individual keypress features.

      In summary, re-examination of previously reported data and new control analyses converged on the idea that the proximity between keypresses does not explain contextualization.

      We do agree with the Reviewer that the naturalistic, generative, self-paced task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the KeyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study.

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—figure supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36 percentage points) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67 percentage points. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impacts overall decoder performance and generalization. We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider the specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself.

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the KeyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses. We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing accurate decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments variably aligned from 0 to +100 ms with 10 ms increments relative to the KeyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximize statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200 ms epoch aligned to the KeyDown event (t<sub>0</sub> = 0 ms). We then asked if the representations (i.e. – spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Future work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
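
      A minimal sketch of how such a window grid could be enumerated and searched is shown below. The names `window_grid` and `select_best_window` and the toy scoring function are hypothetical. The sketch also assumes that the final alignment value is excluded from the grid, which gives 14 widths × 10 alignments = 140 windows (including both endpoints of the alignment range would give 154); the actual endpoint convention is not stated in the text above.

```python
from itertools import product


def window_grid(width_range=(25, 350, 25), onset_range=(0, 100, 10)):
    """Enumerate (onset_ms, width_ms) pairs; each range is (start, stop, step).

    Widths include their stop value; onsets exclude it (an assumption made
    here so that the grid size matches the 140 windows quoted in the text).
    """
    w_start, w_stop, w_step = width_range
    o_start, o_stop, o_step = onset_range
    widths = range(w_start, w_stop + w_step, w_step)  # 25, 50, ..., 350
    onsets = range(o_start, o_stop, o_step)           # 0, 10, ..., 90
    return list(product(onsets, widths))


def select_best_window(grid, score_fn):
    """Return the window with the highest score (e.g. cross-validated accuracy)."""
    return max(grid, key=score_fn)


grid = window_grid()
print(len(grid))

# Toy score peaking at a 200 ms window aligned to 0 ms (illustration only;
# in practice score_fn would train and evaluate a decoder for each window).
best = select_best_window(grid, lambda w: -abs(w[1] - 200) - w[0])
print(best)
```

      In a real analysis the scoring function would be the cross-validated decoder performance obtained for each candidate window.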

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well.

      The Reviewer suggests that the current data is not enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last Index<sub>OP5</sub> and first Index<sub>OP1</sub> from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Figure 5 – figure supplement 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest periods.
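
      The logic of this comparison can be sketched as follows, using synthetic data. The sketch assumes a paired t-test on the per-subject correlation coefficients (the specific test variant is not stated above), and the helper names, the number of hypothetical subjects and the simulated effect sizes are illustrative assumptions only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(a, b):
    """Paired t statistic for two equal-length samples (df = n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)


# Synthetic example: illustrative sizes, not those of the study.
rng = random.Random(0)
n_subjects, n_intervals = 20, 35
r_offline, r_online = [], []
for _ in range(n_subjects):
    # Contextualization change over rest, and offline gains related to it.
    ctx_rest = [rng.gauss(0, 1) for _ in range(n_intervals)]
    offline_gain = [c + rng.gauss(0, 1) for c in ctx_rest]
    # Contextualization change during practice, and unrelated online gains.
    ctx_practice = [rng.gauss(0, 1) for _ in range(n_intervals)]
    online_gain = [rng.gauss(0, 1) for _ in range(n_intervals)]
    r_offline.append(pearson_r(ctx_rest, offline_gain))
    r_online.append(pearson_r(ctx_practice, online_gain))

# Compare the two distributions of within-subject correlation coefficients.
t_stat = paired_t(r_offline, r_online)
print(f"t({n_subjects - 1}) = {t_stat:.2f}")
```

      One correlation coefficient is computed per subject and pairing, and the test then operates on the resulting distributions of coefficients rather than on pooled trial data.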

      With respect to the second concern, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the original manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis compared to the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first Index<sub>OP1</sub> to the last Index<sub>OP5</sub> keypress in the same trial, we observed no learning-related trend (Figure 5 – figure supplement 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach and neither predicted online learning (Figure 5 – figure supplement 6).
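
      The matched online and offline measurements described here can be sketched as below. The data layout (a list of trials, each holding the feature vectors of index finger keypresses at ordinal positions 1 and 5 in temporal order), the function names and the use of a Euclidean distance are assumptions made for illustration; the toy values are not data from the study.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def online_offline_distances(trials, distance=euclidean):
    """Return (online, offline) lists of pattern distances.

    online[k]  : first OP1 keypress vs. last OP5 keypress within trial k.
    offline[k] : last OP5 keypress of trial k vs. first OP1 keypress of
                 trial k + 1, i.e. across the intervening rest period.
    """
    online = [distance(t["op1"][0], t["op5"][-1]) for t in trials]
    offline = [
        distance(prev["op5"][-1], nxt["op1"][0])
        for prev, nxt in zip(trials, trials[1:])
    ]
    return online, offline


# Toy two-feature patterns for three practice trials (illustrative values only).
trials = [
    {"op1": [[0.0, 0.0], [0.1, 0.0]], "op5": [[0.0, 0.2], [0.0, 1.0]]},
    {"op1": [[3.0, 5.0], [3.0, 4.0]], "op5": [[3.0, 4.5], [3.0, 4.0]]},
    {"op1": [[6.0, 8.0]], "op5": [[6.0, 8.0]]},
]
online, offline = online_offline_distances(trials)
print(online, offline)
```

      Both measures are computed from a single pair of keypresses, so they differ only in whether the pair is separated by practice or by rest.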

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals.

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multiscale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter).

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?).

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.
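
      A minimal sketch of a shuffled-label control of this kind is given below. It is not the pipeline used in the study: it assumes synthetic Gaussian data, a simple nearest-centroid classifier standing in for the actual decoders, and a single train/test split. All names are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {k: [v / counts[k] for v in s] for k, s in sums.items()}


def predict(centroids, x):
    """Assign x to the class with the closest centroid."""
    return min(centroids,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))


def accuracy(X_train, y_train, X_test, y_test):
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, x) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def make_data(rng, n_per_class, n_classes=4, n_features=6):
    """Synthetic data: each class is offset along its own feature axis."""
    X, y = [], []
    for k in range(n_classes):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            x[k] += 5.0
            X.append(x)
            y.append(k)
    return X, y


def shuffled_label_accuracy(X_train, y_train, X_test, y_test, n_perm, rng):
    """Mean test accuracy after training on randomly permuted labels."""
    scores = []
    for _ in range(n_perm):
        y_perm = y_train[:]
        rng.shuffle(y_perm)  # breaks the link between features and labels
        scores.append(accuracy(X_train, y_perm, X_test, y_test))
    return sum(scores) / len(scores)


rng = random.Random(0)
X_train, y_train = make_data(rng, n_per_class=50)
X_test, y_test = make_data(rng, n_per_class=25)

real_acc = accuracy(X_train, y_train, X_test, y_test)
chance_acc = shuffled_label_accuracy(X_train, y_train, X_test, y_test, 300, rng)
print(f"true labels: {real_acc:.3f}  shuffled labels: {chance_acc:.3f}")
```

      With balanced classes, the shuffled-label accuracy is expected to average near 1/K (0.25 for four classes), although individual permutations can deviate considerably from that value, which is why the spread across permutations is worth reporting alongside the mean.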

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much smaller dimensional space (number of classes – 1; e.g. – 3 dimensions, for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.
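
      The dimensionality limit mentioned above follows from the standard LDA formulation, sketched here in generic notation (K classes, n_k samples in class k, class means \mu_k, grand mean \mu, p input features, n samples in total). The symbols are ours and are not taken from the manuscript.

```latex
S_B = \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^{\top},
\qquad
\sum_{k=1}^{K} n_k (\mu_k - \mu) = 0
\;\Rightarrow\;
\operatorname{rank}(S_B) \le K - 1 .

S_W = \sum_{k=1}^{K} \sum_{i \in \text{class } k} (x_i - \mu_k)(x_i - \mu_k)^{\top},
\qquad
\operatorname{rank}(S_W) \le n - K .
```

      Because the between-class scatter S_B has rank at most K - 1, LDA yields at most 3 discriminant dimensions for 4-class keypress decoding and 4 for 5-class sequence-item decoding. The within-class scatter S_W is a p-by-p matrix of rank at most n - K, so it is singular whenever p (here 15684) greatly exceeds the number of samples. This is one plausible contributor to the weaker voxel-space result, offered as a general property of LDA rather than as a finding of the study.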

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space. We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses:

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption.

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions (Kornysheva et al., 2019). In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
      Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context.

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
      Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for).

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above. We agree they must both be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. This classification performance difference of 7.67% when tested on the Day 2 data could reflect the performance bias of the classifier for the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
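
      For readers who want to see the shape of such an analysis, a minimal sketch follows. It assumes that predictors and response are z-scored within each subject and then pooled across subjects into a single ordinary least-squares fit; the pooling step, the simulated data and all function names are our assumptions, not a description of the actual analysis code.

```python
import random


def zscore(values):
    """Standardize a list of numbers using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / (n - 1)) ** 0.5
    return [(v - mean) / sd for v in values]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def adjusted_r2(predictors, response):
    """Adjusted R^2 of an OLS fit with intercept; predictors is a list of columns."""
    n, p = len(response), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    XtX = [[sum(r[a] * r[b] for r in rows) for b in range(p + 1)] for a in range(p + 1)]
    Xty = [sum(r[a] * y for r, y in zip(rows, response)) for a in range(p + 1)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(response) / n
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def pooled_within_subject_fit(subjects):
    """Z-score each subject's variables separately, pool, then fit once."""
    cols, resp = [[], [], []], []
    for s in subjects:
        for j, col in enumerate(s["transitions"]):
            cols[j].extend(zscore(col))
        resp.extend(zscore(s["distance"]))
    return adjusted_r2(cols, resp)


rng = random.Random(0)


def simulate_subject(coupled, n_seq=100):
    """Simulated 4-1, 2-4 and 4-4 transition times (s) and a distance score."""
    transitions = [[rng.gauss(0.3, 0.05) for _ in range(n_seq)] for _ in range(3)]
    if coupled:  # distance driven by the first transition time
        distance = [2.0 * t + rng.gauss(0.0, 0.01) for t in transitions[0]]
    else:        # distance unrelated to any transition time
        distance = [rng.gauss(1.0, 0.2) for _ in range(n_seq)]
    return {"transitions": transitions, "distance": distance}


null_fit = pooled_within_subject_fit([simulate_subject(False) for _ in range(5)])
coupled_fit = pooled_within_subject_fit([simulate_subject(True) for _ in range(5)])
print(f"unrelated: {null_fit:.4f}  coupled: {coupled_fit:.4f}")
```

      In the simulated "unrelated" case the adjusted R² sits close to zero, whereas a genuine dependence of distance on transition time produces a value near one. An adjusted R² close to zero is therefore the signature of transition times explaining almost none of the within-subject variance.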

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test would evaluate the representation in an unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would miss most learning effects on a task in which speed is the main learning metric.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023).

      The Reviewer argues that the last finger movement of a trial and the first movement of the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial is pre-planned before the first keypress is performed. This occurs in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes. The Reviewer is concerned that the temporal mixing effect raised above differs between the first and last keypresses performed in a trial. Please note that since neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence (Kornysheva et al., 2019), mixing effects are most likely also present for the first keypress in a trial.

      Separately, the Reviewer suggests that contextualization during early learning may reflect pre-planning or online planning. This is an interesting proposal. Given the decoding time-window used in this investigation, we cannot dissect separate contributions of planning, memory and sensory feedback to contextualization. Taking advantage of the superior temporal resolution of MEG relative to fMRI tools, work under way in our lab is investigating decoding time-windows more appropriate to address each of these questions.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice). It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable.

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement-related artefacts on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase-shift differences between the visual display updates (i.e. – a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. To the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye-movement-related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings, so we were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e. – a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (Overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that most participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user.

      The minimal participant engagement with the visual display in this explicit sequence learning motor task (which is highly generative in nature) contrasts markedly with behavior observed when reactive responses to stimulus cues are needed in the serial reaction time task (SRTT). This is a crucial difference that must be carefully considered when comparing findings across studies using the two sequence learning tasks.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"?

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” vs micro-offline gains (see Figure 5 – figure supplements 4, 5 and 6). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      We disagree with this statement. The original (Bonstrup et al., 2019) paper clearly states that micro-offline gains do not necessarily reflect offline learning in some cases and must be carefully interpreted based upon the behavioral context within which they are observed. Further, the paper lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning. In fact, the excellent meta-analysis of (Pan & Rickard, 2015), which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study (Bonstrup et al., 2019), as well as in all our subsequent work. Pan & Rickard state:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks (Brawn et al., 2010; Rickard et al., 2008). Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard make several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They state:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead (Pan & Rickard, 2015). One promising possibility is to switch to 10 s performance durations for each performance-break cycle instead (Pan & Rickard, 2015). That design appears sufficient to eliminate at least the majority of the reactive inhibition effect (Brawn et al., 2010; Rickard et al., 2008).”

      We mindfully incorporated recommendations from (Pan & Rickard, 2015) into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects.
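
      To make the quantities under discussion concrete, the sketch below computes micro-online and micro-offline gains from per-trial tapping speeds. It is a simplified illustration: it assumes one speed value per correct sequence, at least one correct sequence in every trial, and a hypothetical cutoff for "early learning". The function name and example numbers are ours.

```python
def micro_gains(trials):
    """Return (micro_online, micro_offline) lists.

    trials: list of practice trials, each a list of tapping speeds
    (e.g. keypresses/s), one value per correct sequence in order.
    Micro-online gain: last minus first speed within a trial.
    Micro-offline gain: first speed of the next trial minus last speed
    of the current trial, i.e. the change across the rest period.
    """
    online = [t[-1] - t[0] for t in trials]
    offline = [nxt[0] - cur[-1] for cur, nxt in zip(trials, trials[1:])]
    return online, offline


# Hypothetical speeds for three 10 s practice trials.
trials = [[1.0, 1.1], [1.6, 1.7], [2.1, 2.0]]

n_early = 3  # assumed number of early-learning trials to analyse
online, offline = micro_gains(trials[:n_early])

total_learning = trials[n_early - 1][-1] - trials[0][0]
print(online, offline, total_learning)
```

      Because the differences telescope, the summed micro-online and micro-offline gains over a set of consecutive trials equal the total change in speed from the first to the last sequence, which is why the two components can be compared directly against overall early learning.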

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.” The initial (Bonstrup et al., 2019) report was followed up by a large online crowd-sourcing study (Bonstrup et al., 2020). This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 4 below for further details on these conditions).

      Author response image 4.

      This figure shows that micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from (Bonstrup et al., 2019). During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains only during this Early Learning stage reflect rapid memory consolidation (see also (Bonstrup et al., 2020)). After early learning, at about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019) as well as others in the literature (Brooks et al., 2024; Gupta & Rickard, 2022; Florencia Jacobacci et al., 2020), argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning. The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds (end of Fig legend).

      Evidence documented in that paper (Bonstrup et al., 2020) showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118); 3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (n=71) (Bonstrup et al., 2020). Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve that Pan & Rickard (2015) refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods with the density explaining differences in the magnitude of micro-offline gains across subjects (Buch et al., 2021). Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study (Buch et al., 2021)) linked to micro-offline gains during early skill learning. These functional changes were coupled with rapid alterations in brain microstructure on the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice (Deleglise et al., 2023). Crucial to this point, Chen et al. (2024) and Sjøgård et al. (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple density during rest periods (which are known markers for neural replay (Buzsaki, 2015)) in the human hippocampus (80-120 Hz) to micro-offline gains during early skill learning.

      Thus, there is now substantial converging evidence in humans across different indirect noninvasive and direct invasive recording techniques linking hippocampal activity, neural replay dynamics and offline performance gains in skill learning.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks; however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024).

      The recent work of Gupta & Rickard (2022, 2024) does not present any data that directly opposes our finding that early skill learning (Bonstrup et al., 2019) is expressed as micro-offline gains during rest breaks. These studies are an extension of the Rickard et al. (2008) paper that employed a massed (30s practice followed by 30s breaks) vs spaced (10s practice followed by 10s breaks) experimental design to assess if recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30s practice/10s break and 10s practice/10s break as used in the work from our group). The primary aim of the study was to assess whether changes in performance at a retest 5 minutes after the end of skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) more likely reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data.

      To reiterate, neither study included any analysis of micro-online or micro-offline gains, nor any comparison focused on skill gains during early learning trials (only at retest 5 min later). Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods than early learning. In fact, we reported the same findings for trials following the early learning period in our original 2019 paper (Bonstrup et al., 2019) (Author response image 4). Please note that we also reported that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later (Bonstrup et al., 2019) (see the Results section and further elaboration in the Discussion). We interpreted these findings as indicative that the mechanisms underlying offline gains over the micro-scale of seconds during early skill learning versus over minutes or hours very likely differ.

      In the recent preprint from Das et al. (2024), the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. To test this hypothesis, the study uses a between-subjects design with spaced and massed practice groups, inspired by the reactive inhibition work from Rickard and others.

      Crucially, their design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024). A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al., and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 5):

      Author response image 5.

      (A) Comparison of the Das et al. Spaced and Massed group training session designs with the training session design from the original Bönstrup et al. (2019) paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e. – practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule) (gaps in the red shaded area) and (2) the overall amount of practice is much less than in the design from the original Bönstrup et al. (2019) report (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019) is used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range (end of figure legend).

      Participants in the original Bönstrup et al. (2019) study experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 5). Thus, the overall amounts of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.

      In addition, the training interventions (i.e. – the practice schedule differences between the Spaced and Massed groups) were designed in a manner that minimized any chance of effectively testing their hypothesis. First, the interventions were applied over an extremely short period relative to the length of the total training session (5% and 12% of the total training session for Massed and Spaced groups, respectively; see gaps in the red shaded area in Author response image 5). Second, the intervention was applied during a period in which only half of the known total learning occurs. Specifically, we know from Bönstrup et al. (2019) that only 46.57% of the total performance gains occur in the practice interval covered by the Das et al. Training 1 intervention. Thus, early skill learning as evaluated by multiple groups (Bonstrup et al., 2020; Bonstrup et al., 2019; Brooks et al., 2024; Buch et al., 2021; Deleglise et al., 2023; F. Jacobacci et al., 2020; Mylonas et al., 2024) is truncated to about half in the Das et al. experiment.

      Furthermore, a substantial amount of learning takes place during Das et al.’s Test 1 and Test 2 periods (32.49% of total gains combined). The fact that substantial learning is known to occur over both the Test 1 (18.06%) and Test 2 (14.43%) intervals presents a fundamental problem described by Pan and Rickard (Pan & Rickard, 2015). They reported that averaging over intervals where substantial performance gains occur (i.e. – performance is not stable) injects crucial artefacts into analyses of skill learning:

      “A large amount of averaging has the advantage of yielding more precise estimates of each subject’s pretest and posttest scores and hence more statistical power to detect a performance gain. However, calculation of gain scores using that strategy runs the risk that learning that occurs during the pretest and (or) posttest periods (i.e., online learning) is incorporated into the gain score (Rickard et al., 2008; Robertson et al., 2004).”

      The above statement indicates that the Test 1 and Test 2 performance scores from Das et al. (2024) are substantially contaminated by the learning rate within these intervals. This is particularly problematic if the intervention design results in different Test 2 learning rates between the two groups. This, in fact, is apparent in their data (Figure 1C,E of the Das et al., 2024 preprint) as the Test 2 learning rate for the Spaced group is negative (indicating a unique interference effect observable only for this group). Specifically, the Massed group continues to show an increase in performance during Test 2 and 4 relative to the last 10 seconds of practice during Training 1 and 2, respectively, while the Spaced group displays a marked decrease. This post-training performance decrease for the Spaced group is in stark contrast to the monotonic performance increases observed for both groups at all other time-points. One possible cause could be related to the structure of the Test intervals, which include 20 seconds of uninterrupted practice. For the Spaced group, this effectively is a switch to a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), which interferes with the greater Training 1 interval gains observed for the Spaced group. Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (Figure 1E), the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      In summary, the experimental design and analyses used by Das et al. do not contradict the view that early skill learning is expressed as micro-offline gains during rest breaks. The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) are in many ways more confirmatory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings, rather than contradictory. Still, they do highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized (Bonstrup et al., 2019; Pan & Rickard, 2015). Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g. – the Nonrepeating groups from Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g. - memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found Figure 2B too small to be useful, as the actual elements of the cells are very hard to read.

      We have removed the grid colormap panel (top-right) from Figure 2B. All of this colormap data is actually a subset of data presented in Figure 2 – figure supplement 1, so can still be found there.

      Reviewer #2 (Recommendations for the authors):

      (1) Related to the first point in my concerns, I would suggest the authors compare decoding accuracy between correct presses followed by correct vs. incorrect presses. This would clarify if the decoder is actually taking the MEG signal for subsequent press into account. I would also suggest the authors use pre-movement MEG features and post-movement features with shorter windows and compare each result with the results for the original post-movement MEG feature with a longer window.

      The present study does not contain enough errors to perform the analysis proposed by the Reviewer. As noted above, we re-examined our data and now report a new control regression analysis, which together indicate that the proximity between keypresses does not explain contextualization effects.

      (2) I was several times confused by the author's use of "neural representation of an action" or "sequence action representations" in understanding whether these terms refer to representation on the level of whole-brain, region (as defined by the specific parcellation used), or voxels. In fact, what is submitted to the decoder is some complicated whole-brain MEG feature (i.e., the "neural representation"), which is a hybrid of voxel and parcel features that is further dimension-reduced and not immediately interpretable. Clarifying this point early in the text and possibly using some more sensible terms, such as adding "brain-wise" before the "sequence action representation", would be the most helpful for the readers.

      We have now clarified this terminology in the revised manuscript.

      (3) Although comparing many different ways in feature selection/reduction, time window selection, and decoder types is undoubtedly a meticulous work, the current version of the manuscript seems still lacking some explanation about the details of these methodological choices, like which decoding method was actually used to report the accuracy, whether or not different decoding methods were chosen for individual participants' data, how training data was selected (is it all of the correct presses in Day 1 data?), whether the frequency power or signal amplitude was used, and so on. I would highly appreciate these additional details in the Methods section.

      The reported accuracies were based on a linear discriminant analysis (LDA) classifier. A comparison of different decoders (Figure 3 – figure supplement 4) shows LDA was the optimal choice.

      Whether or not different decoding methods were chosen for individual participants' data

      We used the same decoder (LDA) for all participants when reporting the final accuracy.

      How training data was selected (is it all of the correct presses in Day 1 data?),

      Decoder training was conducted as a randomized split of the data (all correct keypresses of Day 1) into training (90%) and test (10%) samples for 8 iterations.

      Whether the frequency power or signal amplitude was used

      Signal amplitude was used for feature calculation.
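
      For illustration only, the repeated randomized 90%/10% split described above could be sketched as follows. This is a minimal sketch using only the Python standard library, not the analysis code used in the study; the function name `repeated_holdout_splits`, the fixed seed, and the rounding of the test-set size are assumptions introduced here, and the classifier fitting step is omitted.

```python
import random


def repeated_holdout_splits(n_samples, test_fraction=0.10, n_iterations=8, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    Each iteration reshuffles all sample indices and holds out
    `test_fraction` of them for testing (assumed rounding rule: round()).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_test = max(1, round(n_samples * test_fraction))
    splits = []
    for _ in range(n_iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        test_idx = sorted(indices[:n_test])
        train_idx = sorted(indices[n_test:])
        splits.append((train_idx, test_idx))
    return splits


# Example: 200 correct keypresses give 8 splits of 180 training / 20 test samples.
splits = repeated_holdout_splits(200)
```

      Each split would then be used to fit the decoder on the training indices and score it on the held-out indices, with the reported accuracy summarized across iterations.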

      (4) In terms of the Methods, please consider adding some references about the 'F1 score', the 'feature importance score,' and the 'MRMR-based feature ranking,' as the main readers of the current paper would not be from the machine learning community. Also, why did the LDA dimensionality reduction reduce accuracy specifically for the voxel feature?

      We have now added the following statements to the Methods section that provide more detailed descriptions and references for these metrics:

      “The F1 score, defined as the harmonic mean of the precision (percentage of true predictions that are actually true positive) and recall (percentage of true positives that were correctly predicted as true) scores, was used as a comprehensive metric for all one-versus-all keypress state decoders to assess class-wise performance that accounts for both false-positive and false-negative prediction tendencies [REF]. A weighted mean F1 score was then computed across all classes to assess the overall prediction performance of the multi-class model.”

      and

      “Feature Importance Scores

      The relative contribution of source-space voxels and parcels to decoding performance (i.e. – feature importance score) was calculated using minimum redundant maximum relevance (MRMR) and highlighted in topography plots. MRMR, an approach that combines both relevance and redundancy metrics, ranked individual features based upon their significance to the target variable (i.e. – keypress state identity) prediction accuracy and their non-redundancy with other features.”
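
      As a worked illustration of the F1 definition quoted above, the class-wise and weighted mean scores could be computed as in the sketch below. The function names are hypothetical, and weighting each class by its number of true samples (its support) is an assumption made here; the quoted Methods text does not specify the weights.

```python
def f1_per_class(y_true, y_pred, labels):
    """One-versus-all F1 score and support (true sample count) for each class."""
    scores, support = {}, {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        # Harmonic mean of precision and recall (0 when both are 0).
        if precision + recall:
            scores[c] = 2 * precision * recall / (precision + recall)
        else:
            scores[c] = 0.0
        support[c] = tp + fn
    return scores, support


def weighted_f1(y_true, y_pred, labels):
    """Mean F1 across classes, weighted by class support (assumed weighting)."""
    scores, support = f1_per_class(y_true, y_pred, labels)
    total = sum(support.values())
    return sum(scores[c] * support[c] for c in labels) / total


# Toy example with two keypress classes.
y_true = [1, 1, 2, 2]
y_pred = [1, 2, 2, 2]
overall = weighted_f1(y_true, y_pred, labels=[1, 2])
```

      In this toy example class 1 has precision 1 and recall 0.5, and class 2 has precision 2/3 and recall 1, so a single missed keypress lowers the F1 score of both classes.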

      As stated in the Reviewer responses above, the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much lower-dimensional space (number of classes - 1; e.g. – 3 dimensions for 4-class keypress decoding). The reduction in accuracy observed only for the voxel-space feature was likely due to the loss of relevant information during this mapping. This reduction in accuracy for voxel-space decoding was specific to LDA. Figure 3—figure supplement 3 shows that voxel-space decoder performance actually improved when utilizing alternative dimensionality reduction techniques.
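
      The limit of (number of classes - 1) dimensions follows from the standard formulation of LDA and can be stated compactly. The notation below ($C$ classes, $n_c$ samples in class $c$, class means $\mu_c$, grand mean $\mu$, between-class scatter $S_B$) is introduced here for illustration and is not taken from the manuscript.

```latex
S_B = \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
\sum_{c=1}^{C} n_c \, (\mu_c - \mu) = 0
\;\;\Longrightarrow\;\;
\operatorname{rank}(S_B) \le C - 1 .
```

      Because the $C$ mean-difference vectors are linearly dependent, at most $C - 1$ discriminant directions exist. For 4-class keypress decoding this gives at most 3 dimensions, regardless of whether the input has 15684 voxel features or far fewer parcel features.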

      (5) Paragraph 9, lines #139-142: "Notably, decoding associated with index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest number of misclassifications of all digits (N = 141 or 47.5% of all decoding errors; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed at different learning state or sequence context locations."

      This does not seem to be a fair comparison, as the index finger appears twice as many as the other fingers do in the sequence. To claim this, proper statistical analysis needs to be done taking this difference into account.

      We thank the Reviewer for bringing this issue to our attention. We have now corrected this comparison to evaluate relative false negative and false positive rates between individual keypress state decoders, and have revised this statement in the manuscript as follows:

      “Notably, decoding of index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest false negative (0.116 per keypress) and false positive (0.043 per keypress) misclassification rates compared with all other digits (false negative rate range = [0.067 0.114]; false positive rate range = [0.020 0.037]; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed within different contexts (i.e. - different learning states or sequence locations).”

      (6) Finally, the authors could consider acknowledging in the Discussion that the contribution of micro-offline learning to genuine skill learning is still under debate (e.g., Gupta and Rickard, 2023; 2024; Das et al., bioRxiv, 2024).

      We have added a paragraph in the Discussion that addresses this point.

      Reviewer #3 (Recommendations for the authors):

      In addition to the additional analyses suggested in the public review, I have the following suggestions/questions:

      (1) Given that the authors introduce a new decoding approach, it would be very helpful for readers to see a distribution of window sizes and window onsets eventually used across individuals, at least for the optimized decoder.

      We have now included a new supplemental figure (Figure 4 – figure supplement 2) that provides this information.

      (2) Please explain in detail how you arrived at the (interpolated?) group-level plot shown in Figure 1B, starting from the discrete single-trial keypress transition times. Also, please specify what the shading shows.

      Instantaneous correct sequence speed (skill measure) was quantified as the inverse of time (in seconds) required to complete a single iteration of a correctly generated full 5-item sequence. Individual keypress responses were labeled as members of correct sequences if they occurred within a 5-item response pattern matching any possible circular shifts of the 5-item sequence displayed on the monitor (41324). This approach allowed us to quantify a measure of skill within each practice trial at the resolution of individual keypresses. The dark line indicates the group mean performance dynamics for each trial. The shaded region indicates the 95% confidence limit of the mean (see Methods).
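
      The labeling rule described above (a keypress counts as correct if it falls within any 5-item window matching a circular shift of the displayed sequence) could be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, keypresses are assumed to be encoded as a string of digit characters, and the subsequent computation of speed from keypress timestamps is not shown.

```python
def circular_shifts(sequence):
    """All rotations of the sequence, e.g. '41324' -> {'41324', '13244', ...}."""
    return {sequence[i:] + sequence[:i] for i in range(len(sequence))}


def label_correct_keypresses(presses, target="41324"):
    """Return one boolean per keypress.

    A keypress is marked True if it belongs to at least one window of
    len(target) consecutive keypresses that matches a circular shift
    of the target sequence.
    """
    n = len(target)
    shifts = circular_shifts(target)
    is_correct = [False] * len(presses)
    for start in range(len(presses) - n + 1):
        if presses[start:start + n] in shifts:
            for k in range(start, start + n):
                is_correct[k] = True
    return is_correct


# One stray '1' between two correct sequence iterations is labeled as incorrect.
labels = label_correct_keypresses("41324141324")
```

      Sliding the window one keypress at a time is what allows skill to be tracked at the resolution of individual keypresses rather than whole trials.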

      (3) Similarly, please explain how you arrived at the group-level plot shown in Figure 1C. What are the different colored lines (rows) within each trial? How exactly did the authors reach the conclusion that KTT variability stabilizes by trial 6?

      Figure 1C provides additional information to the correct sequence speed measure above, as it also tracks individual transition speed composition over learning. Figure 1C, thus, represents both changes in overall correct sequence speed dynamics (indicated by the overall narrowing of the horizontal speed lines moving from top to bottom) and the underlying composition of the individual transition patterns within and across trials. The coloring of the lines is a shading convention used to discriminate between different keypress transitions. These curves were sampled with 1ms resolution, as in Figure 1B. Addressing the underlying keypress transition patterns requires within-subject normalization before averaging across subjects. The distribution of KTTs was normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning.

      (4) Maybe I missed it, but it was not clear to me which of the tested classifiers was eventually used. Or was that individualized as well? More generally, a comparison of the different classifiers would be helpful, similar to the comparison of dimension reduction techniques.

      We have now included a new supplemental figure that provides this information.

      (5) Please add df and effect sizes to all statistics.

      Done.

      (6) Please explain in more detail your power calculation.

      The study was powered to determine the minimum sample size needed to detect a significant change in skill performance following training using a one-sample t-test (two-sided; alpha = 0.05; 95% statistical power; Cohen’s D effect size = 0.8115 calculated from previously acquired data in our lab). The calculated minimum sample size was 22. The included study sample size (n = 27) exceeded this minimum.

      This information is now included in the revised manuscript.
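
      As a rough cross-check of the sample-size figure reported above, the calculation can be approximated with the Python standard library. The sketch below uses a normal approximation with a small-sample correction term rather than an exact noncentral t computation, so it is an approximation under stated assumptions and not necessarily the procedure or software used for the study; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def approx_one_sample_n(effect_size, alpha=0.05, power=0.95):
    """Approximate minimum n for a two-sided one-sample t-test.

    Uses the normal-approximation formula ((z_alpha + z_power) / d)^2
    plus the correction term z_alpha^2 / 2 for the t distribution.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = std_normal.inv_cdf(power)
    n_normal = ((z_alpha + z_power) / effect_size) ** 2
    return ceil(n_normal + z_alpha ** 2 / 2)


# Parameters taken from the response above: alpha = 0.05, power = 0.95, d = 0.8115.
n_min = approx_one_sample_n(0.8115)  # evaluates to 22
```

      With these inputs the approximation agrees with the minimum sample size of 22 stated in the response.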

      (7) The cut-off for the high-pass filter is unusually high and seems risky in terms of potential signal distortions (de Cheveigne, Neuron 2019). Why did the authors choose such a high cut-off?

      The 1Hz high-pass cut-off frequency for the 1-150Hz band-pass filter applied to the continuous raw MEG data during preprocessing has been used in multiple previous MEG publications (Barratt et al., 2018; Brookes et al., 2012; Higgins et al., 2021; Seedat et al., 2020; Vidaurre et al., 2018).

      (8) "Furthermore, the magnitude of offline contextualization predicted skill gains while online contextualization did not", lines 336/337 - where is that analysis?

      Additional details pertaining to this analysis are now provided in the Results section (Figure 5 – figure supplement 4).

      (9) How were feature importance scores computed?

      We have now added a new subheading in the Methods section with a more detailed description of how feature importance scores were computed.

      (10)  Please add x and y ticks plus tick labels to Figure 5 - Figure Supplement 3, panel A

      Done

      (11) Line 369, what does "comparable" mean in this context?

      The sentence in the “Study Participants” part of the Methods section referred to here has now been revised for clarity.

      (12) In lines 496/497, please specify what t=0 means (KeyDown event, I guess?).

      Yes, the KeyDown event occurs at t = 0. This has now been clarified in the revised manuscript.

      (13) Please specify consistent boundaries between alpha- and beta-bands (they are currently not consistent in the Results vs. Methods (14/15 Hz or 15/16 Hz)).

      We thank the Reviewer for alerting us to this discrepancy caused by a typographic error in the Methods. We have now corrected this so that the alpha (8-14 Hz) and beta-band (15-24 Hz) frequency limits are described consistently throughout the revised manuscript.

      References

      Albouy, G., Fogel, S., King, B. R., Laventure, S., Benali, H., Karni, A., Carrier, J., Robertson, E. M., & Doyon, J. (2015). Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage, 108, 423-434. https://doi.org/10.1016/j.neuroimage.2014.12.049

      Albouy, G., King, B. R., Maquet, P., & Doyon, J. (2013). Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus, 23(11), 985-1004. https://doi.org/10.1002/hipo.22183

      Albouy, G., Sterpenich, V., Vandewalle, G., Darsaud, A., Gais, S., Rauchs, G., Desseilles, M., Boly, M., Dang-Vu, T., Balteau, E., Degueldre, C., Phillips, C., Luxen, A., & Maquet, P. (2012). Neural correlates of performance variability during motor sequence acquisition. NeuroImage, 60(1), 324-331. https://doi.org/10.1016/j.neuroimage.2011.12.049

      Andersen, R. A., & Buneo, C. A. (2002). Intentional maps in posterior parietal cortex. Annu Rev Neurosci, 25, 189-220. https://doi.org/10.1146/annurev.neuro.25.112701.142922

      Ashe, J., Lungu, O. V., Basford, A. T., & Lu, X. (2006). Cortical control of motor sequences. Curr Opin Neurobiol, 16(2), 213-221. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16563734

      Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W., & Donoghue, J. P. (2011). Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol, 105(4), 1603-1619. https://doi.org/10.1152/jn.00532.2010

      Barratt, E. L., Francis, S. T., Morris, P. G., & Brookes, M. J. (2018). Mapping the topological organisation of beta oscillations in motor cortex using MEG. NeuroImage, 181, 831-844. https://doi.org/10.1016/j.neuroimage.2018.06.041

      Bassett, D. S., Wymbs, N. F., Porter, M. A., Mucha, P. J., Carlson, J. M., & Grafton, S. T. (2011). Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A, 108(18), 7641-7646. https://doi.org/10.1073/pnas.1018985108

      Battaglia-Mayer, A., & Caminiti, R. (2019). Corticocortical Systems Underlying High-Order Motor Control. J Neurosci, 39(23), 4404-4421. https://doi.org/10.1523/JNEUROSCI.2094-18.2019

      Berlot, E., Popp, N. J., & Diedrichsen, J. (2020). A critical re-evaluation of fMRI signatures of motor sequence learning. Elife, 9. https://doi.org/10.7554/eLife.55241

      Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N., & Cohen, L. G. (2020). Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn, 5, 7. https://doi.org/10.1038/s41539-020-0066-9

      Bonstrup, M., Iturrate, I., Thompson, R., Cruciani, G., Censor, N., & Cohen, L. G. (2019). A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol, 29(8), 1346-1351 e1344. https://doi.org/10.1016/j.cub.2019.02.049

      Brawn, T. P., Fenn, K. M., Nusbaum, H. C., & Margoliash, D. (2010). Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci, 30(42), 13977-13982. https://doi.org/10.1523/JNEUROSCI.3295-10.2010

      Brookes, M. J., Woolrich, M. W., & Barnes, G. R. (2012). Measuring functional connectivity in MEG: a multivariate approach insensitive to linear source leakage. NeuroImage, 63(2), 910-920. https://doi.org/10.1016/j.neuroimage.2012.03.048

      Brooks, E., Wallis, S., Hendrikse, J., & Coxon, J. (2024). Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn, 9(1), 23. https://doi.org/10.1038/s41539-024-00238-6

      Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M., & Cohen, L. G. (2021). Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep, 35(10), 109193. https://doi.org/10.1016/j.celrep.2021.109193

      Buneo, C. A., & Andersen, R. A. (2006). The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia, 44(13), 2594-2606. https://doi.org/10.1016/j.neuropsychologia.2005.10.011

      Buzsaki, G. (2015). Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188. https://doi.org/10.1002/hipo.22488

      Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H., & Staresina, B. P. (2024). Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.10.06.614680. https://doi.org/10.1101/2024.10.06.614680

      Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Foster, J. D., Nuyujukian, P., Ryu, S. I., & Shenoy, K. V. (2012). Neural population dynamics during reaching. Nature, 487(7405), 51-56. https://doi.org/10.1038/nature11129

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol, 79(2), 1117-1123. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=9463469

      Colclough, G. L., Brookes, M. J., Smith, S. M., & Woolrich, M. W. (2015). A symmetric multivariate leakage correction for MEG connectomes. NeuroImage, 117, 439-448. https://doi.org/10.1016/j.neuroimage.2015.03.071

      Colclough, G. L., Woolrich, M. W., Tewarie, P. K., Brookes, M. J., Quinn, A. J., & Smith, S. M. (2016). How reliable are MEG resting-state connectivity metrics? NeuroImage, 138, 284-293. https://doi.org/10.1016/j.neuroimage.2016.05.070

      Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P., & Azanon, E. (2024). “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.07.11.602795. https://doi.org/10.1101/2024.07.11.602795

      Deleglise, A., Donnelly-Kehoe, P. A., Yeffal, A., Jacobacci, F., Jovicich, J., Amaro, E., Jr., Armony, J. L., Doyon, J., & Della-Maggiore, V. (2023). Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex, 33(10), 6120-6131. https://doi.org/10.1093/cercor/bhac489

      Doyon, J., Bellec, P., Amsel, R., Penhune, V., Monchi, O., Carrier, J., Lehéricy, S., & Benali, H. (2009). Contributions of the basal ganglia and functionally related brain structures to motor learning. [Review]. Behavioural brain research, 199(1), 61-75. https://doi.org/10.1016/j.bbr.2008.11.012

      Doyon, J., Song, A. W., Karni, A., Lalonde, F., Adams, M. M., & Ungerleider, L. G. (2002). Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A, 99(2), 1017-1022. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11805340

      Euston, D. R., Gruber, A. J., & McNaughton, B. L. (2012). The role of medial prefrontal cortex in memory and decision making. Neuron, 76(6), 1057-1070. https://doi.org/10.1016/j.neuron.2012.12.002

      Euston, D. R., Tatsuno, M., & McNaughton, B. L. (2007). Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science, 318(5853), 1147-1150. https://doi.org/10.1126/science.1148979

      Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E., & Slutzky, M. W. (2012). Local field potentials allow accurate decoding of muscle activity. J Neurophysiol, 108(1), 18-24. https://doi.org/10.1152/jn.00832.2011

      Frankland, P. W., & Bontempi, B. (2005). The organization of recent and remote memories. Nat Rev Neurosci, 6(2), 119-130. https://doi.org/10.1038/nrn1607

      Gais, S., Albouy, G., Boly, M., Dang-Vu, T. T., Darsaud, A., Desseilles, M., Rauchs, G., Schabus, M., Sterpenich, V., Vandewalle, G., Maquet, P., & Peigneux, P. (2007). Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A, 104(47), 18778-18783. https://doi.org/10.1073/pnas.0705454104

      Grafton, S. T., Mazziotta, J. C., Presty, S., Friston, K. J., Frackowiak, R. S., & Phelps, M. E. (1992). Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci, 12(7), 2542-2548.

      Grover, S., Wen, W., Viswanathan, V., Gill, C. T., & Reinhart, R. M. G. (2022). Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci, 25(9), 1237-1246. https://doi.org/10.1038/s41593-022-01132-3

      Gupta, M. W., & Rickard, T. C. (2022). Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn, 7(1), 25. https://doi.org/10.1038/s41539-022-00140-z

      Gupta, M. W., & Rickard, T. C. (2024). Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep, 14(1), 4661. https://doi.org/10.1038/s41598-024-52726-9

      Hardwick, R. M., Rottschy, C., Miall, R. C., & Eickhoff, S. B. (2013). A quantitative meta-analysis and review of motor learning in the human brain. NeuroImage, 67, 283-297. https://doi.org/10.1016/j.neuroimage.2012.11.020

      Heusser, A. C., Poeppel, D., Ezzyat, Y., & Davachi, L. (2016). Episodic sequence memory is supported by a theta-gamma phase code. Nat Neurosci, 19(10), 1374-1380. https://doi.org/10.1038/nn.4374

      Higgins, C., Liu, Y., Vidaurre, D., Kurth-Nelson, Z., Dolan, R., Behrens, T., & Woolrich, M. (2021). Replay bursts in humans coincide with activation of the default mode and parietal alpha networks. Neuron, 109(5), 882-893 e887. https://doi.org/10.1016/j.neuron.2020.12.007

      Hikosaka, O., Nakamura, K., Sakai, K., & Nakahara, H. (2002). Central mechanisms of motor skill learning. Curr Opin Neurobiol, 12(2), 217-222. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12015240

      Jacobacci, F., Armony, J. L., Yeffal, A., Lerner, G., Amaro, E., Jr., Jovicich, J., Doyon, J., & Della-Maggiore, V. (2020). Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A, 117(38), 23898-23903. https://doi.org/10.1073/pnas.2009576117

      Karni, A., Meyer, G., Jezzard, P., Adams, M. M., Turner, R., & Ungerleider, L. G. (1995). Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature, 377(6545), 155-158. https://doi.org/10.1038/377155a0

      Kennerley, S. W., Sakai, K., & Rushworth, M. F. (2004). Organization of action sequences and the role of the pre-SMA. J Neurophysiol, 91(2), 978-993. https://doi.org/10.1152/jn.00651.2003

      Kleim, J. A., Barbay, S., & Nudo, R. J. (1998). Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol, 80, 3321-3325.

      Kornysheva, K., Bush, D., Meyer, S. S., Sadnicka, A., Barnes, G., & Burgess, N. (2019). Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron, 101(6), 1166-1180 e1163. https://doi.org/10.1016/j.neuron.2019.01.018

      Lee, S. H., Jin, S. H., & An, J. (2019). The difference in cortical activation pattern for complex motor skills: A functional near-infrared spectroscopy study. Sci Rep, 9(1), 14066. https://doi.org/10.1038/s41598-019-50644-9

      Lisman, J. E., & Jensen, O. (2013). The theta-gamma neural code. Neuron, 77(6), 1002-1016. https://doi.org/10.1016/j.neuron.2013.03.007

      Mollazadeh, M., Aggarwal, V., Davidson, A. G., Law, A. J., Thakor, N. V., & Schieber, M. H. (2011). Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci, 31(43), 15531-15543. https://doi.org/10.1523/JNEUROSCI.2999-11.2011

      Molle, M., & Born, J. (2009). Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron, 61(4), 496-498. https://doi.org/10.1016/j.neuron.2009.02.002

      Morris, R. G. M. (2006). Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. [Review]. The European journal of neuroscience, 23(11), 2829-2846. https://doi.org/10.1111/j.1460-9568.2006.04888.x

      Mylonas, D., Schapiro, A. C., Verfaellie, M., Baxter, B., Vangel, M., Stickgold, R., & Manoach, D. S. (2024). Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci, 44(14). https://doi.org/10.1523/JNEUROSCI.1839-23.2024

      Pan, S. C., & Rickard, T. C. (2015). Sleep and motor learning: Is there room for consolidation? Psychol Bull, 141(4), 812-834. https://doi.org/10.1037/bul0000009

      Penhune, V. B., & Steele, C. J. (2012). Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res., 226(2), 579-591. https://doi.org/10.1016/j.bbr.2011.09.044

      Qin, Y. L., McNaughton, B. L., Skaggs, W. E., & Barnes, C. A. (1997). Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci, 352(1360), 1525-1533. https://doi.org/10.1098/rstb.1997.0139

      Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J., & Ard, M. C. (2008). Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn, 34(4), 834-842. https://doi.org/10.1037/0278-7393.34.4.834

      Robertson, E. M., Pascual-Leone, A., & Miall, R. C. (2004). Current concepts in procedural consolidation. Nat Rev Neurosci, 5(7), 576-582. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15208699

      Sawamura, D., Sakuraba, S., Suzuki, Y., Asano, M., Yoshida, S., Honke, T., Kimura, M., Iwase, Y., Horimoto, Y., Yoshida, K., & Sakai, S. (2019). Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep, 9(1), 20397. https://doi.org/10.1038/s41598-019-56956-0

      Schendan, H. E., Searl, M. M., Melrose, R. J., & Stern, C. E. (2003). An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron, 37(6), 1013-1025. https://doi.org/10.1016/s0896-6273(03)00123-5

      Seedat, Z. A., Quinn, A. J., Vidaurre, D., Liuzzi, L., Gascoyne, L. E., Hunt, B. A. E., O'Neill, G. C., Pakenham, D. O., Mullinger, K. J., Morris, P. G., Woolrich, M. W., & Brookes, M. J. (2020). The role of transient spectral 'bursts' in functional connectivity: A magnetoencephalography study. NeuroImage, 209, 116537. https://doi.org/10.1016/j.neuroimage.2020.116537

      Shadmehr, R., & Holcomb, H. H. (1997). Neural correlates of motor memory consolidation. Science, 277, 821-824.

      Sjøgård, M., Baxter, B., Mylonas, D., Driscoll, B., Kwok, K., Tolosa, A., Thompson, M., Stickgold, R., Vangel, M., Chu, C., & Manoach, D. S. (2024). Hippocampal ripples mediate motor learning during brief rest breaks in humans. bioRxiv. https://doi.org/10.1101/2024.05.02.592200

      Srinivas, S., Sarvadevabhatla, R. K., Mopuri, K. R., Prabhu, N., Kruthiventi, S. S. S., & Babu, R. V. (2016). A Taxonomy of Deep Convolutional Neural Nets for Computer Vision [Technology Report]. Frontiers in Robotics and AI, 2. https://doi.org/10.3389/frobt.2015.00036

      Sterpenich, V., Albouy, G., Darsaud, A., Schmidt, C., Vandewalle, G., Dang Vu, T. T., Desseilles, M., Phillips, C., Degueldre, C., Balteau, E., Collette, F., Luxen, A., & Maquet, P. (2009). Sleep promotes the neural reorganization of remote emotional memory. J Neurosci, 29(16), 5143-5152. https://doi.org/10.1523/JNEUROSCI.0561-09.2009

      Toni, I., Ramnani, N., Josephs, O., Ashburner, J., & Passingham, R. E. (2001). Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage, 14(5), 1048-1057. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11697936

      Toni, I., Thoenissen, D., & Zilles, K. (2001). Movement preparation and motor intention. NeuroImage, 14(1 Pt 2), S110-117. https://doi.org/10.1006/nimg.2001.0841

      Tse, D., Langston, R. F., Kakeyama, M., Bethus, I., Spooner, P. A., Wood, E. R., Witter, M. P., & Morris, R. G. (2007). Schemas and memory consolidation. Science, 316(5821), 76-82. https://doi.org/10.1126/science.1135935

      van Kesteren, M. T., Fernandez, G., Norris, D. G., & Hermans, E. J. (2010). Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A, 107(16), 7550-7555. https://doi.org/10.1073/pnas.0914892107

      van Kesteren, M. T., Ruiter, D. J., Fernandez, G., & Henson, R. N. (2012). How schema and novelty augment memory formation. Trends Neurosci, 35(4), 211-219. https://doi.org/10.1016/j.tins.2012.02.001

      Vidaurre, D., Hunt, L. T., Quinn, A. J., Hunt, B. A. E., Brookes, M. J., Nobre, A. C., & Woolrich, M. W. (2018). Spontaneous cortical activity transiently organises into frequency specific phase-coupling networks. Nat Commun, 9(1), 2987. https://doi.org/10.1038/s41467-018-05316-z

      Wagner, A. D., Schacter, D. L., Rotte, M., Koutstaal, W., Maril, A., Dale, A. M., Rosen, B. R., & Buckner, R. L. (1998). Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. [Comment]. Science (New York, N.Y.), 281(5380), 1188-1191. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&id=9712582&retmode=ref&cmd=prlinks

      Wolpert, D. M., Goodbody, S. J., & Husain, M. (1998). Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci, 1(6), 529-533. https://doi.org/10.1038/2245

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Basha and colleagues aim to test whether the thalamic nucleus reuniens can facilitate the hippocampus/prefrontal cortex coupling during sleep. Considering the importance of sleep in memory consolidation, this study is important to understand the functional interaction between these three majorly involved regions. This work suggests that the thalamic nucleus reuniens has a functional role in synchronizing the hippocampus and prefrontal cortex.

      Strengths:

      The authors performed recordings in naturally sleeping cats, and analysed the correlation between the main slow wave sleep oscillatory hallmarks: slow waves, spindles, and hippocampal ripples, and with reuniens' neurons firing. They also associated intracellular recordings to assess the reuniens-prefrontal connectivity, and computational models of large networks in which they determined that the coupling of oscillations is modulated by the strength of hippocampal-thalamic connections.

      Thank you for your positive evaluation.

      Weaknesses:

      The authors' main claim is made on slow waves and spindle coupling, which are recorded both in the prefrontal cortex and surprisingly in reuniens. Known to be generated in the cortex by cortico-thalamic mechanisms, the slow waves and spindles recorded in reuniens show no evidence of local generation in the reuniens, which is not anatomically equipped to generate such activities. Until shown differently, these oscillations recorded in reuniens are most likely volume-conducted from nearby cortices. Therefore, such a caveat is a major obstacle to analysing their correlation (in time or frequency domains) with oscillations in other regions.

      (1) We fully agree with the reviewer that the reuniens likely generates neither slow waves nor spindles. We make no such claim, as we clearly state in the discussion (lines 319-324). We propose that Reuniens neurons mediate different forms of activity. In the model, we introduced the MD nucleus only because without it we were unable to generate spindles. While the slow waves and spindles are generated in other thalamocortical regions, the REU neurons show these rhythms because of long-range projections from these regions to REU, as shown in the model.

      (2) Admittedly, we cannot exclude some influence of volume conduction on the LFP recordings obtained in the REU nucleus. However, we show that spindles modulate spiking activity within REU. Spike modulation cannot be explained by volume conduction, but it can be explained by either synaptic drive (likely the case here) or intrinsic neuronal processes (such as the T-current).

      (3) We used tetrodes for spike identification in our REU recordings. If slow waves and spindles were volume conducted, they should have an identical shape on all channels of a tetrode. Following the reviewer's comment, we took these recordings and subtracted one channel from another. The difference in signal during slow waves is on the order of 0.1 mV. Considering that the distance between electrodes is on the order of 20 µm, such a voltage difference is substantial and can only be explained by local extracellular currents, likely due to synaptic activity originating in afferent structures.

      Finally, the choice of the animal model (cats) is not the best suited one, as too few data, particularly anatomical ones regarding reuniens connectivity, are available to support functional results.

      (1) The thalamus of the majority of mammals (certainly primates and carnivores, including cats) contains local circuit interneurons (about 30% of all neurons). The vast majority of studies in rodents report either an absence or an extremely low number of thalamic interneurons outside the LGN (e.g., Jager P, Moore G, Calpin P, et al. Dual midbrain and forebrain origins of thalamic inhibitory interneurons. eLife. 2021; 10: e59272). Therefore, studies in species other than rodents are necessary and bring new information that is impossible to obtain in rodents.

      (2) The cat brain is much larger than the brain of mice or rats; therefore, the effects of volume conduction from cortex to REU are much smaller, if not negligible. The distance between REU and the closest cortical structure (ectosylvian gyrus) in cats is about 15 mm.

      (3) Indeed, there are far fewer anatomical data on cats than on rodents. This is why we performed the experiments shown in figure 1, which contains functional anatomy data. Antidromic responses show that the recorded structure projects to the stimulated structure. Orthodromic responses show that the stimulated structure projects to the recorded structure.

      Reviewer #2 (Public Review):

      Summary:

      The interplay between the medial prefrontal cortex and ventral hippocampal system is critical for many cognitive processes, including memory and its consolidation over time. A prominent idea in recent research is that this relationship is mediated at least in part by the midline nucleus reuniens with respect to consolidation in particular. Whereas the bulk of evidence has focused on neuroanatomy and the effects of temporary or permanent lesions of the nucleus reuniens, the current work examined the electrophysiology of these three structures and how they inter-relate, especially during sleep, which is anticipated to be critical for consolidation. They provide evidence from intracellular recordings of the bi-directional functional connectivity among these structures. There is an emphasis on the interactions between these regions during sleep, especially slow-wave sleep. They provide evidence, in cats, that cortical slow waves precede reuniens slow waves and hippocampal sharp-wave ripples, which may reflect prefrontal control of the timing of thalamic and hippocampal events. They also find evidence that hippocampal sharp-wave ripples trigger thalamic firing and precede the onset of reuniens and medial prefrontal cortex spindles. The authors suggest that the effectiveness of bidirectional connections between the reuniens and the (ventral) CA1 is particularly strong during non-rapid eye movement sleep in the cat. This is a very interesting, complex study on a highly topical subject.

      Strengths:

      An excellent array of different electrophysiological techniques and analyses are conducted. The temporal relationships described are novel findings that suggest mechanisms behind the interactions between the key regions of interest. These may be of value for future experimental studies to test more directly their association with memory consolidation.

      We thank the reviewer for the very positive evaluation of our study.

      Weaknesses:

      Given the complexity and number of findings provided, clearer explanation(s) and organisation that directed the specific value and importance of different findings would improve the paper. Most readers may then find it easier to follow the specific relevance of key approaches and findings and their emphasis. For example, the fact that bidirectional connections exist in the model system is not new per se. How and why the specific findings add to existing literature would have more impact if this information was addressed more directly in the written text and in the figure legends.

      Thank you for this comment. In the revised version, we will do our best to simplify the presentation and explain our findings more clearly.

      Reviewing Editor (Recommendations for Authors):

      Please discuss the ability of reuniens to generate spindles?

      We briefly discussed this in the previous version. We have now extended the discussion (p. 18).

      For population data, how many cats were used in acute and chronic experiments, where does the population data originate in Fig. 2? How repeatable were the findings across animals? Was histology verified in each animal?

      As stated at the beginning of the Methods section, we used 20 cats in total: 16 anesthetized (acute) and 4 non-anesthetized (chronic). We added the number of cats in the appropriate places in the Results section. The population data in figure 2 come from 48, 49 or 52 recording sessions (depending on the type of analysis, as indicated in the figure legend) from the 4 chronic cats; we clarified this information in the legend. Results were highly repeatable across animals. Histology was verified in all chronic and acute animals; we added a sentence to this effect in the Methods section.

      Explanation of figures is very poor, values in figures should be reported in results so they can be compared in the context of the description.

      In this revised version, we report most of the numbers shown in the figures and their legends in the main text (Results section).

      The depths of the recording tungsten electrodes are meaningless without the AP and ML coordinates, given how heterogeneous the mPFC is. What is the ventromedial wall of the mPFC in the cat?

      We added the ML and AP coordinates in the Methods section. We replaced “ventromedial wall” with “ventroposterior part of the mPFC”.

      What are the two vertical lines in 1F?

      This was an error made while preparing the figure. The panel has been corrected.

      Line 90: mean ± SD of what? There are no numbers.

      Thanks, we now indicate the values.

      Panel 2L does not show increased spindling in reuniens prior to PFC as indicated in the results, please explain. It does show SWR in the hippocampus prior to spindles, what is the meaning of such a time relationship?

      Panel 2L did show increased spindling in the reuniens prior to the mPFC, but at the time scale shown this was indeed not very clear. In this revised manuscript, we added an inset zooming in around time zero to make this point clearer.

      Panel 2L indeed shows an increase in SWRs prior to the increase in spindles in both Reuniens and mPFC.

      As stated in the discussion, ‘We found that hippocampal SWRs trigger thalamic firing and precede the onset of reuniens and mPFC spindles, which points to SWRs as one of candidate events for spindle initiation.’

      It is unclear what the slow waves of PFC mean. These represent filtered PFC LFP, but is this a particular oscillation? They continue to occur during the spindle, while the slow waves supposedly trigger the spindle. Please explain and clarify.

      We recently published a review article, written with several scientists studying both human and animal sleep, that includes Box 1 (Timofeev I, Schoch S, LeBourgeois M, Huber R, Riedner B, Kurth S. Spatio-temporal properties of sleep slow waves and implications for development. Current Opinion in Physiology. 2020; 15: 172–182). In this box, among other terms, we provide the current definitions of slow waves vs slow oscillation. Briefly, if slow waves repeat with a given rhythm, they typically form a slow oscillation. However, if they occur in isolation or are not rhythmic, they remain slow waves but cannot be called a slow oscillation.

      Regarding the relation between spindles and the slow oscillation: we are currently systematically analyzing data on spindles and slow waves obtained from head-restrained and freely behaving cats. One of the main findings is that a majority of ‘cortical’ spindles are local, to the extent that spindles can occur in alternation in two neighboring cortical cells. Largely, LFP sleep spindles occur more or less synchronously within the suprasylvian gyrus of cats, where indeed a large majority of them were triggered by slow waves. The synchrony between LFP spindles in the suprasylvian gyrus and in other cortical areas is much less clear. It is therefore not surprising that spindles in one brain region can occur while a slow wave is present in some other brain region. Something similar was also shown in humans (Mölle M, Bergmann TO, Marshall L, Born J. Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep. 2011; 34 (10): 1411-1421).

      In this regard, we are not ready to modify the manuscript.

      Line 134, where is spindle amplitude shown? Plots report power within the spindle frequency band, which obviously captures more than just spindles.

      No, the plots in figure 3B and C show the phase-amplitude coupling (PAC) strength. These were calculated with detected spindles; therefore, while we cannot exclude some false spindle detections, we are confident that they are at a negligible level. We modified the text: instead of spindle amplitude, we now describe slow wave-spindle amplitude coupling, which describes our analysis exactly.
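
      As an illustration of what a PAC strength measures, the sketch below computes an amplitude-weighted mean vector length from slow-wave phase and spindle-band amplitude. It is a minimal example on synthetic data and not the authors' pipeline: the specific PAC metric, the function name `pac_mean_vector_length`, and all signal parameters are assumptions made for this example.

```python
import cmath
import math


def pac_mean_vector_length(phases, amplitudes):
    """Amplitude-weighted mean vector length, normalised by total amplitude.

    phases: slow-wave phase in radians, one value per sample.
    amplitudes: spindle-band envelope at the same samples (non-negative).
    Returns a value in [0, 1]; values near 0 mean amplitude ignores phase.
    """
    if len(phases) != len(amplitudes) or not phases:
        raise ValueError("phases and amplitudes must be non-empty and equal in length")
    total_amplitude = sum(amplitudes)
    if total_amplitude == 0:
        return 0.0
    vector_sum = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(vector_sum) / total_amplitude


# Synthetic illustration (hypothetical values): 10 slow-wave cycles,
# 100 samples per cycle.
n = 1000
phases = [2 * math.pi * (i % 100) / 100 for i in range(n)]
coupled = [1 + math.cos(p) for p in phases]   # envelope peaks at phase 0
uncoupled = [1.0] * n                         # envelope ignores phase

pac_coupled = pac_mean_vector_length(phases, coupled)
pac_uncoupled = pac_mean_vector_length(phases, uncoupled)
```

      In this toy case the coupled envelope gives a clearly non-zero value while the flat envelope gives a value close to zero, which is the contrast a PAC plot is meant to capture.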

      The discussion must include the mediodorsal nucleus, which is the largest thalamic input to the prefrontal cortex and also receives input from the hippocampus. In particular, the case must be made for why reuniens would play a more important or different role than MD. (For example: Occurrence of Hippocampal Ripples is Associated with Activity Suppression in the Mediodorsal Thalamic Nucleus - PMC (nih.gov)).

      We cited the suggested study. We cannot say whether reuniens plays a more or less important role. What is clear is that hippocampal ripples at the onset of spindles trigger increased firing in both MD and reuniens. Our extracellular recordings (Fig. 4K) suggest that the increased firing is associated with spike bursts. We also have a parallel unpublished study in anesthetized mice showing SWR-triggered inhibitory potentials in both reuniens and MD that reverse at around -65 to -70 mV. Because the majority of SWRs occurred at the onset of the cortical up state, the relative roles of cortico-thalamic vs hippocampo-thalamic drive are not easy to separate. We hope to do this convincingly in our forthcoming study, with the limitation that it was done on anesthetized mice.

      Reviewer #1 (Recommendations For The Authors):

      I strongly encourage the authors to perform current source density analyses on the LFP signals recorded in the nucleus reuniens to make sure that the observed oscillations are indeed locally generated. So far, the anatomical organisation in reuniens cannot support the local generation of oscillations, such as spindles and slow wave. At least in rodents (the cat reuniens does not seem too different, until shown differently), there were no oscillators found in reuniens, and at least not arranged like in cortical areas, allowing the summation in time, and particularly space, of rhythmic input currents. Bipolar recordings with pairs of twisted electrodes might also be useful to assess the local existence of spindles and slow waves.

      Current source density calculation is possible when one knows the exact distance between recording sites. As we used tetrodes made of 4 twisted platinum-iridium wires, we know the approximate range of distances between recording sites, but not the exact distance between any given pair of electrodes.
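
      For context, the standard one-dimensional second-difference estimate of current source density makes the dependence on contact spacing explicit. This is the textbook form for equally spaced, collinear contacts, given here only as an illustration and not as an analysis from the manuscript; the symbols $\phi$ (field potential), $h$ (inter-contact spacing) and $\sigma$ (tissue conductivity, assumed homogeneous) are introduced for this sketch.

```latex
\mathrm{CSD}(z) \approx -\sigma \, \frac{\phi(z+h) - 2\,\phi(z) + \phi(z-h)}{h^{2}}
\qquad\Longrightarrow\qquad
\frac{\mathrm{CSD}_{\text{assumed}}}{\mathrm{CSD}_{\text{true}}}
  = \left(\frac{h_{\text{true}}}{h_{\text{assumed}}}\right)^{2}
```

      Because the estimate scales with $1/h^{2}$, any error in the assumed spacing is squared: assuming 20 µm when the true spacing is, hypothetically, 25 µm would inflate the estimate by $(25/20)^{2} \approx 1.56$. Twisted tetrode wires are also neither collinear nor equally spaced, so this formula would not apply directly to such recordings.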

      Moreover, the physical distance between the reuniens and any cortical structure is about 8-9 mm. At such distances, volume conduction is expected to be negligible. If slow waves and spindles were volume conducted, they should have an identical shape on all channels of a tetrode. Following the reviewer's comment, we took these recordings and subtracted one channel from another. The difference in signal during slow waves is on the order of 0.1 mV. Considering that the distance between electrodes is on the order of 20 µm, such a voltage difference is substantial and can only be explained by local extracellular currents, likely due to synaptic activity originating in afferent structures.

      Below, we plotted the voltage of one channel of the tetrode versus another channel of the same tetrode. If the signal were simply volume conducted, one would expect to see the vast majority of points on the x=y line (red).

      Author response image 1.

      Below is a segment of mPFC LFP recording (upper black trace), the mPFC LFP filtered for spindle frequency (7-15 Hz), and the detected spindles (black lines above the filtered trace). Two LFP traces from a tetrode in the Reuniens (orange and light blue) are then overlaid. The second trace from the bottom (blue) represents the subtraction of the Reuniens 2 channel from the Reuniens 1 channel, and just below it (lower blue trace) is this subtraction trace filtered for spindle frequency (7-15 Hz), showing a clear voltage difference in the spindle range between the two electrodes. Note also that around 179-179.5 s there is a clear spindle oscillation in the mPFC recording that is not present in the Reuniens recordings.

      Author response image 2.

      Therefore, we are convinced that volume conduction did not play any significant role in our recordings.
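
      The logic of the channel-subtraction check can be illustrated on synthetic data. The sketch below is not the authors' analysis: the sampling rate, amplitudes, noise level, tolerance and helper names are all hypothetical, chosen only to contrast a purely shared (volume-conducted) signal with one that also contains a local component weighted differently on each channel.

```python
import math
import random


def rms(values):
    """Root-mean-square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def channel_difference(ch1, ch2):
    """Sample-by-sample difference between two channels of the same tetrode."""
    return [a - b for a, b in zip(ch1, ch2)]


def fraction_near_identity(ch1, ch2, tolerance_mv):
    """Fraction of samples whose (ch1, ch2) point lies within tolerance of x = y."""
    close = sum(1 for a, b in zip(ch1, ch2) if abs(a - b) <= tolerance_mv)
    return close / len(ch1)


random.seed(0)
fs = 1000                                  # assumed sampling rate (Hz)
t = [i / fs for i in range(2 * fs)]        # 2 s of synthetic data
shared = [0.5 * math.sin(2 * math.pi * 1.0 * x) for x in t]  # common 1 Hz wave (mV)

# Case A: pure volume conduction, both channels see the same signal plus small noise.
vc1 = [s + random.gauss(0, 0.005) for s in shared]
vc2 = [s + random.gauss(0, 0.005) for s in shared]

# Case B: an added local 10 Hz component with a different weight on each channel.
local = [0.1 * math.sin(2 * math.pi * 10.0 * x) for x in t]
lc1 = [s + 1.0 * loc + random.gauss(0, 0.005) for s, loc in zip(shared, local)]
lc2 = [s - 0.5 * loc + random.gauss(0, 0.005) for s, loc in zip(shared, local)]

rms_vc = rms(channel_difference(vc1, vc2))
rms_local = rms(channel_difference(lc1, lc2))
near_vc = fraction_near_identity(vc1, vc2, tolerance_mv=0.03)
near_local = fraction_near_identity(lc1, lc2, tolerance_mv=0.03)
```

      In the shared-only case almost every point sits on the identity line and the difference trace contains only noise; with a local component, the difference trace retains a signal of about 0.1 mV RMS and most points leave the line.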

      Another concern regards the delays between events, like slow waves, measured between two regions (as exemplified by Figure 3). It appears that the delays were calculated from the filtered signal. Figure 3G shows a delay between the peak of the mPFC slow wave in the raw and the filtered signal, which might be an artifact of the processing. It is, however, not (or less) visible for the reuniens recording. Such a mismatch might explain the observed differences in delays.

      Thanks for this comment. We recomputed the analysis using the original (smoothed) signal and obtained very similar results. Panels H and I of figure 3 were updated using the new analysis performed on the original signal.
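
      A minimal sketch of measuring a peak-to-peak delay on smoothed rather than band-pass-filtered signals is given below. It is an illustration under stated assumptions, not the authors' code: the centred moving average, the 21-sample window, the Gaussian stand-ins for slow waves and the 30 ms offset are all hypothetical.

```python
import math


def moving_average(signal, window):
    """Centred moving average; an odd window keeps the output time-aligned."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def peak_time(signal, fs):
    """Time (s) of the largest sample."""
    index = max(range(len(signal)), key=signal.__getitem__)
    return index / fs


fs = 1000  # assumed sampling rate (Hz)
t = [i / fs for i in range(fs)]


def wave(center_s, width_s=0.1):
    """Gaussian bump standing in for a slow wave; purely synthetic."""
    return [math.exp(-((x - center_s) ** 2) / (2 * width_s ** 2)) for x in t]


mpfc = wave(0.400)       # hypothetical mPFC slow-wave peak at 400 ms
reuniens = wave(0.430)   # hypothetical reuniens peak 30 ms later

delay_ms = 1000 * (peak_time(moving_average(reuniens, 21), fs)
                   - peak_time(moving_average(mpfc, 21), fs))
```

      A centred, symmetric window does not shift the peak of a symmetric waveform, so the delay recovered here matches the 30 ms offset built into the synthetic signals.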

      The overall analyses of LFP-triggered reuniens MUA activity lack statistics (at least z-scored firing to normalise the firing rates).

      Fig. 2H and I are representative examples of histograms; statistical data are shown in circular plots, as explained in the legend. Fig. 2L shows population data, and we now provide the standard error. Fig. 4C and D show individual examples. Fig. 4E shows histograms of the activity of all identified putative single units; units that show significant modulation are displayed above the white line. Fig. 4F shows population data for significantly modulated units.
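
      The z-score normalisation suggested by the reviewer can be sketched as follows. This is an illustrative example on synthetic spike times, not the analysis used in the manuscript; the window, bin size, baseline period and function names are assumptions.

```python
import math


def peri_event_histogram(spike_times, event_times, window_s, bin_s):
    """Count spikes in bins spanning [-window_s, +window_s) around each event."""
    n_bins = int(round(2 * window_s / bin_s))
    counts = [0] * n_bins
    for event in event_times:
        for spike in spike_times:
            offset = spike - event
            if -window_s <= offset < window_s:
                index = min(int((offset + window_s) / bin_s), n_bins - 1)
                counts[index] += 1
    return counts


def zscore_against_baseline(counts, n_baseline_bins):
    """Z-score every bin using the mean and SD of the first n_baseline_bins bins."""
    baseline = counts[:n_baseline_bins]
    mean = sum(baseline) / len(baseline)
    variance = sum((c - mean) ** 2 for c in baseline) / len(baseline)
    sd = math.sqrt(variance)
    if sd == 0:
        raise ValueError("baseline has zero variance; cannot z-score")
    return [(c - mean) / sd for c in counts]


# Synthetic data (hypothetical): three events, sparse pre-event firing,
# and a burst of five spikes within 100 ms after each event.
events = [10.0, 20.0, 30.0]
baseline_offsets = [-0.95, -0.85, -0.84, -0.75, -0.65, -0.64, -0.55]
burst_offsets = [0.02, 0.04, 0.05, 0.06, 0.08]
spikes = [e + off for e in events for off in baseline_offsets + burst_offsets]

counts = peri_event_histogram(spikes, events, window_s=1.0, bin_s=0.1)
z = zscore_against_baseline(counts, n_baseline_bins=5)
peak_bin = max(range(len(z)), key=z.__getitem__)
```

      Expressing each bin relative to a pre-event baseline makes units with very different firing rates comparable, which is the purpose of the normalisation the reviewer asks for.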

      A last point of detail in the model, which surprisingly shows reuniens to excitatory hippocampal cells' connectivity. Recent literature reports that reuniens only connect hippocampal interneurons, and not principal cells (at least in rodents, I could not find any report in cats). I wonder how changing this parameter would affect the results of the computational investigation, particularly the results shown in Figure 6.

      Several studies in the literature show direct excitation from the Reuniens to pyramidal cells in CA1; here are three of them:

      Goswamee, P., et al. (2021). "Nucleus Reuniens Afferents in Hippocampus Modulate CA1 Network Function via Monosynaptic Excitation and Polysynaptic Inhibition." Frontiers in Cellular Neuroscience 15.

      Dolleman-Van der Weel MJ, Lopes da Silva FH, Witter MP (1997) Nucleus Reuniens Thalami Modulates Activity in Hippocampal Field CA1 through Excitatory and Inhibitory Mechanisms. The Journal of Neuroscience 17:5640.

      Dolleman-van der Weel MJ, Lopes da Silva FH, Witter MP (2017) Interaction of nucleus reuniens and entorhinal cortex projections in hippocampal field CA1 of the rat. Brain Structure and Function 222:2421-2438.

      Because this is not a review paper, we opted to not cite all the papers describing connectivity between mPFC, hippocampus and thalamus.

      Reviewer #2 (Recommendations For The Authors):

      I respectfully suggest that the earlier (public) comments listed above should be addressed. In addition, it would be useful to make it clearer when non-rapid eye movement sleep was being addressed and when rapid eye movement was being addressed. Is it of value to use a single term instead of adding "slow wave sleep" or else clarify when either term is used? The addition of more subheadings might help. Moreover, the relative contribution/value of evidence from these two sleep states was not addressed or was not very clear.

      We tried to make it clearer when NREM sleep and when REM sleep was analysed.

      We replaced slow-wave sleep with NREM sleep in the figure 5 title.

      We added several subheadings in the discussion.

      Relative contribution of NREM vs REM sleep was not addressed? We are sorry, but we do not fully understand this question. Figs. 2 and 3 deal mainly with NREM sleep (Fig. 2B has an example of REM sleep). Fig. 4 essentially describes results obtained during REM sleep.

      I was not sure if the Abstract summarised the key take-home messages from the large amount of evidence provided. Some choices are needed, of course, but "evidence of bidirectional connectivity" struck me as less novel than other evidence provided. Given the huge amount of findings provided, which is commendable, it is still useful to present it perhaps in a more digestible fashion. For example, the headings or the first sentence(s) below headings could indicate the aim or the outcome of the specific method/analysis/findings.

      We rewrote the abstract and also added concluding statements to highlight the major findings and their meaning.

      It is more common to use NRe or Re, rather than REU.

      We avoided using RE because, for decades, we have used RE to abbreviate the thalamic reticular nucleus in several publications. In this revised version, we spell it out in full: Reuniens.

      Line 49 mentions "short-term" memory. Please specify this more clearly as it is otherwise ambiguous. Also, line 303.

      We rephrased the sentence: In particular, the hierarchical coupling of slow waves, spindles and SWRs is thought to play a key role in memory consolidation.

      Line 303 was likely about the ventromedial wall: we corrected that sentence.

      Line 62: the word, "required" (for memory function) is too strong because there is evidence that it is not always required.

      We changed the wording to “plays a major role”.

      The focus within the medial prefrontal cortex could be specified more clearly / earlier.

      The mPFC is mentioned in the second sentence of the abstract and in the first sentence of the introduction.

      Line 134: The heading states "determine" and then mentions modulation. These terms may not be interchangeable or they need clarification.

      We changed it to “slow wave-spindle amplitude coupling”. This describes our analysis exactly.

      Line 204: Does "cortical network" mean "prefrontal cortex network"?

      Yes, as described in lines 192-193, the two cortical networks (N1 and N2) of the model represent mPFC layers 5 and 6, respectively.

      Lines 283 to 289: These were not very clear to me.

      These lines described the potential mechanisms for the responses to hippocampal and reuniens stimulation recorded intracellularly (results in figure 1). We modified this paragraph for clarity.

      Line 296: Specify the "claim".

      We modified the sentence to read: “[…] provides supporting evidence for this claim that nucleus Reuniens might synchronize the activity of ventral hippocampus and mPFC.”

      The discussion naturally focuses on the thalamic nucleus reuniens, but also occasionally mentions the thalamic mediodorsal nucleus. The distinction, assuming this is highly relevant, could be expressed more clearly (direct comparison with their previous papers).

      We never published a study on the mediodorsal nucleus. We do have some unpublished results from recordings in the MD nucleus and they reveal the presence of an inhibitory component at the beginning of cortical active states, therefore behaving in a similar way to first order nuclei. It is then possible that spindles recorded in the reuniens are actually generated in the MD nucleus and then transmitted to Reuniens through the thalamic reticular nucleus, as both MD and reuniens are connected to the rostral thalamic reticular nucleus. We added some discussion about this.

      Figure 1B: Do the authors have any additional evidence of the placements in the reuniens, because the photo provided suggests a large area beyond the reuniens boundary. Also, please confirm is the CEM between Rh and Re in the cat (I think the Rh and Re are adjacent in the rat).

      Figure 1B is from an electrolytic lesion, which is necessarily bigger than the tip of the electrode. Therefore the center of the electrolytic lesion indicates where the electrode tip was located which is well within the reuniens nucleus.

      Also, yes, CE (Nucleus centralis thalami, pars medialis) is located between the reuniens and rhomboid nuclei in cats. This can be found in two cat atlases:

      Reinoso-Suárez, F. (1961). Topographischer Hirnatlas der Katze für experimental-physiologische Untersuchungen (Merck).

      Berman AL, Jones EG (1982) The Thalamus and Basal Telencephalon of the Cat: A Cytoarchitectonic Atlas with Stereotaxic Coordinates: University of Wisconsin Press.

      The first mention of hippocampus in the figure legends should remind the reader by stating "ventral hippocampus".

      In this revised version, we added “ventral” in several instances both in the main text and in figure legend.

      Figure 2: It seems unusual to mention "unusually short NREM". Presumably, things are the same otherwise - if so, perhaps mention that, especially if some of the effects reflect an "unusual" episode.

      We display this particular segment because we want to show a continuous recording in which the individual elements characterizing specific states are still visible.

      Some effects look like they are strong and others perhaps weaker. If so, how do these impact the final conclusions?

      We are sorry, but we did not clearly understand what the reviewer means here. In general, we consider any effect that reaches statistical significance (the conventional 0.05 threshold) to be significant. Any other cases are described on an individual basis.

      Perhaps "MAD" should be in full on the first occasion, if not already.

      It was spelled out at line 659, but we now spell it out also in the results section and in figure 2 legend.

      Methods: the key question is the use of rodent recordings to classify cat recordings. It would be good to have a reference indicating that this can be directly used for cats, which may have different sleep cycles and patterns compared to rats.

      We did not use rodent recordings to classify cat recordings; however, we did use a state detection script that was developed with rodent recordings. As mentioned in the Methods section, we adapted the script to cat mPFC recordings, and manual corrections were then made to correctly detect REM episodes. Respectfully, our lab has investigated sleep-wake states in non-anesthetized animals for a few decades; we have developed state detection algorithms in mice, cats, and marmosets when needed (to analyse months of recordings), and we have extensive expertise in identifying states of vigilance from electrophysiological recordings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Summary:

      The authors examine the eigenvalue spectrum of the covariance matrix of neural recordings in the whole-brain larval zebrafish during hunting and spontaneous behavior. They find that the spectrum is approximately power law, and, more importantly, exhibits scale-invariance under random subsampling of neurons. This property is not exhibited by conventional models of covariance spectra, motivating the introduction of the Euclidean random matrix model. The authors show that this tractable model captures the scale invariance they observe. They also examine the effects of subsampling based on anatomical location or functional relationships. Finally, they briefly discuss the benefit of neural codes which can be subsampled without significant loss of information.

      Strengths:

      With large-scale neural recordings becoming increasingly common, neuroscientists are faced with the question: how should we analyze them? To address that question, this paper proposes the Euclidean random matrix model, which embeds neurons randomly in an abstract feature space. This model is analytically tractable and matches two nontrivial features of the covariance matrix: approximate power law scaling, and invariance under subsampling. It thus introduces an important conceptual and technical advance for understanding large-scale simultaneously recorded neural activity.

      Weaknesses:

      The downside of using summary statistics is that they can be hard to interpret. Often the finding of scale invariance, and approximate power law behavior, points to something interesting. But here caution is in order: for instance, most critical phenomena in neural activity have been explained by relatively simple models that have very little to do with computation (Aitchison et al., PLoS CB 12:e1005110, 2016; Morrell et al., eLife 12, RP89337, 2024). Whether the same holds for the properties found here remains an open question.

      We are grateful for the thorough and constructive feedback provided on our manuscript. We have addressed each point raised by you.

      Regarding the main concern about power law behavior and scale invariance, we would like to clarify that our study does not aim to establish criticality. Instead, we focus on describing and understanding a specific scale-invariant property in terms of collapsed eigenspectra in neural activity. We tested Morrell et al.’s latent-variable model (eLife 12, RP89337, 2024, [1]), where a slowly varying latent factor drives population activity. Although it produces a seemingly power-law-like spectrum, random sampling does not replicate the strict spectral collapse observed in our data (second row in Fig. S23). This highlights that simply adding latent factors does not fully recapitulate the scale invariance we measure, suggesting richer or more intricate processes may be involved in real neural recordings.

      Specifically, we have incorporated five key revisions.

      • As mentioned, we evaluated the latent variable model proposed by Morrell et al. and found that it fails to reproduce the scale-invariant eigenspectra observed in our data; these results are now presented in the Discussion section and supported by a new Supplementary Figure (Fig. S23).

      • We included a comparison with the findings of Manley et al. (2024 [2]) regarding the issue of saturating dimension in the Discussion section, highlighting the methodological differences and their implications.

      • We added a new mathematical derivation in the Methods section, elucidating the bounded dimensionality using the spectral properties of our model.

      • We have added a sentence in the Discussion section to further emphasize the robustness of our findings by demonstrating their consistency across diverse datasets and experimental techniques.

      • We have incorporated a brief discussion on the implications for neural coding (lines 330-332). In particular, Fisher information can become unbounded when the slope of the power-law rank plot is less than one, as highlighted in the recent work by Moosavi et al. (bioRxiv 2024.08.23.608710, Aug, 2024 [3]).

      We believe these revisions address the concerns raised during the review process and collectively strengthen our manuscript to provide a more comprehensive and robust understanding of the geometry and dimensionality of brain-wide activity. We appreciate your consideration of our revised manuscript and look forward to your feedback.

      Recommendations for the authors:

      In particular, in our experience replies to the reviewers are getting longer than the paper, and we (and I’m sure you!) want to avoid that. Maybe just reply explicitly to the ones you disagree with? We’re pretty flexible on our end.

      (1) The main weakness, from our point of view, is whether the finding of scale invariance means something interesting, or should be expected from a null model. We can suggest such a model; if it is inconsistent with the data, that would make the results far more interesting.

      Morrell et al. (eLife 12, RP89337,2024 [1]) suggest a very simple model in which the whole population is driven by a slowly time-varying quantity. It would be nice to determine whether it matched this data. If it couldn’t, that would add some evidence that there is something interesting going on.

      We appreciate your insightful suggestion to consider the model proposed by Morrell et al. (eLife 12, RP89337, 2024 [1]), where a slowly time-varying quantity drives the entire neural population. We conducted simulations using parameters from Morrell et al. [4, 1], as detailed below.

      Our simulations show that Morrell’s model can replicate a degree of scale invariance under functional sampling, which is referred to as RG in Morrell et al., PRL 2021 [4] (FSap, Fig. S23A-D, Author response image 1). However, it fails to fully capture the scale invariance of the collapsing spectra that we observed in the data under random sampling (RSap, Fig. S23E-H). This discrepancy suggests that additional dynamics or structures in the neural activity are not captured by this simple model, indicating the presence of potentially novel and interesting features in the data that merit further investigation.

      Unlike random sampling, the collapse of eigenspectra under functional sampling does not require a stringent condition on the kernel function f(x) in our ERM theory (see Discussion line 269-275), potentially explaining the differing results between Fig.S23A-D and Fig.S23E-H.

      We have incorporated these findings into the Result section 2.1 (lines 100-101) and Discussion section (lines 277-282, quoted below):

      “Morrell et al. [4, 1] suggested a simple model in which a slow time-varying factor influences the entire neural population. To explore the effects of latent variables, we assessed if this model explains the scale invariance in our data. The model posits that neural activity is primarily driven by a few shared latent factors. Simulations showed that the resulting eigenspectra differed considerably from our findings (Fig. S23). Although the Morrell model demonstrated a degree of scale invariance under functional sampling, it did not align with the scale-invariant features under random sampling observed in our data, suggesting that this simple model might not capture all crucial features in our observations.”

      Author response image 1:

      Morrell’s latent model. A: We reproduce the results as presented in Morrell et al., PRL 126(11), 118302 (2021) [4]. Parameters are the same as in Fig. S23A. Sampled 16 to 256 neurons. Unlike in our study, the mean eigenvalues are not normalized to one. Dashed line: eigenvalues fitted to a power law. See also Morrell et al. [4] Fig. 1C. Parameters are the same as in Author response image 1. µ is the power-law exponent (black) of the fit, which is different from the µ parameter used to characterize the slow decay of the spatial correlation function, but corresponds to the parameter α in our study.

      (2) The quantification of the degree of scale invariance is done using a ”collapse index” (CI), which could be better explained/motivated. The fact that the measure is computed only for the non-leading eigenvalues makes sense but it is not clear when originally introduced. How does this measure compare to other measures of the distance between distributions?

      We thank you for raising this important point regarding the explanation and motivation for our Collapse Index (CI). We defined the CI instead of using other measures of distance between distributions for two main reasons. First, the CI provides an intuitive quantification of the shift of the eigenspectrum, motivated by our high-density theory for the ERM model (Eq. 3, Fig. 4A). This high-density theory is only valid for large eigenvalues excluding the leading ones, and hence we compute the CI with a similar restriction on the range of area integration. Second, assessing the collapse with a distance between distributions (for example, estimating the eigenvalue distributions with a kernel density method and then calculating the KL divergence between them) requires first estimating the distributions. This estimation step introduces errors, such as inaccuracies in estimating the probability of large eigenvalues.

      We agree that a clearer explanation would enhance the manuscript and thus have made modifications accordingly. The CI is now introduced more clearly in the Results section (lines 145-148) and further detailed in the Methods section (lines 630-636). We have also revised the CI diagram in Fig. 4A to better illustrate the shift concept using a more intuitive cartoon representation.

      (3) The paper focuses on the case in which the dimensionality saturates to a finite value as the number of recorded neurons is increased. It would be useful to contrast with a case in which this does not occur. The paper would be strengthened by a comparison with Manley et al. 2024, which argued that, unlike this study, dimensionality of activity in spontaneously behaving head-fixed mice did not saturate.

      Thank you for highlighting this comparison. We have included a discussion (lines 303-309) comparing our approach with Manley et al. (2024) [2]. While Manley et al. [2] primarily used shared variance component analysis (SVCA) to estimate neural dimensionality, they observed that using PCA led to dimensionality saturation (see Figure S4D, Manley et al. [2]), consistent with our findings (Fig. 2D). We acknowledge the value of SVCA as an alternative approach and agree that it is an interesting avenue for future research. In our study, we chose to use PCA for several reasons. PCA is a well-established and widely trusted method in the neuroscience community, with a proven track record of revealing meaningful patterns in neural data. Its mathematical properties are well understood, making it particularly suitable for our theoretical analysis. While we appreciate the insights that newer methods like SVCA can provide, we believe PCA remains the most appropriate tool for addressing our specific research questions.

      (4) More importantly, we don’t understand why dimensionality saturates. For the rank plot given in Eq. 3,

      where k is rank. Using this, one can estimate sums over eigenvalues by integrals. Focusing on the N-dependence, we have

      This gives

      We don’t think you ever told us what mu/d was (see point 13 below), but in the discussion you implied that it was around 1/2 (line 249). In that case, D<sub>PR</sub> should be approximately linear in N. Could you explain why it isn’t?

      Thank you for your careful derivation. Along this line of calculations you suggested, we have now added derivations on using the ERM spectrum to estimate the upper bound of the dimension in the Methods (section 4.14.4). To deduce D<sub>PR</sub> from the spectrum, we focus on the high-density region, where an analytical expression for large eigenvalues λ is given by:

      Here, d is the dimension of the functional space, L is the linear size of the functional space, ρ is the neuron density and γ is the coefficient in Eq. (3), which depends only on d, µ and E(σ<sup>2</sup>). The primary difference between your derivation and ours is that the eigenvalue λ<sub>r</sub> decays rapidly after the threshold r = β(N), which significantly affects the summations over λ<sub>r</sub> and λ<sub>r</sub><sup>2</sup>. Since we did not discuss the small eigenvalues in the article, we represent them here as an unknown function η(r,N,L).

      The sum of the eigenvalues is the trace of the covariance matrix C. As emphasized in the Methods section, without changing the properties of the covariance spectrum, we always consider a normalized covariance matrix such that the mean neural activity variance E(σ<sup>2</sup>) = 1. Thus

      rather than

      The issue stems from overlooking that Eq. (3) is valid only for large eigenvalues (λ > 1).

      Using the Cauchy–Schwarz inequality, we have an upper bound of

      Conversely, provides a lower bound of :

      As a result, we must have

      In random sampling (RSap), L is fixed. We thus must have a bounded dimensionality that is independent of N for our ERM model. In functional sampling (FSap), L varies while the neuronal density ρ is fixed, leading to a different scaling relationship of the upper bound, see Methods (section 4.14.4) for further discussion.

      (5) The authors work directly with ROIs rather than attempting to separate the signals from each neuron in an ROI. It would be worth discussing whether this has a significant effect on the results.

      We appreciate your thoughtful question on the potential impact of using ROIs. The use of ROIs likely does not impact our key findings since they are validated across multiple datasets with various recording techniques and animal models, from zebrafish calcium imaging to mouse brain multi-electrode recordings (see Figure S2, S24). The consistency of the scale-invariant covariance spectrum in diverse datasets suggests that ROIs in zebrafish data do not significantly alter the conclusions, and they together enhance the generalizability of our results. We highlight this in the Discussion section (lines 319-323).

      (6) Does the Euclidean random matrix model allow the authors to infer the value of D or µ? Since the measured observables only depend on µ/D it seems that one cannot infer the latent dimension where distances between neurons are computed. Are there any experiments that one could, in principle, perform to measure D or mu? Currently the conclusion from the model and data is that D/µ is a large number so that the spectrum is independent of neuron density rho. What about the heterogeneity of the scales σ<sub>i</sub>, can this be constrained by data?

      Measuring d and µ in the ERM Model

      We agree with you that the individual values of d and µ cannot be determined separately from our analysis. In our analysis using the Euclidean Random Matrix (ERM) model, we fit the ratio µ/d, rather than the individual values of d (dimension of the functional space) or µ (exponent of the distance-dependent kernel function). This limitation is inherent because the model’s predictions for observable quantities, such as the distribution of pairwise correlation, are dependent solely on this ratio.

      Currently there are no directly targeted experiments to measure d. The dimension of the functional space is largely a theoretical construct: it could serve to represent latent variables encoding cognitive factors that are distributed throughout the brain or specific sensory or motor feature maps within a particular brain region. It may also be viewed as the embedding space to describe functional connectivity between neurons. Thus, a direct experimental measurement of the dimension of the functional space could be challenging. Although there are variations in the biological interpretation of the functional space, the consistent scale invariance observed across various brain regions indicates that the neuronal relationships within the functional space can be described by a uniform slowly decaying kernel function.

      Regarding the Heterogeneity of σ<sub>i</sub>

      The heterogeneity of neuronal activity variances (σ<sub>i</sub>) is a critical factor in our analysis. Our findings indicate that this heterogeneity:

      (1) Enhances scale invariance: The covariance matrix spectrum, which incorporates the heterogeneity of σ<sub>i</sub>, exhibits stronger scale invariance compared to the correlation matrix spectrum, which imposes σ<sub>i</sub> = 1 for all neurons. This observation is supported by both experimental data and theoretical predictions from the ERM model, particularly in the intermediate density regime.

      (2) Can be constrained by data: We fit a log-normal distribution to the experimentally observed σ<sup>2</sup> values to capture the heterogeneity in our model which leads to excellent agreement with data (section 4.8.1). Figure S10 provides evidence for this by directly comparing the eigenspectra obtained from experimental data (Fig S10A-F) with those generated by the fitted ERM model (Fig S10M-R). These results suggest that the data provides valuable information about the distribution of neuronal activity variances.

      In conclusion, the ERM model and our analysis cannot separately determine d and µ. We also highlight that the neuronal activity variance heterogeneity, constrained by experimental data, plays a crucial role in improving the scale invariance.

      (7) Does the fitting procedure for the positions x in the latent space recover a ground truth in your statistical regime (for the number of recorded neurons)? Suppose you sampled some neurons from a Euclidean random matrix theory. Does the MDS technique the authors use recover the correct distances?

      When sampling neurons from a Euclidean random matrix model, we demonstrated numerically that the MDS technique can accurately recover the true distances, provided that the true kernel function f(x) is known. To quantify the precision of recovery, we applied the CCA analysis (Section 4.9) and compared the true coordinates from the original Euclidean random matrix with the fitted coordinates obtained through our MDS procedure. The CCA correlation between the true and fitted coordinates in each spatial dimension is nearly 1 (the difference from 1 is less than 10<sup>−7</sup>). When fitting with experimental data, one source of error arises from parameter estimation. To evaluate this, we assess the estimation error of the fitted parameters. When we choose µ = 0.5 in our ERM model and then fit the distribution of the pairwise correlation (Eq. 21), the estimated parameter is 0.503 ± 0.007 (standard deviation). Then, we use the MDS-recovered distances to fit the coordinates with the fitted kernel function, which is determined by the estimated parameter. The CCA correlation between the true and fitted coordinates in each direction remains nearly 1 (the difference from 1 is less than 10<sup>−5</sup>).
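      The recovery check described above can be illustrated with a minimal sketch. It assumes classical (Torgerson) MDS applied to exact pairwise Euclidean distances, which may differ from the MDS procedure used in the manuscript, and it compares pairwise distances directly instead of running CCA. The function names `pairwise_distances` and `classical_mds` and the uniform 2-D point cloud are hypothetical choices made for illustration.

```python
import numpy as np

def pairwise_distances(x):
    """Euclidean distance matrix for points stored as rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(dist, n_dims):
    """Classical (Torgerson) MDS: embed points from a distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering the squared distances gives the Gram matrix of the points.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical ground truth: 200 points placed uniformly in a 2-D square.
rng = np.random.default_rng(0)
true_coords = rng.uniform(0.0, 1.0, size=(200, 2))
true_dist = pairwise_distances(true_coords)

# The embedding is recovered only up to rotation, reflection and translation,
# so the comparison is made on pairwise distances, which are invariant to these.
fitted_coords = classical_mds(true_dist, n_dims=2)
max_error = np.abs(pairwise_distances(fitted_coords) - true_dist).max()
```

      Because the embedding is defined only up to a rigid transformation, a coordinate-wise comparison would need an alignment step such as CCA, which is the role CCA plays in the check described in the response.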

      (8) l. 49: ”... both the dimensionality and covariance spectrum remain invariant ...”. Just to be clear, if the spectrum is invariant, then the dimensionality automatically is too. Correct?

      Thanks for the question. In fact, there is no direct causal relationship between eigenvalue spectrum invariance and dimensionality invariance, as we elaborate below and now discuss in lines 311-317. For eigenvalue spectrum invariance, we focus on the large eigenvalues, whereas dimensionality invariance considers the second-order statistics of all eigenvalues. Consequently, the invariance results for these two concepts may differ. Dimensional and spectral invariance also have different requirements:

      (1) The condition for dimensional saturation is finite mean square covariance

      The participation ratio D<sub>PR</sub> for random sampling (RSap) is given by Eq. 5:

      This expression becomes invariant as N → ∞ if the mean square covariance is finite. In contrast, neural dynamics models, such as the balanced excitatory-inhibitory (E-I) neural network [5], exhibit a different behavior, where , leading to unbounded dimensionality (see discussion lines 291-295, section 6.9 in SI).

      (2) The requirements for spectral invariance involving the kernel function

      In our Euclidean Random Matrix (ERM) model, the eigenvalue distribution follows:

      For spectral invariance to emerge: (1) The eigenvalue distribution must remain unchanged after sampling. (2) Sampling reduces the neuronal density ρ. (3) Therefore, the ratio µ/d must approach 0 so that the distribution does not depend on ρ.

      We can also demonstrate that D<sub>PR</sub> is independent of density ρ in the large N limit (see the answer of question 4).

      In conclusion, there is no causal relationship between spectral invariance and dimensionality invariance. This is also the reason why we need to consider both properties separately in our analysis.

      (9) In Eq. 1, the exact expression, which includes i=j, isn’t a lot harder than the one with i=j excluded. So why i≠j?

      The choice is for illustration purposes. In Eq. 1, we wanted to demonstrate that the dimension saturates to a value independent of N. When the numerator and denominator of this expression are divided by N<sup>2</sup>, the term associated with the off-diagonal entries is independent of the neuron number N, whereas the term associated with the diagonal entries is of order O(1/N) and can be ignored for large N.
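      The role of the diagonal terms can be checked numerically. The sketch below uses the standard definition of the participation ratio, D_PR = (Σλ)² / Σλ², rewritten with matrix entries as Tr(C)² / Σ<sub>ij</sub> C<sub>ij</sub>², and compares the exact value with the version that drops the i = j terms from the denominator. The function names and the synthetic low-rank data are assumptions for illustration and are not taken from the manuscript.

```python
import numpy as np

def participation_ratio(cov):
    """D_PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    eigvals = np.linalg.eigvalsh(cov)
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

def participation_ratio_from_entries(cov, include_diagonal=True):
    """Same quantity written with matrix entries, optionally dropping i = j terms."""
    n = cov.shape[0]
    mean_var = np.trace(cov) / n              # E(sigma^2)
    diag_sq = (np.diag(cov) ** 2).sum()       # N * E(sigma^4)
    offdiag_sq = (cov ** 2).sum() - diag_sq   # N(N-1) * E_{i != j}(C_ij^2)
    denom = offdiag_sq + (diag_sq if include_diagonal else 0.0)
    return (n * mean_var) ** 2 / denom

# Synthetic data (an assumption): 300 neurons driven by 5 shared latent factors.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2000, 5))
loadings = rng.standard_normal((5, 300))
activity = latent @ loadings + 0.5 * rng.standard_normal((2000, 300))
cov = np.cov(activity, rowvar=False)

exact = participation_ratio(cov)
with_diagonal = participation_ratio_from_entries(cov)
without_diagonal = participation_ratio_from_entries(cov, include_diagonal=False)
```

      Dropping the diagonal terms removes a positive contribution from the denominator, so `without_diagonal` is slightly larger than `exact`, and the gap shrinks as the number of neurons grows.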

      (10) Fig. 2D: Could you explain where the theory line comes from?

      We first estimate the statistics entering Eq. 5 from all neurons, and then compute D<sub>PR</sub> for different neuron numbers N using Eq. 5. This is further clarified in lines 511-512.

      (11) l 94-5: ”It [scale invariance] is also absent when replacing the neural covariance matrix eigenvectors with random ones, keeping the eigenvalues identical (Fig. 2H).” If eigenvalues are identical, why does the spectrum change?

      The eigenspectra of the full-size covariance matrices are the same by construction, but the eigenspectra of the sampled covariance matrices differ because the eigenvectors affect the sampling results. Please also refer to the construction process described in section 4.3, where this is also discussed: “The composite covariance matrix with substituted eigenvectors in (Fig. 2H) was created as described in the following steps. First, we generated a random orthogonal matrix U<sub>r</sub> (based on the Haar measure) for the new eigenvectors. This was achieved by QR decomposition A=U<sub>r</sub>R of a random matrix A with i.i.d. entries A<sub>ij</sub> ∼ N(0, 1/N). The composite covariance matrix C<sub>r</sub> was then defined as C<sub>r</sub> = U<sub>r</sub>ΛU<sub>r</sub><sup>T</sup>, where Λ is a diagonal matrix that contains the eigenvalues of C. Note that since all the eigenvalues are real and U<sub>r</sub> is orthogonal, the resulting C<sub>r</sub> is a real and symmetric matrix. By construction, C<sub>r</sub> and C have the same eigenvalues, but their sampled eigenspectra can differ.”
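      The construction quoted above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the sign correction applied to the QR factor is a common way to obtain a Haar-distributed matrix and is assumed here, and the function names and the synthetic covariance matrix with heterogeneous variances are hypothetical.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    a = rng.normal(0.0, np.sqrt(1.0 / n), size=(n, n))  # entries ~ N(0, 1/N)
    q, r = np.linalg.qr(a)
    # Assumption: fix the column signs so that the result follows the Haar measure.
    return q * np.sign(np.diag(r))

def substitute_eigenvectors(cov, rng):
    """Keep the eigenvalues of cov but replace its eigenvectors with random ones."""
    eigvals = np.linalg.eigvalsh(cov)
    u_r = random_orthogonal(cov.shape[0], rng)
    return u_r @ np.diag(eigvals) @ u_r.T

def sampled_spectrum(cov, idx):
    """Eigenvalues (descending) of the submatrix restricted to the neurons in idx."""
    return np.linalg.eigvalsh(cov[np.ix_(idx, idx)])[::-1]

# Synthetic covariance (an assumption): 200 neurons with heterogeneous variances.
rng = np.random.default_rng(1)
latent = rng.standard_normal((1000, 5))
loadings = rng.standard_normal((5, 200))
scales = rng.lognormal(0.0, 1.0, size=200)
activity = (latent @ loadings + 0.5 * rng.standard_normal((1000, 200))) * scales
cov = np.cov(activity, rowvar=False)
cov_r = substitute_eigenvectors(cov, rng)

# Same random subset of 50 neurons for both matrices.
idx = rng.choice(200, size=50, replace=False)
spec = sampled_spectrum(cov, idx)
spec_r = sampled_spectrum(cov_r, idx)
```

      The full matrices `cov` and `cov_r` share their eigenvalues, while `spec` and `spec_r` differ, which is the point made in the response: subsampling selects rows and columns, so the result depends on the eigenvectors as well as the eigenvalues.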

      (12) Eq 3: There’s no dependence on the distribution of sigma. Is that correct?

      Indeed, this is true in the high-density regime when the neuron density ρ is large. The p(λ) depends only on E(σ<sup>2</sup>) rather than the distribution of σ (see Eq. 8). However, in the intermediate density regime, p(λ) depends on the distribution of σ (see Eq.9 and Eq.10). In our analysis, we consider E(σ<sup>4</sup>) as a measure of heterogeneity.

      (13) Please tell us the best fit values of µ/d.

      This information is now added in the figure caption of Fig. S10: µ/d = [0.456, 0.258, 0.205, 0.262, 0.302, 0.308] in fish 1-6.

      (14) l 133: ”The eigenspectrum is rho-independent whenever µ/d ≈ 0.”

      It looks to me like rho sets the scale but not the shape. Correct? If so, why do we care about the overall scale – isn’t it the shape that’s important?

      Yes, our study focuses on the overall scale and not only the shape, because many models, such as the ERM with other kernel functions, random RNNs, and Morrell’s latent model [4, 1], can exhibit a power-law spectrum. However, these models do not exhibit scale invariance in the sense of collapsing spectrum curves. Therefore, considering the overall scale reveals an additional non-trivial phenomenon.

      (15) Figs. 3 and 4: Are the grey dots the same as in previous figures? Either way, please specify what they are in the figure caption.

      Yes, they are the same, and thank you for pointing it out. It has been specified in the figure caption now.

      (16) Fig. 4B: Top is correlation matrix, bottom is covariance matrix, correct? If so, that should be explicit. If not, it should be clear what the plots are.

      That is correct. Both matrices (correlation - top, covariance - bottom) are labeled in the figure caption and plot (text in the lower left corner).

      (17) l 158: ”First, the shape of the kernel function f(x) over a small distance ...”. What does ”over a small distance” mean?

      We thank you for seeking clarification on this point. We understand that the phrase “over a small distance” could be made clearer, and we have revised the explanation in lines 164-165. Here, “over a small distance” refers to modifications of the particular kernel function f(x) that we use (Eq. 11) near x = 0 in the functional space, while preserving the overall power-law decay at larger distances. The t-distribution-based f(x) (Eq. 11) has a natural parameter ϵ that describes the transition to near 0. We therefore modified f(x) in different ways, all within the interval |x| ≤ ϵ, and considered different values of ϵ. Table S3 and Figure S7 provide a summary of these modifications. Figure S7 visually compares these modifications to the standard power-law kernel function, highlighting the differences in shape near x = 0.

      Our findings indicate that these alterations to the kernel function at small distances do not significantly affect the distribution of large eigenvalues in the covariance spectrum. This supports our conclusion that the large eigenvalues are primarily determined by the slow decay of the kernel function at larger distances in the functional space, as this characteristic governs the overall correlations in neural activity.

      (18) l390 . This x<sub>i</sub> is, we believe, different from the x<sub>i</sub> which is position in feature space. Given the difficulty of this paper, it doesn’t help to use the same symbol to mean two different things. But maybe we’re wrong?

      Thank you for your careful reading and suggestion. Indeed here x<sub>i</sub> was representing activity rather than feature space position. We have thus revised the notation (Line 390 has been updated to line 439 as well.):

      In this revised notation, a<sub>i</sub>(t) represents the neural activity of neuron i at time t (typically the firing rate we infer from calcium imaging), and its time average is simply the mean activity of neuron i across time. Meanwhile, we keep x<sub>i</sub> exclusively for denoting positions in the functional space.

      This change should make it much easier to distinguish between neural activity measurements and spatial coordinates in the functional space.
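      As a concrete reading of this notation, the sketch below computes a covariance matrix from an activity array whose entry `activity[i, t]` plays the role of a<sub>i</sub>(t), and then rescales it so that the mean variance E(σ<sup>2</sup>) equals 1, as described for the normalized covariance matrix. The 1/T normalization, the function names and the random test data are assumptions for illustration; the manuscript's exact definition may differ.

```python
import numpy as np

def covariance_from_activity(activity):
    """Covariance matrix C_ij from activity a_i(t), array shape (neurons, time)."""
    mean_activity = activity.mean(axis=1, keepdims=True)  # time average per neuron
    centered = activity - mean_activity
    # Assumption: normalize by the number of time points T (not T - 1).
    return centered @ centered.T / activity.shape[1]

def normalize_mean_variance(cov):
    """Rescale the covariance matrix so that the mean variance E(sigma^2) is 1."""
    return cov / np.mean(np.diag(cov))

# Hypothetical activity: 50 neurons, 1000 time frames, non-zero mean rates.
rng = np.random.default_rng(0)
activity = rng.standard_normal((50, 1000)) * 2.0 + 3.0
cov = covariance_from_activity(activity)
cov_normalized = normalize_mean_variance(cov)
```

      The rescaling multiplies every eigenvalue by the same constant, so it changes the overall scale of the spectrum but not its shape.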

      (19) Eq. 19: is it correct that g(u) is not normalized to 1? If so, does that matter?

      It is correct that the approximation of g(u) is not normalized to 1, as Eq. 19 provides an approximation suitable only for small pairwise distances (i.e., large correlations). Therefore, we believe this does not pose an issue. We have added this note in lines 691-693.

      (20) I get a different answer in Eq. 20:

      Whereas in Eq. 20,

      µ

      Which is correct?

      Thank you for your careful derivation. We believe the difference arises in the calculation of g(u). In our calculations:

      ,

      (Your first equation seems to have missed a 1/µ in R’s exponent.)

      ,

      That is, Eq. 20 is correct. From these, we obtain

      rather than

      We hope this clarifies the question.

      (21) I’m not sure we fully understand the CCA analysis. First, our guess as to what you did: After sampling (either Asap or Fsap), you used ERM to embed the neurons in a 2-D space, and then applied canonical correlation analysis (CCA). Is that correct? If so, it would be nice if that were more clear.

      We first used ERM to embed all the neurons in a 2-D functional space, before any sampling. Once we have the embedding, we can quantify how similar the functional coordinates are with the anatomical coordinates using R<sub>CCA</sub> (section 2.4). We can then use the anatomical and functional coordinates to perform ASap and FSap, respectively. Our theory in section 2.4 predicts the effect on dimension under these samplings given the value of R<sub>CCA</sub> estimated earlier (Fig. 5D). The detailed description of the CCA analysis is in section 4.9, where we explain how CCA is used to find the axes in both anatomical and functional spaces that maximize the correlation between projections of neuron coordinates.
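      The comparison of anatomical and functional coordinates can be sketched as follows. This is a minimal, generic CCA via QR decomposition and SVD, not the authors' implementation from section 4.9; treating the first canonical correlation as R<sub>CCA</sub>, and the array shapes and names, are assumptions.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two coordinate sets (rows = neurons).

    X and Y are (n_neurons, p) and (n_neurons, q) arrays with full column rank.
    """
    Xc = X - X.mean(axis=0)            # center each coordinate axis
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)           # orthonormal basis of each column space
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)        # guard against rounding slightly above 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    anatomical = rng.standard_normal((1000, 3))    # hypothetical 3-D positions
    # Hypothetical 2-D functional coordinates, partly related to anatomy.
    functional = anatomical[:, :2] + 2.0 * rng.standard_normal((1000, 2))
    r = canonical_correlations(anatomical, functional)
    print("canonical correlations:", r, "assumed R_CCA:", r[0])
```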

      As to how you sampled under Fsap, I could not figure that out – even after reading supplementary information. A clearer explanation would be very helpful.

      Thank you for your feedback. Functional sampling (FSap) entails the expansion of regions of interest (ROIs) within the functional space, as illustrated in Figure 5A, concurrently with the calculation of the covariance matrix for all neurons contained within the ROI. Technically, we implemented the sampling using the RG approach [6], which is further elaborated in Section 4.12 (lines 852-899), quoted below.

      Stage (i): Iterative Clustering. We begin with N<sub>0</sub> neurons, where N<sub>0</sub> is assumed to be a power of 2. In the first iteration, we compute Pearson’s correlation coefficients for all neuron pairs. We then search greedily for the most correlated pairs and group the half of the pairs with the highest correlation into the first cluster; the remaining neurons form the second cluster. For each pair (a,b), we define a coarse-grained variable according to:

      ,

      where a normalization factor scales the average to ensure unit nonzero activity. This process reduces the number of neurons to N<sub>1</sub> = N<sub>0</sub>/2. In subsequent iterations, we continue grouping the most correlated pairs of the coarse-grained neurons, iteratively reducing the number of neurons by half at each step. This process continues until the desired level of coarse-graining is achieved.

      When applying the RG approach to ERM, instead of combining neural activity, we merge correlation matrices to traverse different scales. During the _k_th iteration, we compute the coarse-grained covariance as:

      and the variance as:

      Following these calculations, we normalize the coarse-grained covariance matrix to ensure that all variances are equal to one. Note that these coarse-grained covariances are only used in stage (i) and not used to calculate the spectrum.

      Stage (ii): Eigenspectrum Calculation The calculation of eigenspectra at different scales proceeds through three sequential steps. First, for each cluster identified in Stage (i), we compute the covariance matrix using the original firing rates of neurons within that cluster (not the coarse-grained activities). Second, we calculate the eigenspectrum for each cluster. Finally, we average these eigenspectra across all clusters at a given iteration level to obtain the representative eigenspectrum for that scale.

      In stage (ii), we calculate the eigenspectra of the sub-covariance matrices across different cluster sizes as described in [6]. Let N<sub>0</sub> = 2<sup>n</sup> be the original number of neurons. To reduce it to size N = N<sub>0</sub>/2<sup>k</sup> = 2<sup>n−k</sup>, where k indexes the reduction step, consider the coarse-grained neurons in step n−k of stage (i). Each coarse-grained neuron is a cluster of 2<sup>n−k</sup> neurons. We then calculate the spectrum of the block of the original covariance matrix corresponding to the neurons of each cluster (there are 2<sup>k</sup> such blocks). Lastly, an average of these 2<sup>k</sup> spectra is computed.

      For example, when reducing from N<sub>0</sub> = 2<sup>3</sup> = 8 to N = 2<sup>3−1</sup> = 4 neurons (k = 1), we would have two clusters of 4 neurons each. We calculate the eigenspectrum for each 4×4 block of the original covariance matrix, then average these two spectra together. To better understand this process through a concrete example, consider a hypothetical scenario where a set of eight neurons, labeled 1,2,3,...,7,8, are subjected to a two-step clustering procedure. In the first step, neurons are grouped based on their maximum correlation pairs, for example, resulting in the formation of four pairs: {1,2},{3,4},{5,6}, and {7,8} (see Fig. S22). Subsequently, the neurons are further grouped into two clusters based on the results of the RG step mentioned above. Specifically, if the correlation between the coarse-grained variables of the pair {1,2} and the pair {3,4} is found to be the largest among all other pairs of coarse-grained variables, the first group consists of neurons {1,2,3,4}, while the second group contains neurons {5,6,7,8}. Next, take the cluster size N = 4 as an example. The eigenspectra of the covariance matrices of the four neurons within each cluster are computed. This results in two eigenspectra, one for each cluster. The correlation matrices used to compute the eigenspectra of different sizes do not involve coarse-grained neurons; they involve the real neurons 1,2,3,...,7,8, but with expanding cluster sizes. Finally, the average of the eigenspectra of the two clusters is calculated.
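      The two stages can be sketched in a few lines for the eight-neuron example. This is a simplified illustration rather than the authors' code: coarse-grained variables are plain sums (Pearson correlation is scale-invariant, so the normalization step is omitted), the activity is synthetic, and all function names are hypothetical.

```python
import numpy as np

def greedy_pairs(corr):
    """Greedily pair variables by descending correlation; each variable is used once."""
    n = corr.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(corr[iu, ju])[::-1]
    used = np.zeros(n, dtype=bool)
    pairs = []
    for idx in order:
        a, b = iu[idx], ju[idx]
        if not used[a] and not used[b]:
            used[a] = used[b] = True
            pairs.append((a, b))
    return pairs

def rg_clusters(activity, n_steps):
    """Stage (i): for each step, the clusters as arrays of original neuron indices."""
    coarse = activity.copy()
    clusters = [np.array([i]) for i in range(activity.shape[0])]
    history = []
    for _ in range(n_steps):
        pairs = greedy_pairs(np.corrcoef(coarse))
        coarse = np.stack([coarse[a] + coarse[b] for a, b in pairs])
        clusters = [np.concatenate([clusters[a], clusters[b]]) for a, b in pairs]
        history.append(clusters)
    return history

def mean_cluster_spectrum(activity, clusters):
    """Stage (ii): eigenspectrum of each cluster's block of the original covariance, averaged."""
    cov = np.cov(activity)
    spectra = [np.linalg.eigvalsh(cov[np.ix_(c, c)])[::-1] for c in clusters]
    return np.mean(spectra, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    activity = rng.standard_normal((8, 2000))      # 8 synthetic neurons, 2000 time points
    history = rg_clusters(activity, n_steps=2)     # 4 pairs, then 2 clusters of 4
    print([sorted(c.tolist()) for c in history[1]])
    print(mean_cluster_spectrum(activity, history[1]))
```

      Note that stage (ii) always indexes the covariance of the original neurons; the coarse-grained variables are used only to decide which neurons belong together.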

      (22) Line 37: “even if two cell assemblies have the same D<sub>PR</sub>, they can have different shapes.” What is meant by shape here isn’t clear.

      Thank you for pointing out this potential ambiguity. The “shape” here refers to the geometric configuration of the neural activity space, characterized by the covariance as a high-dimensional ellipsoid. Specifically, if we denote the eigenvalues of the covariance matrix as λ<sub>1</sub>,λ<sub>2</sub>,...,λ<sub>N</sub>, then √λ<sub>i</sub> corresponds to the length of the i-th semi-axis of this ellipsoid (Figure 1B). As shown in Figure 1C, two neural populations with the same dimensionality (D<sub>PR</sub> = 25/11 ≈ 2.27) exhibit different eigenvalue spectra, leading to differently shaped ellipsoids. This clarification is now included in lines 39-40.
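      A small numerical check makes the point concrete. Using the standard participation ratio, D<sub>PR</sub> = (Σλ<sub>i</sub>)<sup>2</sup> / Σλ<sub>i</sub><sup>2</sup>, the two spectra below both give 25/11 but describe ellipsoids with different semi-axes. The spectra are chosen for illustration only and are not claimed to be those plotted in Figure 1C.

```python
import numpy as np

def participation_ratio(eigenvalues):
    """D_PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    lam = np.asarray(eigenvalues, dtype=float)
    return lam.sum() ** 2 / np.sum(lam ** 2)

# Two illustrative covariance spectra with equal D_PR but different shapes.
spectrum_a = [3.0, 1.0, 1.0]              # one long axis, two short ones
spectrum_b = [7.0 / 3, 7.0 / 3, 1.0 / 3]  # two long axes, one very short one

if __name__ == "__main__":
    for name, lam in (("A", spectrum_a), ("B", spectrum_b)):
        print(name, "D_PR =", participation_ratio(lam),
              "semi-axes =", np.sqrt(lam))
```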

      (23) Please discuss if any information about the latent dimension or kernel function can be inferred from the measurements.

      As in our response to comment (6), we would like to clarify that in our analysis using the Euclidean Random Matrix (ERM) model, we fit the ratio µ/d, rather than the individual values of d (dimension of the functional space) or µ (exponent of the distance-dependent kernel function). This limitation is inherent because the model’s predictions for observable quantities, such as the eigenvalue spectrum of the covariance matrix, depend solely on this ratio.

      For the kernel function, once d is chosen, we can infer the general shape of the kernel function from data (Figs S12 and S13), up to a certain extent (see also lines 164-166). In particular, we can compare the eigenspectrum of the simulation results for different kernel functions with the eigenspectrum of our data. This allows us to qualitatively exclude certain kernel functions, such as the exponential and Gaussian kernels (Fig. S4), which show clear differences from our data.

      References

      (1) M. C. Morrell, I. Nemenman, A. Sederberg, Neural criticality from effective latent variables. eLife 12, RP89337 (2024).

      (2) J. Manley, S. Lu, K. Barber, J. Demas, H. Kim, D. Meyer, F. M. Traub, A. Vaziri, Simultaneous, cortex-wide dynamics of up to 1 million neurons reveal unbounded scaling of dimensionality with neuron number. Neuron (2024).

      (3) S. A. Moosavi, S. S. R. Hindupur, H. Shimazaki, Population coding under the scale-invariance of high-dimensional noise (2024).

      (4) M. C. Morrell, A. J. Sederberg, I. Nemenman, Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126, 118302 (2021).

      (5) A. Renart, J. De La Rocha, P. Bartho, L. Hollender, N. Parga, A. Reyes, K. D. Harris, The asynchronous state in cortical circuits. Science 327, 587–590 (2010).

      (6) L. Meshulam, J. L. Gauthier, C. D. Brody, D. W. Tank, W. Bialek, Coarse graining, fixed points, and scaling in a large population of neurons. Physical Review Letters 123, 178103 (2019).

    1. Author Response

      We thank you for the time you took to review our work and for your feedback!

      The major changes to the manuscript are:

      1. We have extended the range of locomotion velocity over which we compare its dependence with cholinergic activity in Figures 2E and S2H.

      2. We have quantified the contributions of cholinergic stimulation on multiplicative and additive gains on visual responses (Figure S7).

      3. We have provided single cell examples for the change in latency to visual response (Figure S12).

      4. We have added an analysis to compare layer 2/3 and layer 5 locomotion onset responses as a function of visuomotor condition (Figure S8).

      A detailed point-by-point response to all reviewer concerns is provided below.  

      Reviewer #1 (Public Review):

      The paper submitted by Yogesh and Keller explores the role of cholinergic input from the basal forebrain (BF) in the mouse primary visual cortex (V1). The study aims to understand the signals conveyed by BF cholinergic axons in the visual cortex, their impact on neurons in different cortical layers, and their computational significance in cortical visual processing. The authors employed two-photon calcium imaging to directly monitor cholinergic input from BF axons expressing GCaMP6 in mice running through a virtual corridor, revealing a strong correlation between BF axonal activity and locomotion. This persistent activation during locomotion suggests that BF input provides a binary locomotion state signal. To elucidate the impact of cholinergic input on cortical activity, the authors conducted optogenetic and chemogenetic manipulations, with a specific focus on L2/3 and L5 neurons. They found that cholinergic input modulates the responses of L5 neurons to visual stimuli and visuomotor mismatch, while not significantly affecting L2/3 neurons. Moreover, the study demonstrates that BF cholinergic input leads to decorrelation in the activity patterns of L2/3 and L5 neurons.

      This topic has garnered significant attention in the field, drawing the interest of many researchers actively investigating the role of BF cholinergic input in cortical activity and sensory processing. The experiments and analyses were thoughtfully designed and conducted with rigorous standards, leading to convincing results which align well with findings in previous studies. In other words, some of the main findings, such as the correlation between cholinergic input and locomotor activity and the effects of cholinergic input on V1 cortical activity, have been previously demonstrated by other labs (Goard and Dan, 2009; Pinto et al., 2013; Reimer et al., 2016). However, the study by Yogesh and Keller stands out by combining cutting-edge calcium imaging and optogenetics to provide compelling evidence of layer-specific differences in the impact of cholinergic input on neuronal responses to bottom-up (visual stimuli) and top-down inputs (visuomotor mismatch).

      We thank the reviewer for their feedback.

      Reviewer #2 (Public Review):

      The manuscript investigates the function of basal forebrain cholinergic axons in mouse primary visual cortex (V1) during locomotion using two-photon calcium imaging in head-fixed mice. Cholinergic modulation has previously been proposed to mediate the effects of locomotion on V1 responses. The manuscript concludes that the activity of basal forebrain cholinergic axons in visual cortex provides a signal which is more correlated with binary locomotion state than locomotion velocity of the animal. Cholinergic axons did not seem to respond to grating stimuli or visuomotor prediction error. Optogenetic stimulation of these axons increased the amplitude of responses to visual stimuli and decreased the response latency of layer 5 excitatory neurons, but not layer 2/3 neurons. Moreover, optogenetic or chemogenetic stimulation of cholinergic inputs reduced pairwise correlation of neuronal responses. These results provide insight into the role of cholinergic modulation to visual cortex and demonstrate that it affects different layers of visual cortex in a distinct manner. The experiments are well executed and the data appear to be of high quality. However, further analyses are required to fully support several of the study's conclusions.

      We thank the reviewer for their feedback.

      1) In experiments analysing the activity of V1 neurons, GCaMP6f was expressed using a ubiquitous Ef1a promoter, which is active in all neuronal cell types as well as potentially non-neuronal cells. The manuscript specifically refers to responses of excitatory neurons but it is unclear how excitatory neuron somata were identified and distinguished from that of inhibitory neurons or other cell types.

      This might be a misunderstanding. The Ef1α promoter has been reported to drive highly specific expression in neurons (Tsuchiya et al., 2002) with 99.7% of labeled cells in layer 2/3 of rat cortex being NeuN+ (a neuronal marker), with only 0.3% of labeled cells being GFAP+ (a glial marker) (Yaguchi et al., 2013). This bias was even stronger in layer 5 with 100% of labeled cells being NeuN+ and none GFAP+ (Yaguchi et al., 2013). The Ef1α promoter in an AAV vector, as we use it here, also biases expression to excitatory neurons. In layer 2/3 of mouse visual cortex, we have found that 96.8% ± 0.7% of labeled neurons are excitatory three weeks after viral injection (Attinger et al., 2017). Similar results have also been found in rats (Yaguchi et al., 2013), where, when expressing GFP under the Ef1α promoter delivered using a lentivirus, 95.2% of labeled neurons in layer 2/3 and 94.1% in layer 5 were excitatory. These numbers are comparable to the ones obtained with promoters commonly used to target expression to excitatory neurons. For this purpose, two variants of promoters based on the transcription start region of the CaMKIIα gene have typically been used. The first, the CaMKIIα-0.4 promoter, results in 95% excitatory specificity (Scheyltjens et al., 2015). The second, the CaMKIIα-1.3 promoter, results in only 82% excitatory specificity (Scheyltjens et al., 2015), and is thus not far from chance. We have clarified this in the manuscript. Nevertheless, we have removed the qualifier “excitatory” when talking about neurons in most instances throughout the manuscript.

      2) The manuscript concludes that cholinergic axons convey a binary locomotion signal and are not tuned to running speed. The average running velocity of mice in this study is very slow - slower than 15 cm/s in the example trace in Figure 1D and speeds <6 cm/s were quantified in Figure 2E. However, mice can run at much faster speeds both under head-fixed and freely moving conditions (see e.g. Jordan and Keller, 2020, where example running speeds are ~35 cm/s). Given that the data in the present manuscript cover such a narrow range of running speeds, it is not possible to determine whether cholinergic axons are tuned to running speed or convey a binary locomotion signal.

      Our previous analysis window of 0-6.25 cm/s covered approximately 80% of all data. We have increased the analysis window to 0-35 cm/s that now covers more than 99% of the data (see below). Also, note that very high running speeds are probably overrepresented in the Jordan and Keller 2020 paper as mice had to be trained to run reliably before all experiments given the relatively short holding times of the intracellular recordings. The running speeds in our current dataset are comparable to other datasets we have acquired in similar experiments.

      Figure 2E has now been updated to reflect the larger range of data. Please note, as the number of mice that contribute to the data now differs as a function of velocity (some mice run faster than others), we have now switched to a variant of the plot based on hierarchical bootstrap sampling (see Methods). This does not overtly change the appearance of the plot. See Author response image 1 for a comparison of the original plot, the extended range without bootstrap sampling, and the extended range with bootstrap sampling currently used in the paper.

      Author response image 1.

      Average activity of cholinergic axons as a function of locomotion velocity. (A) As in the previous version of the manuscript. (B) As in A, but with the extended velocity range. (C) As in B, but using hierarchical bootstrap sampling to estimate median (red dots) and 95% confidence interval (shading) for each velocity bin.
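      A hierarchical bootstrap of the kind used for panel C can be sketched as below for a single velocity bin. The two-level resampling scheme (mice, then samples within each mouse), the per-replicate statistic, and the function name are assumptions for illustration; the manuscript's Methods define the procedure actually used.

```python
import numpy as np

def hierarchical_bootstrap_median(data_by_mouse, n_boot=1000, seed=0):
    """Median and 95% CI of bootstrap means for one velocity bin.

    data_by_mouse: list with one 1-D array of activity samples per mouse.
    Level 1 resamples mice with replacement; level 2 resamples each chosen
    mouse's samples with replacement.
    """
    rng = np.random.default_rng(seed)
    n_mice = len(data_by_mouse)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        mice = rng.integers(0, n_mice, size=n_mice)
        resampled = [
            rng.choice(data_by_mouse[m], size=len(data_by_mouse[m]), replace=True)
            for m in mice
        ]
        boot[b] = np.mean(np.concatenate(resampled))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return np.median(boot), (lo, hi)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic bin: three mice contributing different numbers of samples.
    data = [rng.normal(1.0, 0.5, size=n) for n in (40, 15, 60)]
    print(hierarchical_bootstrap_median(data))
```

      Resampling at the level of mice first is what lets the confidence interval widen in velocity bins to which only a few mice contribute.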

      3) The analyses in Figure 4 only consider the average response to all grating orientations and directions. Without further analysing responses to individual grating directions it is unclear how stimulation of cholinergic inputs affects visual responses. Previous work (e.g. Dadarlat and Stryker, 2017) has shown that locomotion can have both additive and multiplicative effects and it would be valuable to determine the type of modulation provided by cholinergic stimulation.

      We thank the reviewer for this suggestion. To address this, we quantified how cholinergic stimulation influenced the orientation tuning of V1 neurons. The stimuli we used were full-field sinusoidal drifting gratings of 4 different orientations (2 directions each). For each neuron, we identified the preferred orientation and plotted responses relative to this preferred orientation as a function of whether the mouse was running or we were stimulating cholinergic axons. Consistent with previous work, we found a mixture of multiplicative and additive components during running. With cholinergic axon stimulation, the multiplicative effect was stronger than the additive effect. This is now quantified in Figure S7.
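      One common way to separate the two components is a linear fit of the modulated tuning curve against the control tuning curve, with the slope read as the multiplicative gain and the intercept as the additive offset. The sketch below shows that generic approach on synthetic numbers; it is an assumption that this matches the quantification in Figure S7, and the function and variable names are hypothetical.

```python
import numpy as np

def fit_gain(control, modulated):
    """Least-squares fit of modulated ≈ gain * control + offset across orientations."""
    control = np.asarray(control, dtype=float)
    modulated = np.asarray(modulated, dtype=float)
    design = np.column_stack([control, np.ones_like(control)])
    (gain, offset), *_ = np.linalg.lstsq(design, modulated, rcond=None)
    return gain, offset

if __name__ == "__main__":
    # Synthetic tuning curve: responses at 4 orientations, aligned to the preferred one.
    control = np.array([1.0, 0.6, 0.3, 0.5])
    stimulated = 1.5 * control + 0.2      # multiplicative gain 1.5, additive offset 0.2
    print(fit_gain(control, stimulated))
```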

      4) The difference between the effects of locomotion and optogenetic stimulation of cholinergic axons in Figure 5 may be confounded by differences in the visual stimulus. These experiments are carried out under open-loop conditions, where mice may adapt their locomotion based on the speed of the visual stimulus. Consequently, locomotion onsets are likely to occur during periods of higher visual flow. Since optogenetic stimulation is presented randomly, it is likely to occur during periods of lower visual flow speed. Consequently, the difference between the effect of locomotion and optogenetic stimulation may be explained by differences in visual flow speed and it is important to exclude this possibility.

      We find that in general locomotion is unaffected by visual flow in open loop conditions in this type of experiment (in this particular dataset, there was a small negative correlation between locomotion and visual flow in the open loop condition, Author response image 2).

      Author response image 2.

      Correlation between visual flow and locomotion in open loop conditions. Average correlation of locomotion velocity and visual flow speed in open loop for all mice in Figure 5. Each dot is an imaging site. In the open loop, the correlation between locomotion and visual flow speed is close to zero, but significantly negative in this dataset.

      However, to directly address the concern that our results are influenced by visual flow, we can restrict our analysis to locomotion onsets that occurred in the absence of visual flow (Author response image 3A and 3B). These responses are not substantially different from those obtained when including all data (Figures 5A and 5B). Thus, the difference between the effect of locomotion and optogenetic stimulation cannot be explained by differences in visual flow speed.

      Author response image 3.

      Open loop locomotion onset responses without visual flow. (A) Average calcium response of layer 2/3 neurons in visual cortex to locomotion onset in open loop in the absence of visual flow. Shading indicates SEM. (B) As in A, but for layer 5 neurons.

      5) It is unclear why chemogenetic manipulations of cholinergic inputs had no effect on pairwise correlations of L2/3 neuronal responses while optogenetic stimulation did.

      This is correct – we do not know why that is the case and can only speculate. There are at least two possible explanations for this difference:

      1) Local vs. systemic. The optogenetic manipulation is relatively local, while the chemogenetic manipulation is systemic. It is not clear how cholinergic release in other brain regions influences the correlation structure in visual cortex. It is conceivable that a cortex-wide change in cholinergic release results in a categorically different state with a specific correlation structure in layer 2/3 neurons different from the one induced by the more local optogenetic manipulation.

      2) Layer-specificity of activation. Cholinergic projections to visual cortex arrive both in superficial and deep layers. We activate the axons in visual cortex optogenetically by illuminating the cortical surface. Thus, in our optogenetic experiments, we are primarily activating the axons arriving superficially, while in the chemogenetic experiment, we are likely influencing superficial and deep axons similarly. Thus, we might expect the optogenetic activation to influence superficial layers more strongly than the chemogenetic activation does.

      6) The effects of locomotion and optogenetic stimulation on the latency of L5 responses in Figure 7 are very large - ~100 ms. Indeed, typical latencies in mouse V1 measured using electrophysiology are themselves shorter than 100 ms (see e.g. Durand et al., 2016). Visual response latencies in stationary conditions or without optogenetic stimulation appear surprisingly long - much longer than reported in previous studies even under anaesthesia. Such large and surprising results require careful analysis to ensure they are not confounded by artefacts. However, as in Figure 4, this analysis is based only on average responses across all gratings and no individual examples are shown.

      This is correct and we speculate this is the consequence of a combination of different reasons.

      1) Calcium imaging is inherently slower than electrophysiological recordings. When measuring spiking responses using electrophysiology, response latencies on the order of 100 ms have indeed been reported, as the reviewer points out. Using calcium imaging, these latencies are typically 4 times longer (Kuznetsova et al., 2021). This is likely a combination of a) calcium signals that are slower than electrical changes, b) delays in the calcium sensor itself, and c) temporal sampling used for imaging that is about 3 orders of magnitude slower than what is typically used for electrophysiology.

      2) Different neurons included in analysis. Calcium imaging likely has very different biases than electrophysiological recordings. Historically, the fraction of visually responsive neurons in visual cortex based on extracellular electrophysiological recordings has been systematically overestimated (Olshausen and Field, 2005). One key contributor to this is the fact that recordings are biased to visually responsive neurons. The criteria for inclusion of “responsive neurons” strongly influence the “average” response latency. In addition, calcium imaging has biases that relate to the vertical position of the somata in cortex. Both layer 2/3 and layer 5 recordings are likely biased to superficial layer 2/3 and superficial layer 5 neurons. Conversely, electrical recordings are likely biased to layer 4 and layer 5 neurons. Thus, comparisons at this level of resolution between data obtained with these two methods are difficult to make.

      We have added example neurons as Figure S12, as suggested.  

      Reviewer #1 (Recommendations For The Authors):

      While the study showcases valuable insights, I have a couple of concerns regarding the novelty of their research and the interpretation of results. By addressing these concerns, the authors can clarify the positioning of their research and strengthen the significance of their findings.

      (Major comments)

      1) Page 1, Line 21: The authors claim, "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." However, it is not clear which specific data or results support the claim of "switching between internal representations." Overall, their study primarily presents responses averaged across all neurons imaged, lacking a detailed exploration of individual neuron response patterns. Population analysis, such as PCA and decoding, can be used to assess the encoding of each stimulus by V1 neurons - "internal representation."

      To strengthen their claim regarding "switching between internal representations," the authors could consider an experiment measuring the speed at which the population activity pattern A transitions to the population activity pattern B when the visual stimulus switches from A to B. Such experiments would significantly enhance the impact of their study, providing a clearer understanding of how BF cholinergic input influences the dynamic representation of stimuli during locomotion.

      We thank the reviewer for bringing this up. That acetylcholine enables a faster switching between internal representations in layer 5 is a speculation. We have attempted to make this clearer in the discussion. Our speculation is based on the finding that the population response in layer 5 to sensory input is faster under high levels of acetylcholine (Figures 4D and 7B). In line with the reviewer’s intuition, the neuronal response to a change in visual stimulus, in our experiment from a uniform grey visual stimulus to a sinusoidal grating stimulus, is indeed faster. Based on evidence in favor of layer 5 encoding internal representation (Heindorf and Keller, 2023; Keller and Mrsic-Flogel, 2018; Suzuki and Larkum, 2020), we interpret the decrease in latency of the population response as a faster change in internal representation. We are not sure a decoding analysis would add much to this, given that a trivial decoder simply based on mean population response would already find a faster transition. We have expanded on our explanation of these points in the manuscript.

      2) Page 4, Line 103: "..., a direct measurement of the activity of cholinergic projection from basal forebrain to the visual cortex during locomotion has not been made." This statement is incorrect. An earlier study by Reimer et al. indeed imaged cholinergic axons in the visual cortex of mice running on a wheel. They found that "After walking onset, ... ACh activation, and a large pupil diameter, were sustained throughout the walking period in both cortical areas V1 and A1." Their findings are very similar to the results presented by Yogesh and Keller - that is, BF cholinergic axons exhibited locomotion state-dependent activity. The authors should clarify the positioning of this study relative to previous studies.

      Reimer, J., McGinley, M., Liu, Y. et al. Pupil fluctuations track rapid changes in adrenergic and cholinergic activity in cortex. Nat Commun 7, 13289 (2016). https://doi.org/10.1038/ncomms13289

      We have clarified this as suggested. However, we disagree slightly with the reviewer here. The key question is whether the cholinergic axons imaged originate in basal forebrain. While Reimer et al. 2016 did set out to do this, we believe a number of methodological considerations prevent this conclusion:

      1) In their analysis, Reimer et al. 2016 combine data from mice with cholinergic axons labeled either by viral injection into basal forebrain or by a germline cross of ChAT-cre mice with a reporter line. Unfortunately, it is unclear what the exact number of mice labeled with either strategy was. Based on the information in the paper, we can conclude that of the 6 mice used for experiments, between 2 and 5 were germline crosses. The problem with germline labeling of ChAT-positive neurons is that when using a cross, VIP-ChAT+ neurons in cortex are also labeled. Reimer et al. 2016 find an anticipatory increase in activity on locomotion onset, which is also seen by Larsen et al. 2018 (who use a germline cross strategy) but which we do not see in our data. Based on this, we speculate that a significant part of the signals reported in the Reimer et al. 2016 paper are from local VIP-ChAT+ neurons.

      2) In their analysis, Reimer et al. 2016 also combine all imaging data obtained from both primary auditory cortex and primary visual cortex. Given the heterogeneity in the basal forebrain cholinergic neuronal population and their projection selectivity, to better understand these signals, it’s important to acquire the signals from cholinergic axons selectively in specific cortical regions, which we do in visual cortex. Based on the information provided in their paper, we were unfortunately not able to discern the injection location for their viral labeling strategy. Given the topographic selectivity in projection from basal forebrain, this could give hints as to the relative contribution of cholinergic projections to A1 vs V1 in their data. The injection coordinates given in the methods of the Reimer paper, of 4 mm lateral and 0.5 mm posterior to bregma to target basal forebrain, are likely wrong (they fall outside the head of the mouse).

      Given the heterogeneity in the basal forebrain cholinergic neuronal population and their projection selectivity, to better understand these signals, it’s important to acquire the signals from cholinergic axons both selectively in a cortical region, as we do in visual cortex, and purely originating from basal forebrain. Collins et al. 2023 inject more laterally and thus characterize cholinergic input to S1 and A1, while Lohani et al. 2022 use GRAB sensors which complement our findings. Please note, we don’t think there is any substantial disagreement in the results of previous studies and ours, with very few exceptions, like the anticipatory increase in cholinergic activity that precedes locomotion onset in the Reimer et al. 2016 data, but not in ours. This is a rather critical point in the context of the literature of motor-related neuronal activity in mouse V1. Based on early work on the topic, it is frequently assumed that motor-related activity in V1 is driven by a cholinergic input. This is very likely incorrect given our results, hence we feel it is important to highlight this methodological caveat of earlier work.

      3) Fig. 4H: The authors found that L5 neurons exhibit positive responses at the onset of locomotion in a closed-loop configuration. Moreover, these responses are further enhanced by photostimulation of BF axons.

      In a previous study from the same authors' group (Heindorf and Keller, 2023), they reported 'negative' responses in L5a IT neurons during closed-loop locomotion. This raises a question about the potential influence of different L5 neuron types on the observed results between the two studies. Do the authors think that the involvement of the other neuronal type in L5, the PT neurons, might explain the positive responses seen in the present study? Discussing this point in the paper would provide valuable insights into the underlying mechanisms.

      Yes, we do think the positive response observed on locomotion onset in closed loop is due to non-Tlx3+ neurons. Given that Tlx3-cre only labels a subset of inter-telencephalic (IT) neurons (Gerfen et al., 2013; Heindorf and Keller, 2023), it’s not clear whether the positive response is explained by the pyramidal tract (PT) neurons, or the non-Tlx3+ IT neurons. Dissecting the response profiles of different subsets of layer 5 neurons is an active area of research in the lab and we hope to be able to answer these points more comprehensively in future publications. We have expanded on this in the discussion as suggested.

      Furthermore, it would be valuable to investigate whether the effects of photostimulation of BF axons vary depending on neuronal responsiveness. This could help elucidate how neurons with positive responses, potentially putative PT neurons, differ from neurons with negative responses, putative IT neurons, in their response to BF axon photostimulation during locomotion.

      We have attempted an analysis of the form suggested. In short, we found no relationship between a neuron’s response to optogenetic stimulation of ChAT axons and its response to locomotion onset, or its mean activity. Based on their response to locomotion onset in closed loop, we split layer 5 neurons into three groups, 30% most strongly decreasing (putative Tlx3+), 30% most strongly increasing, and the rest. We did not see a response to optogenetic stimulation of basal forebrain cholinergic axons in any of the three groups (Author response image 4A). We also found no obvious relationship between the mean activity of neurons and their response to optogenetic stimulation (Author response image 4B).
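
      The three-way split described above can be sketched as follows. This is a minimal illustration, not the analysis code used for the response: the function name, the use of quantile cut-offs, and the handling of ties are assumptions, and the input is taken to be one locomotion-onset response value per neuron.

```python
import numpy as np

def split_by_onset_response(onset_response, fraction=0.3):
    """Label neurons by their closed-loop locomotion-onset response.

    The `fraction` of neurons with the lowest responses is labelled
    "decreasing", the `fraction` with the highest responses is labelled
    "increasing", and all remaining neurons are labelled "rest".
    Cut-offs are quantiles of the response distribution (an assumption).
    """
    onset_response = np.asarray(onset_response, dtype=float)
    low_cut = np.quantile(onset_response, fraction)
    high_cut = np.quantile(onset_response, 1.0 - fraction)

    labels = np.full(onset_response.shape, "rest", dtype=object)
    labels[onset_response <= low_cut] = "decreasing"
    labels[onset_response >= high_cut] = "increasing"
    return labels
```

      The response to optogenetic stimulation can then be averaged separately within each label, which is the comparison summarized in panel A of the author response image.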

      Author response image 4.

      Neither putative layer 5 cell types nor neuronal responsiveness correlates with the response to optogenetic stimulation of cholinergic axons. (A) Average calcium response of layer 5 neurons split into putative Tlx3 (closed loop locomotion onset suppressed) and non-Tlx3 like (closed loop locomotion onset activated) to optogenetic stimulation of cholinergic axons. (B) Average calcium response of layer 5 neurons to optogenetic stimulation of cholinergic axons as a function of their mean response throughout the experimental session. Left: Each dot is a neuron. Right: Average correlation in the response of layer 5 to optogenetic stimulation and mean activity over all neurons per imaging site. Each dot is an imaging site.

      (Minor comments)

      1) It is unclear which BF subregion(s) were targeted in this study.

      Thanks for pointing this out. We targeted the entire basal forebrain (medial septum, vertical and horizontal limbs of the diagonal band, and nucleus basalis) with our viral injections. All our axonal imaging data comes from visual cortex and given the sensory modality-selectivity of cholinergic projections to cortex, the labeled axons originate from medial septum and the diagonal bands (Kim et al., 2016). We have now added the labels for basal forebrain subregions targeted next to the injection coordinates in the manuscript.

      2) Page 43, Line 818: The journal name of the cited paper Collins et al. is missing.

      Fixed.

      3) In the optogenetic experiments, how long is the inter-trial interval? Stimulation of BF is known to have long-lasting effects on cortical activity and plasticity. It is, therefore, important to have a sufficient interval between trials.

      The median inter-trial intervals for the different stimulation events are as follows:

      • Optogenetic stimulation only: 15 s

      • Optogenetic stimulation + grating : 12 s

      • Optogenetic stimulation + mismatch: 35 s

      • Optogenetic stimulation + locomotion onset: 45 s

      We have added this information to the methods in the manuscript.

      Assuming locomotion is the primary driver of acetylcholine release (as we argue in Figures 1 and 2), the frequency of stimulation roughly corresponds to the frequency of acetylcholine release experienced endogenously. It is of course possible that being awake and mobile puts the entire system in a long-lasting acetylcholine-driven state different from what would be observed during long-term quiet wakefulness or during sleep. But the main focus of the optogenetic stimulation experiments we performed was to investigate the consequences of the rapid acetylcholine release driven by locomotion.

      4) Page 11, Line 313: "..., we cannot exclude the possibility of a systemic contribution to the effects we observe through shared projections between different cortical and subcortical target." This possibility can be tested by examining the effect of optogenetic stimulation of cholinergic axons on locomotor activity, as they did for the chemogenetic experiments (Fig. S7). If the optogenetic manipulation changes locomotor activity, it is likely that this manipulation has some impact on subcortical activity and systemic contribution to the changes in cortical responses observed.

      Based on the reviewer's suggestion, we tested this and found no change in the locomotor activity of the mice on optogenetic stimulation of cholinergic axons locally in visual cortex (we have added this as Figure S5 to the manuscript). Please note, however, that we cannot exclude a systemic contribution based on this.

      5) Fig. 4 and 5: In a closed-loop configuration, L2/3 neurons exhibit a transient increase in response at the onset of locomotion, while in an open-loop configuration, their response is more prolonged. On the other hand, L5 neurons show a sustained response in both configurations. Do the authors have any speculation on this difference?

      This is correct. Locomotion onset responses in layer 2/3 are strongly modulated by whether the locomotion onset occurs in closed loop or open loop configurations (Widmer et al., 2022). This difference is absent in our layer 5 data here. We suspect this is a function of a differential within-layer cell type bias in the different recordings. In the layer 2/3 recordings we are likely biased strongly towards superficial L2/3 neurons that tend to be negative prediction error neurons (top-down excited and bottom-up inhibited), see e.g. (O’Toole et al., 2023). A reduction of locomotion onset responses in closed loop is what one would expect for negative prediction error neurons. While layer 5 neurons exhibit mismatch responses, they do not exhibit opposing top-down and bottom-up input that would result in such a suppression (Jordan and Keller, 2020).

      We can illustrate this by splitting all layer 2/3 neurons based on their response to gratings and to visuomotor mismatch into a positive prediction error (PE) type (top 30% positive grating response), a negative prediction error type (top 30% positive visuomotor mismatch response), and the rest (remaining neurons and neurons responsive to both grating and visuomotor mismatch). Plotting the response of these neurons to locomotion onset in closed loop and open loop, we find that negative PE neurons have a transient response to locomotion onset in closed loop while positive PE neurons have a sustained increase in response in closed loop. In open loop the response of the two populations is indistinguishable. Splitting the layer 5 neurons using the same criteria, we don’t find a striking difference between closed and open loop between the two groups of neurons. We have added this as Figure S8.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      1) As a ubiquitous promoter was used to drive GCaMP expression, please explain how excitatory neurons were identified.

      2) As the data cover a very small range of running speeds, it is important to confirm that the binary locomotion signal model still applies when mice run at higher speeds - either by selecting recordings where mice have a wider range of running speeds or conducting additional experiments. In addition, please show the running speed tuning of individual axons.

      3) Please provide a more detailed analysis of the effects of locomotion and cholinergic modulation on visual responses. How does cholinergic modulation affect orientation and direction tuning? Are the effects multiplicative or additive? How does this compare to the effects of locomotion on single neurons?

      4) To ensure that the analyses in Figure 5 are not confounded by differences in the visual stimulus, please include average visual flow speed traces for each condition.

      5) Please clarify why chemogenetic manipulations of cholinergic inputs had no effect on pairwise correlations in L2/3.

      6) The latency effect is quite an extraordinary claim and requires careful analysis. Please provide examples of single neurons illustrating the latency effect - including responses across individual grating orientations/directions. One possible confound is that grating presentation could itself trigger locomotion or other movements. In the stationary / noOpto conditions, the grating response might not be apparent in the average trace until the animal begins to move. Thus the large latency in the stationary / noOpto conditions may reflect movement-related rather than visual responses.

      Please see our responses to these points in the public review part above.

      There are some minor points where text and figures could be improved:

      1) When discussing the decorrelation of neuronal responses by cholinergic axon activation, it is important to make it clear that Figure 6D quantifies the responses of layer 5 apical dendrites rather than neurons.

      We have added this information to the results section.

      2) In Figure S7, please clarify why velocity is in arbitrary units.

      This was an oversight and has been fixed.

      3) Please clarify how locomotion and stationary trials are selected in Figure 4.

      We thank the reviewers for pointing this out. Trials were classified as occurring during locomotion or while mice were stationary as follows. We used a time-window of -0.5 s to +1 s around stimulus onset. If mice exhibited uninterrupted locomotion above a threshold of 0.25 cm/s in this time-window, we considered the stimulus as occurring during locomotion, otherwise it was defined as occurring while the mice were stationary. Note, the same criteria to define locomotion state was used to isolate visuomotor mismatch events, and also during control optogenetic stimulation experiments. We have added this information to the methods.
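
      The classification rule above can be written compactly. The following is a minimal sketch under stated assumptions: the function and argument names are hypothetical, speed is assumed to be sampled on a common time base in cm/s, and "uninterrupted" is read as every sample in the window exceeding the threshold.

```python
import numpy as np

def is_locomotion_trial(time_s, speed_cm_s, onset_s,
                        window=(-0.5, 1.0), threshold=0.25):
    """Return True if the stimulus at `onset_s` occurred during locomotion.

    A trial counts as locomotion when the running speed stays above
    `threshold` (cm/s) for every sample between onset_s + window[0] and
    onset_s + window[1]; otherwise the trial is treated as stationary.
    """
    time_s = np.asarray(time_s, dtype=float)
    speed_cm_s = np.asarray(speed_cm_s, dtype=float)

    in_window = (time_s >= onset_s + window[0]) & (time_s <= onset_s + window[1])
    if not in_window.any():
        raise ValueError("no speed samples fall inside the analysis window")

    return bool(np.all(speed_cm_s[in_window] > threshold))
```

      A single sample at or below the threshold inside the window is enough to classify the trial as stationary, which matches the "uninterrupted locomotion" wording.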

      4) When testing whether cholinergic activation is sufficient to explain locomotion-induced decorrelation in Figure 6G-H, please show pre-CNO and post-CNO delta-correlation, not just their difference.

      We can do that, but the results are harder to parse this way. We have added this as Figure S11 to the manuscript. The problem with parsing the figure is that the pre-CNO levels are different in different groups. This is likely a function of mouse-to-mouse variability and makes it harder to identify what the CNO induced changes are. Using the pre-post difference removes the batch influence. Hence, we have left this as the main analysis in Figure 6G and 6H.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Millard and colleagues investigated if the analgesic effect of nicotine on pain sensitivity, assessed with two pain models, is mediated by Peak Alpha Frequency (PAF) recorded with resting state EEG. The authors found indeed that nicotine (4 mg, gum) reduced pain ratings during phasic heat pain but not cuff pressor algometry compared to placebo conditions. Nicotine also increased PAF (globally). However, mediation analysis revealed that the reduction in pain ratings elicited by the phasic heat pain after taking nicotine was not mediated by the changes in PAF. Also, the authors only partially replicated the correlation between PAF and pain sensitivity at baseline (before nicotine treatment). At the group-level no correlation was found, but an exploratory analysis showed that the negative correlation (lower PAF, higher pain sensitivity) was present in males but not in females. The authors discuss the lack of correlation.

      In general, the study is rigorous, methodology is sound and the paper is well-written. Results are compelling and sufficiently discussed.

      Strengths:

      Strengths of this study are the pre-registration, proper sample size calculation, and data analysis. But also the presence of the analgesic effect of nicotine and the change in PAF.

      Weaknesses:

      It would even be more convincing if they had manipulated PAF directly.

      We thank Reviewer #1 for their positive and constructive comments regarding our study. We appreciate the view that the study was rigorous and methodologically sound, that the paper was well-written, and that the strengths included our pre-registration, sample size calculation, and data analysis.

      In response to the reviewer's comment about more directly manipulating Peak Alpha Frequency (PAF), we agree that such an approach could provide a more direct investigation of the role of PAF in pain processing. We chose nicotine to modulate PAF as the literature suggested it was associated with a reliable increase in PAF speed. As mentioned in our Discussion, there are several alternative methods to manipulate PAF, such as non-invasive brain stimulation techniques (NIBS) like transcranial alternating current stimulation (tACS) or neurofeedback training. These approaches could help clarify whether a causal relationship exists between PAF and pain sensitivity, although methods such as NIBS still require further investigation, as there is little evidence that these approaches change PAF (Millard et al., 2024).

      Reviewer #2 (Public Review):

      Summary:

      The study by Millard et al. investigates the effect of nicotine on alpha peak frequency and pain in a very elaborate experimental design. According to the statistical analysis, the authors found a factor-corrected significant effect for prolonged heat pain but not for alpha peak frequency in response to the nicotine treatment.

      Strengths:

      I very much like the study design and that the authors followed their research line by aiming to provide a complete picture of the pain-related cortical impact of alpha peak frequency. This is very important work, even in the absence of any statistical significance. I also appreciate the preregistration of the study and the well-written and balanced introduction. However, it is important to give access to the preregistration beforehand.

      Weaknesses:

      The weakness of the study revolves around three aspects:

      (1) I am not entirely convinced that the authors' analysis strategy provides a sufficient signal-to-noise ratio to estimate the peak alpha frequency in each participant reliably. A source separation (ICA or similar) would have been better suited than electrode ROIs to extract the alpha signal. By using a source separation approach, different sources of alpha (mu, occipital alpha, laterality) could be disentangled.

      (2) Also, there's a hint in the literature (reference 49 in the manuscript) that the nicotine treatment may not work as intended. Instead, the authors' decision to use nicotine to modulate the peak alpha frequency and pain relied on other, not suitable work on chronic pain and permanent smokers. In the present study, the authors use nicotine treatment and transient painful stimulation on nonsmokers.

      (3) In my view, the discussion could be more critical for some aspects and the authors speculate towards directions their findings can not provide any evidence. Speculations are indeed very important to generate new ideas but should be restricted to the context of the study (experimental pain, acute interventions). The unfortunate decision to use nicotine severely hampered the authors' aim of the study.

      Impact:

      The impact of the study could be to show what has not worked to answer the research questions of the authors. The authors claim that their approach could be used to define a biomarker of pain. This is highly desirable but requires refined methods and, in order to make the tool really applicable, more accurate approaches at subject level.

      We thank reviewer #2 for their recognition of the study’s design, the importance of this research area, and the pre-registration of our study. In response to the weaknesses highlighted:

      (1) We appreciate the reviewer’s suggestion to improve the signal-to-noise ratio by applying source separation techniques, such as ICA, which have now been performed and incorporated into the manuscript. Our original decision to use sensor-level ROIs followed the precedent set in previous studies, our rationale being to improve reproducibility and avoid biases from picking individual electrodes or manually picking sources. We have added analyses using an automated pipeline that selects components based on the presence of a peak in the alpha range and alignment with a predefined template topography representing sensorimotor sites. Here again we found no significant differences from the mediation results obtained with a sensor-space sensorimotor ROI, further supporting the robustness of the chosen approach. ICA could still potentially disentangle different sources of alpha, such as occipital alpha and mu rhythm, and provide new insights into the PAF-pain relationship. We have now added a discussion in the manuscript about the potential advantages of source separation techniques and suggest that the possible contributions of separate alpha sources be investigated and compared to sensor space PAF as a direction for future research.

      (2) We recognise the reviewer's concern regarding our choice of nicotine as a modulator of pain and alpha peak frequency (PAF). The meta-analysis by Ditre et al. (2016) indeed points to small effect sizes for nicotine's impact on experimental pain and highlights the potential for publication bias. However, our decision to use nicotine in this study was not primarily based on its direct analgesic effects, but rather on its well-documented ability to modulate PAF, in smoking and non-smoker populations, as outlined in our study aims.

      In this regard, the intentional use of nicotine was to assess whether changes in PAF could mediate alterations in pain. This approach aligns with the broader concept that a direct effect of an intervention is not necessary to observe indirect effects (Fairchild & McDaniel, 2017). We have, however, revised our introduction to further clarify this rationale, highlighting that nicotine was used as a tool for PAF modulation, not solely for its potential analgesic properties.

      (3) We agree with the reviewer’s observation that certain aspects of the Discussion could be more cautious, particularly regarding speculations about nicotine’s effects and PAF as a biomarker of pain. We have revised the Discussion to ensure that our interpretations are better grounded in the data from this study, clearly stating the limitations and avoiding overgeneralization. This revision focuses on a more critical evaluation of the potential relationships between PAF, nicotine, and pain sensitivity based solely on our experimental context.

      Finally, we also apologize for not providing access to the preregistration earlier. This was an oversight on our end, and we will ensure that future preregistrations are made available upfront.

      Reviewer #3 (Public Review):

      In this manuscript, Millard et al. investigate the effects of nicotine on pain sensitivity and peak alpha frequency (PAF) in resting state EEG. To this end, they ran a pre-registered, randomized, double-blind, placebo-controlled experiment involving 62 healthy adults who received either 4 mg nicotine gum (n=29) or placebo (n=33). Prolonged heat and pressure were used as pain models. Resting state EEG and pain intensity (assessed with a visual analog scale) were measured before and after the intervention. Additionally, several covariates (sex at birth, depression and anxiety symptoms, stress, sleep quality, among others) were recorded. Data was analyzed using ANCOVA-equivalent two-wave latent change score models, as well as repeated measures analysis of variance. Results do not show *experimentally relevant* changes of PAF or pain intensity scores for either of the prolonged pain models due to nicotine intake.

      The main strengths of the manuscript are its solid conceptual framework and the thorough experimental design. The researchers make a good case in the introduction and discussion for the need to further investigate the association of PAF and pain sensitivity. Furthermore, they proceed to carefully describe every aspect of the experiment in great detail, which is excellent for reproducibility purposes. Finally, they analyse the data from almost every possible angle and provide an extensive report of their results.

      The main weakness of the manuscript is the interpretation of these results. Even though some of the differences are statistically significant (e.g., global PAF, pain intensity ratings during heat pain), these differences are far from being experimentally or clinically relevant. The effect sizes observed are not sufficiently large to consider that pain sensitivity was modulated by the nicotine intake, which puts into question all the answers to the research questions posed in the study.

      We would like to express our gratitude to Reviewer #3 for their thoughtful and constructive review, including the positive feedback on the strengths of our study's conceptual framework, experimental design, and thorough methodological descriptions.

      We acknowledge the concern regarding the experimental and clinical relevance of some statistically significant results (e.g., global PAF and pain intensity during heat pain) and agree that small effect sizes may limit their practical implications. However, our primary goal was to assess whether nicotine-induced changes in PAF mediate pain changes, rather than to demonstrate large direct effects on pain sensitivity. Nicotine was chosen for its known ability to modulate PAF, and our focus was on the mechanistic role of PAF in pain perception. To clarify this, we have revised the discussion to better differentiate between statistical significance, experimental relevance, and clinical applicability. We emphasize that this study represents a preliminary step towards understanding PAF’s mechanistic role in pain, rather than a direct clinical application.

      We appreciate the suggestion to refine our interpretation. We have adjusted our language to ensure it aligns with the effect sizes observed and made recommendations for future research, such as testing different nicotine doses, to potentially uncover stronger or more clinically relevant effects.

      Although modest, we believe these findings offer valuable insights into the potential mechanisms by which nicotine affects alpha oscillations and pain. We have also discussed how these small effects could become more pronounced in different populations (e.g., chronic pain patients) and over time, offering guidance for future research on PAF modulation and pain sensitivity.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I have a number of points that the authors may want to consider for this or future work.

      (1) By reviewing the literature provided by the authors in the introduction I think that using nicotine as a means to modulate pain and alpha peak frequency was a mistake. The only work that may give a hint on whether nicotine can modulate experimental pain is the meta-analysis by Ditre and colleagues (2016). They suggest that their small effect may contain a publication bias. I think the other "large body of evidence" is testing something else than analgesia.

      Thank you for your consideration of our choice of nicotine in the study. The meta-analysis by Ditre and colleagues (2016) suggests small effect sizes for nicotine's impact on experimental pain, compared to the moderate effects claimed in some papers, especially when accounting for the potential publication bias you mentioned. However, our selection of nicotine was primarily driven by its documented ability to modulate PAF rather than its direct analgesic effects, as clearly stated in our aims. Therefore, we do not view our decision to use nicotine as a mistake; instead, it was aligned with our goal of assessing whether changes in PAF mediate alterations in pain and thus served as a valuable tool. This perspective aligns with the broader concept that a direct effect is not a prerequisite for observing indirect effects of an intervention on an outcome (Fairchild & McDaniel, 2017). To further enhance clarity, we've revised the introduction to emphasize the role of nicotine in manipulating PAF in relation to our study's aims.

      Previously we wrote: “A large body of evidence suggests that nicotine is an ideal choice for manipulating PAF, as both nicotine and smoking increase PAF speed [37,40–47] as well as pain thresholds and tolerance [48–52].” This has been changed to read: “Because evidence suggests that nicotine can modulate PAF, where both nicotine and smoking increase PAF speed [37,40–47], we chose nicotine to assess our aim of whether changes in PAF mediate changes in pain in a ‘mediation by design’ approach [48]. In addition, given evidence that nicotine may increase experimental pain thresholds and tolerance [49–53], nicotine could also influence pain ratings during tonic pain.”

      (2) As mentioned above, the OSF page is not accessible.

      We apologise for this. We had not realised that the pre-registration was under embargo, but we have now made it available.

      (3) I generally struggle with the authors' approach to investigating alpha. With the approach the authors used to detect peak alpha frequency it might be that the alpha signal may just show such a low amplitude that it is impossible to reliably detect it at electrode level. In my view, the approach is not accurate enough, which can be seen by the "jagged" shape of the individual alpha peak frequency. In my view, a source separation technique would have been more useful. I wonder which of the known cortical alphas contributes to the effects the authors have reported previously: occipital, mu rhythms projections or something else? A source separation approach disentangles the different alphas and will increase the SNR. My suggestion would be to work on ICA components or similar approaches. The advantage is that the components are almost completely free of any artefacts. ICAs could be run on the entire data or separately for each individual. In the latter case, it might be that some participants do not exhibit any alpha component.

      We appreciate your thoughtful consideration of our approach to investigating alpha. The calculation of PAF involves various methods and analysis steps across the literature (Corcoran et al., 2018; Gil Avila et al., 2023; McLain et al., 2022). Your query about which known cortical alphas contribute to reported effects is important. While Furman et al. (2018) initially focused on a sensorimotor component from an ICA, subsequent work from our labs suggested a broader relationship between PAF and pain across the scalp (Furman et al., 2019; Furman et al., 2020; Millard et al., 2022) and motivated conducting analyses at the sensor level in order to improve the reproducibility of the methods (Furman et al., 2020). However, based on your comment we have made several additions to the manuscript: we now explain why we did not use manual ICA methods, suggest this for future research, and have added an exploratory analysis using a recently developed automated pipeline that selects components based on the presence of a peak in the alpha range and alignment with a predefined template topography representing activity from occipital or motor sites.
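
      For readers unfamiliar with how a single PAF value is obtained from a spectrum, one widely used estimator is the power-weighted mean frequency (centre of gravity) within the alpha band. The sketch below illustrates that estimator only; whether it matches the exact procedure used in the manuscript is an assumption, as are the function name and the default 8-12 Hz band limits taken from the "global PAF (8-12 Hz)" wording.

```python
import numpy as np

def peak_alpha_frequency(freqs, power, band=(8.0, 12.0)):
    """Centre-of-gravity estimate of peak alpha frequency in Hz.

    freqs: 1-D array of frequency bins (Hz).
    power: 1-D array of spectral power at those bins.
    The estimate is the power-weighted mean frequency inside `band`.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)

    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_power = power[in_band]
    if band_power.size == 0 or band_power.sum() <= 0:
        raise ValueError("no positive power inside the requested band")

    return float(np.sum(freqs[in_band] * band_power) / np.sum(band_power))
```

      Because the estimate is a weighted average, it is less sensitive to a jagged spectrum than picking the single largest bin, but it still depends on the signal-to-noise ratio of the alpha peak relative to the background.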

      While we acknowledge that ICA components can offer a better signal-to-noise ratio (SNR) and possibly smoother spectral plots, we opted for our chosen method to avoid potential bias inherent in deciding on a component following source separation. The desire for a quick, automated, replicable, and unbiased pipeline, crucial for potential clinical applications of PAF as a biomarker, influenced this decision. At the time of analysis registration, automated methods for deciding which alpha components to extract following ICA were not apparent. We have now added this reasoning to Methods.

      “Contrary to some previous studies that used ICA to isolate sensory region alpha sources (Furman et al., 2018; De Martino et al., 2021; Valentini et al., 2022), we used pre-determined sensor level ROIs to improve reproducibility and reduce the potential for bias when individually selecting ICA components. Using sensor level ROIs may decrease the signal-to-noise ratio of the data; however, this approach has still been effective for observing the relationship between PAF and experimental pain (Furman et al., 2019; Furman et al., 2020).”

      We have also added use of ICA and development of methods as a suggestion for future research in the discussion:

      “Additionally, the use of global PAF may have introduced mediation measurement error into our mediation analysis. The spatial precision used in the current study was based on previous literature on PAF as a biomarker of pain sensitivity, which have used global and/or sensorimotor ROIs (Furman et al., 2018; Furman et al., 2020). Identification and use of the exploratory electrode clusters found in this study could build upon the current work (e.g., Furman et al., 2021). However, exploratory analysis of the clusters found in the present analysis demonstrated no influence on mediation analysis results (Supplementary Materials 3.8-3.10). Alternatively, independent component analysis (ICA) could be used to identify separate sources of alpha oscillations (Choi et al., 2005), as used in other experimental PAF-pain studies (Furman et al., 2018; Valentini et al., 2022), which could aid to disentangle the potential relevance of different alpha sources in the PAF-pain relationship. Although this comes with the need to develop more reproducible and automated methods for identifying such components.”

      The specific location or source of PAF that relates to pain remains unclear. Because of this, we did employ an exploratory cluster-based permutation analysis to assess the potential for variations in the presence of PAF changes across the scalp at sensor level, and emphasise that location of PAF change could be explored in future. However, we have now conducted the mediation analysis (difference score 2W-LCS model) using averages from the data-driven parietal cluster, frontal cluster, and both clusters together. For these we see a stronger effect of gum on PAF change, which was expected given the data driven approach of picking electrodes. There was still a total and direct effect of nicotine on pain during the PHP model, but still no indirect effect via change in PAF. For the CPA models, there were still no significant total, direct, or indirect effects of nicotine on CPA ratings. Therefore, using these data-driven clusters did not alter results compared to the model using the global PAF variable.

      The reader is directed to this supplementary material as follows:

      “The potential mediating effect of this change in PAF on change in PHP and CPA was explored (not pre-registered) by averaging within each cluster (central-parietal: CP1, CP2, CPz, P1, P2, P3, P4, Pz, POz; right-frontal: F8, FT8, FT10) and across both clusters. This averaging across electrodes produced three new variables, each assessed in relation to mediating effects on PHP and CPA ratings. The resulting six exploratory mediation analysis (difference score 2W-LCS) models demonstrated minimal differences from the main analysis of global PAF (8-12 Hz), except for the expected stronger effect of nicotine on change in PAF (bs = 0.11-0.14, ps < .003; Supplementary Materials 3.8-3.10).”

      Moreover, our team has been working on an automated method for selecting ICA components, so in response to your comment we assessed whether using this method altered the results of the current analysis. The in-depth methodology behind this new automatic pipeline will be published with a validation from some co-authors in the current collaboration in due course. At present, in summary, this automatic pipeline conducts independent component analysis (ICA) 10 times for each resting state, and selects the component with the highest topographical correlation to a template created of a sensorimotor alpha component from Furman et al., (2018). 
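
      The selection step described above (repeat ICA several times, then keep the component whose topography best matches a template) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy 4-channel topographies, and the use of the absolute correlation (to allow for the arbitrary sign of ICA components) are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_component(runs, template):
    """Return (run_index, component_index, |r|) of the best template match.

    runs: list of ICA runs; each run is a list of component topographies,
          and each topography is a list of per-channel weights.
    template: per-channel weights of the template topography.
    """
    best = None
    for run_idx, topographies in enumerate(runs):
        for comp_idx, topo in enumerate(topographies):
            # Assumption: compare |r| because ICA component sign is arbitrary.
            r = abs(pearson(topo, template))
            if best is None or r > best[2]:
                best = (run_idx, comp_idx, r)
    return best


# Hypothetical data: two ICA runs with two components each, four channels.
template = [0.1, 0.9, 0.8, 0.2]
runs = [
    [[0.5, 0.4, 0.5, 0.6], [0.9, 0.1, 0.8, 0.2]],
    [[-0.1, -0.8, -0.9, -0.2], [0.3, 0.3, 0.2, 0.4]],
]
run_idx, comp_idx, r = select_component(runs, template)
print(run_idx, comp_idx, round(r, 3))
```

      In this toy input the sign-flipped copy of the template pattern in the second run is selected, which is the behaviour the absolute value is meant to provide.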

      The results of the PHP and CPA mediation models using the PAF calculated from independent components were not substantially different from those using the global PAF. For the PHP model, the total effect (b = -0.648, p = .033) and direct effect (b = -0.666, p = .035) were still significant, and there was still no significant indirect effect (b = 0.018, p = .726). The general fit was reduced: although the CFI was above 0.90, akin to the original model, the RMSEA and SRMR were not below 0.08, unlike the original models (Little, 2013). For the CPA model, there were still no significant total (b = -0.371, p = .357), direct (b = -0.364, p = .386), or indirect effects (b = -0.007, p = .906), and the model fit also decreased, with CFI below 0.90 and RMSEA and SRMR above 0.08. See supplementary material (3.11). Note that still no correlations were seen between this IC sensorimotor PAF and pain (PHP: r = 0.11, p = .4; CPA: r = -0.064, p = .63).
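
      As a reading aid, the estimates reported above can be checked against the usual decomposition for a linear single-mediator model, in which the total effect equals the direct effect plus the indirect effect. The symbols c, c', a and b are conventional mediation notation and are introduced here only for illustration.

```latex
\begin{align*}
c &= c' + ab \\
\text{PHP:}\quad -0.666 + 0.018 &= -0.648 \\
\text{CPA:}\quad -0.364 + (-0.007) &= -0.371
\end{align*}
```

      Both sums match the reported total effects, so the three estimates in each model are internally consistent.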

      Interestingly, in both models, there was now no longer a significant a-path (PHP: b = 0.08, p = 0.292; CPA: b = 0.039, p = 0.575), unlike previously observed (PHP: b = 0.085, p = 0.018; CPA: b = 0.089, p = 0.011). We interpret this as supporting the previously highlighted difference between finding an effect on PAF globally but not in a sensorimotor ROI (and now a sensorimotor IC), justifying the exploratory CBPA and the suggestion in the discussion to explore methodology.

      We understand that this analysis does not fully answer the reviewer’s question of which of the known cortical alphas contributes to the effects reported in our previous work. However, we consider this exploration to be beyond the scope of the current paper, as it would be more appropriately addressed with larger datasets or combinations of datasets, potentially incorporating MEG to better disentangle oscillatory sources. The highlighted differences seen between global PAF, sensorimotor ROI PAF, sensorimotor IC PAF, as well as the CBPA of PAF changes provide ample directions for future research to build upon: 1) which alpha signals (sensor or source space) are related to pain, 2) how these alpha signals can be represented robustly in a replicable way, and 3) which alpha signals (sensor or source space) are manipulable through interventions. These are all excellent questions for future studies to investigate.

      The below text has been added to the Discussion:

      In-house code was developed to compare a sensorimotor component to the results presented in this manuscript (Supplementary Material 3.11), showing similar results to the sensorimotor ROI mediation analysis presented here. However, examination of which alpha - be it sensor or source space - are related to pain, how they can be robustly represented, and how they can be manipulated are ripe avenues for future study.

      (4) I have my doubts that you can get a reliable close to bell-shaped amplitude distribution for every participant. The argument that the peak detection procedure is hampered by the high-amplitude lower frequency can be easily solved by subtracting the "slope" before determining the peak. My issue is that the entire analysis is resting on the assumption that each participant has a reliable alpha effect at electrode level. This is not the case. Non-alpha participants can severely distort the statistics. ICA-based analyses would be more sensitive but not every participant will show alpha. You may want to argue with robust group effects but in my view, every single participant counts, particularly for this type of data analysis, where in the case of a low SNR the "peak" can easily shift to the extremes. In case there is an alpha effect for a specific subject, we should see a smooth bump in the frequency spectrum between 8 and 12 Hz. Anything beyond that is hard to believe. The long stimulation period allows a broad FFT analysis window with a good frequency resolution in order to detect the alpha frequency bump.

      The reviewer is correct that non-alpha participants can distort the statistics. We did visually assess the EEG of each individual’s spectra at baseline to establish the presence of global peaks, as we believe this is good practice to aid understanding of the data. Please see Author response image 1 for individual spectra seen at baseline. Although not all participants had a ‘smooth bump in the frequency spectrum between 8 and 12 Hz’, we prefer to not apply/necessitate this assumption to our data. Chiang et al., (2011) suggest that ~3% of individuals do not have a discernible alpha peak, and in our data we observed only one participant without a very obvious spectral peak (px-39). But, this participant does have enough activity within the alpha range to identify PAF by the CoG method (i.e. not just flat spectra and activity on top of 1/f characteristics). Without a pre-registered and standardised decision process to remove such a participant in place, we opted to not remove any participants to avoid curation of our data.

      Author response image 1.
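
      For readers unfamiliar with the CoG (centre of gravity) method mentioned in the response above, the sketch below computes a power-weighted mean frequency within the 8-12 Hz band. It is a minimal illustration assuming a precomputed power spectrum; the function name and the example spectrum are hypothetical and are not taken from the authors' code.

```python
def peak_alpha_cog(freqs, power, band=(8.0, 12.0)):
    """Centre-of-gravity alpha frequency: power-weighted mean frequency in band."""
    pairs = [(f, p) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    total = sum(p for _, p in pairs)
    if total <= 0:
        raise ValueError("no power in the requested band")
    return sum(f * p for f, p in pairs) / total


# Hypothetical spectrum at 1 Hz resolution with a bump centred on 10 Hz.
freqs = [6, 7, 8, 9, 10, 11, 12, 13]
power = [5.0, 4.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.5]
paf = peak_alpha_cog(freqs, power)
print(paf)
```

      Because the estimate is a weighted average over the whole band rather than the location of a single maximum, it returns a value whenever there is power in the band, whether or not a distinct peak is visible.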

      (5) I find reports on frequent channel rejections reflect badly on the data quality. Bad channels can be avoided with proper EEG preparation. EEG should be continuously monitored during recording in order to obtain best data quality. Have any of the ROI channels been rejected?

      We appreciate your attention to the channel rejection. We believe that the average channels removed (0.94, 0.98, 0.74, and 0.87 [range: 0-4] for each of the four resting states out of 64 channels) does not suggest overly frequent rejection, as it was less than one electrode on average and the numbers are below the accepted number of bad channels to remove/interpolate (i.e. 10%) in EEG pipelines (Debnath et al., 2020; Kayhan et al., 2022). To maintain data quality, consistently poor channels were identified and replaced over time. We hope you will accept our transparency on this issue and note that by stating how channel removal decisions were made (i.e. 8 or more deviations) and reporting the number of channels removed, we adhere to the COBIDAS guidelines (Pernet et al., 2018; 2020).

      During analysis, cases of sensorimotor ROI channels being rejected were noted and are now specified in our manuscript. “Out of 248 resting states recorded, 14 resting states had 4 ROI channels instead of 5. Importantly, no resting state had fewer than 4 channels for the sensorimotor ROI.”

      Note, we also realised that we had not specified that we did interpolate channels for the cluster based permutation analysis. This has been corrected with the following sentence:

      “Removed channels were not interpolated for the pre-registered global and sensorimotor ROI averaged analyses, but were interpolated for an exploratory cluster based permutation analysis using the nearest neighbour average method in `Fieldtrip`.”

      (6) I have some issues buying the authors' claims that there is an effect of nicotine on prolonged pain. By looking at the mean results for the nicotine and placebo condition, this can not be right. What was the point in including the variables in the equation? In my view, in this within-subject design the effect of nicotine should be universal, no matter what gender, age, or depression. The unconditional effect of nicotine is close to zero. I can not get my head around how any of the variables can turn the effects into significance. There must be higher or lower variable scores that might be related to a higher or lower effect on nicotine. The question is not to consider these variables as a nuisance but to show how they modulate the pain-related effect of nicotine treatment. Still, the overall nicotine effect of the entire group is basically zero.

      Another point is that for within-subject analyses even tiny effects can become statistically significant if they are systematically in one direction. This might be the case here. There might be a significant effect of nicotine on pain but the actual effect size (5.73 vs. 5.78) is actually not interpretable. I think it would be interesting for the reader how (in terms of pain rating difference) each of the variables can change the effect of nicotine.

      Thank you for your comments. We recognize the concern about interpreting the effect of nicotine on prolonged pain solely based on mean results, and in fact wish to discourage this approach. It's crucial to note that both PAF and pain are highly individual measures (i.e. high inter-individual variance), necessitating the use of random intercepts for participants in our analyses to acknowledge the inherent variability at baseline across participants. Including random intercepts rather than only considering the means helps address the heterogeneity in baseline levels among participants. We also recognise that displaying the mean PHP ratings for all participants in Table 2 could be misleading, firstly because these means do not have weight in an analysis that takes into account a random-effects intercept for participants, and secondly because two participants (one from each group) did not have post-gum PHP assessments and were not included in the mediation analysis due to list-wise deletion of missing data. Therefore, to reduce the potential for misinterpretation, we have added extra detail to display both the full sample and CPA mediation analysis (i.e. N=62) and the data used for PHP mediation analysis (i.e. n=60) in Table 2. We hope that the extra details added to this table will help the reader's interpretation of results.

      In light of this, we have also altered the PAF Table 3 to reflect both the pre-post values used for the CPA mediation and baseline correlations with CPA and PHP pain (i.e. N=62), and the pre-post values used for the PHP mediation (i.e. n=60).

      It is inherently difficult to visualise the findings of a mediation analysis with confounding variables that also used latent change scores (LCS) and random-effect intercepts for participants. LCS was specifically used because of issues of regression to the mean that occur if you calculate a straightforward ‘difference-score’; therefore, calculating the difference in order to demonstrate the results of the statistical model in a figure, for example, does not provide a full description of the data assessed (Valente & MacKinnon, 2017). Nevertheless, if we look at the data descriptively with this in mind, then calculating the change in PHP ratings does indicate that, for the nicotine group, the mean change in PHP ratings was -0.047 (SD = 1.05, range: -4.13, 1.45). Meanwhile, for the placebo group the mean change in PHP ratings was 0.33 (SD = 0.75, range: -1.37, 1.66). This suggests a slight decrease in pain ratings on average for the nicotine group compared to a slight increase on average for the placebo group. With control for pre-determined confounders, we found that the latent change score was 0.63 lower for the nicotine group compared to the control group (i.e. the direct effect of nicotine on change in pain).

      If the reviewer is only discussing the effect of nicotine on pain, we do not believe that this effect ‘should be universal’. There is clear evidence that effects of nicotine on other measures can vary greatly across individuals (Ettinger et al., 2009; Falco & Bevins, 2015; Pomerleau et al., 1995). Our intention would not be to propose a universal effect but to understand how these variables may influence nicotine's impact on pain for individuals. Here we focus on the effects of nicotine on PAF and pain sensitivity, but attempted to control for the potential influence of these other confounding factors. Therefore, our statistical approach goes beyond mean values, incorporating variables like sex at birth, age, and depression to control for and explore potential modulating factors. Control for confounding factors is an important aspect of mediation analysis (Lederer et al., 2019; VanderWeele, 2019).

      Regarding the seemingly small effect size, we understand your concern. Indeed ‘tiny effects can become statistically significant if they are systematically in one direction’, which may be what we see in this analysis. We do not agree that the effect is ‘not interpretable’, rather that it should be interpreted in light of its small effect size (effect size being the beta coefficient in our analysis, rather than the mean group difference). We agree on the importance of considering practical significance alongside statistical significance and hope to conduct additional experiments and analyses in future to elucidate the contribution of each variable to the subtle and therefore not entirely conclusive overall effect you mention.

      Your feedback on this is valuable, and we have ensured a more detailed discussion in the revised manuscript on how these factors should be interpreted alongside some additional post-hoc analyses of confounding factors that were significant in our mediation, with the note that investigation of these interactions is exploratory. We had already discussed the potential contribution of sex on the effect of nicotine on PAF, with exploratory post-hoc analysis on this included in supplementary materials. In addition, we have now added an exploratory post-hoc analysis on the potential contribution of stress on the effect of nicotine on pain. This then shows the stratified effects by the covariates that our model suggests are influencing change in PAF and pain.

      Results edits:

      “There was also a significant effect of perceived stress at baseline on change in PHP ratings when controlling for group allocation and other confounding variables (b = -0.096, p = .048, bootstrapped 95% CI: [-0.19, -0.000047]), where higher perceived stress resulted in larger decreases in PHP ratings (see Supplementary Material 3.3 for post-hoc analysis of stress).”

      Supplementary material addition:

      “3.3 Exploratory analysis of the influence of perceived stress on the effects of nicotine on change in PHP ratings”

      “Due to the significant estimated effects of perceived stress on change in PHP ratings in the 2W-LCS mediation model, we also explored post-hoc effects of stress on change in PHP ratings. We found that there is strong evidence for a negative correlation between stress and change in PHP rating within the nicotine group (n = 28, r = −0.39, BF10 = 13.65; Figure 3) that is not present in the placebo group, with equivocal evidence (n = 32, r = −0.14, BF10 = 0.46). This suggests that those with higher baseline stress who had nicotine gum experienced greater decreases in PHP ratings. Note that there was less, but still sufficient evidence for this relationship within the nicotine group when the participant who was a potential outlier for change in PHP rating was removed (n = 27, r = −0.32, BF10 = 1.45).”

      Author response image 2.

      Spearman correlations of baseline perceived stress with the change in phasic heat pain (PHP) ratings suggest strong evidence for a negative relationship for the nicotine gum group in orange (n=28; BF10=13.65) but not for the placebo group in grey (n=32; BF10=0.46). Regression lines and 95% confidence intervals are shown.
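
      The per-group rank correlation summarised in this caption can be sketched as below. The records are hypothetical values invented for illustration, the helper names are assumptions, and the Bayes factors reported by the authors are not computed here; the sketch only shows how a Spearman coefficient is obtained separately within each gum group.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical records: (group, baseline stress score, change in PHP rating).
records = [
    ("nicotine", 10, 0.5), ("nicotine", 14, 0.1), ("nicotine", 18, -0.4),
    ("nicotine", 22, -0.2), ("nicotine", 25, -1.3),
    ("placebo", 11, 0.2), ("placebo", 15, 0.6), ("placebo", 19, 0.1),
    ("placebo", 21, 0.5), ("placebo", 26, 0.3),
]
by_group = {}
for group, stress, change in records:
    xs, ys = by_group.setdefault(group, ([], []))
    xs.append(stress)
    ys.append(change)

rho = {g: spearman(xs, ys) for g, (xs, ys) in by_group.items()}
print(rho)
```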

      Discussion edits:

      “For example, in addition to the effect of nicotine on prolonged heat pain ratings, our results suggest an effect of stress on changes in heat pain ratings, with those self-reporting higher stress at baseline having greater reductions in pain. Our post-hoc analysis suggested that this relationship between higher stress and larger decrease in PHP ratings was only present for the nicotine group (Supplementary Material 3.3). As stress is linked to nicotine use [69,70] and pain [71–73], these interactions should be explored in future.”

      (7) Is the differential effect of nicotine vs. placebo based on the pre vs. post treatment effect of the placebo condition or on the pre vs. post effect of the nicotine treatment? Can the mediation model be adapted and run for each condition separately? The placebo condition seems to have a stronger effect and may have driven the result.

      Thank you for your comments. In our mediation analysis, the differential effect of nicotine vs. placebo is assessed as a comparison between the pre-post difference within each condition. A latent change score (i.e. pre-post) is calculated for each condition (nicotine and placebo), and then the effect of being in the nicotine group (dummy coded as 1) is compared to being in the placebo group (dummy coded as 0). The comparison between conditions is needed for this model (Valente & MacKinnon, 2017), as we are assessing the change in PAF and pain in the nicotine group compared to the change in the placebo group.
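
      The dummy coding described above can be illustrated with a deliberately simplified stand-in. The sketch uses observed pre-post difference scores and an ordinary least-squares slope, whereas the authors' model uses latent change scores with covariates, so this is not their analysis. All data and names are hypothetical. It shows only that, with group coded 1 (nicotine) and 0 (placebo), the slope for group equals the difference between the two groups' mean change.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical pain ratings before and after chewing gum.
pre = [5.0, 6.0, 4.0, 5.5, 6.5, 4.5]
post = [4.5, 5.8, 4.1, 6.0, 6.9, 4.6]
group = [1, 1, 1, 0, 0, 0]  # 1 = nicotine, 0 = placebo (dummy coding)

change = [after - before for before, after in zip(pre, post)]
slope = ols_slope(group, change)

mean_nicotine = sum(c for c, g in zip(change, group) if g == 1) / group.count(1)
mean_placebo = sum(c for c, g in zip(change, group) if g == 0) / group.count(0)
print(slope, mean_nicotine - mean_placebo)
```

      A negative slope here means the change in the nicotine group is lower than the change in the placebo group, which is why the placebo change acts as the reference for interpreting the group effect.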

      However, to address your comment, it is possible to simplify and assess the relationship between the change in peak alpha frequency (PAF) and change in pain within each gum group (nicotine and placebo) independently, without including the intervention as a factor. To do this, the mediation model can be simplified to regression analysis with latent change scores that focus purely on these relationships. The results of this can help to understand whether change in PAF influences change in pain within each group separately. As with the main analysis, we see no significant influence of change in PAF on change in pain while controlling for the same confounding variables within the nicotine group (Beta = -0.146 +/- 1.105, p = 0.895, 95% CI: -2.243, 2.429) or the placebo group (Beta = 0.730 +/- 2.061, p = 0.723, 95% CI: -4.177, 3.625).

      When suggesting that “the placebo condition seems to have a stronger effect and may have driven the result”, we believe you are referring to the increase in mean PHP ratings within the placebo group from pre (5.51 +/- 2.53) to post-placebo gum (5.84 +/- 2.67). Indeed there was a significant increase in pain ratings pre to post chewing placebo gum (t(31) = -2.53, p = 0.0165, 95% CI: -0.603, -0.0653), that was not seen after chewing nicotine gum (t(27) = 0.237, p = 0.81, 95% CI: -0.358, 0.452). In lieu of a control where no gum was chewed (i.e. simply a second pain assessment ~30 minutes after the first), we assume the gum without nicotine is a good reference that controls for the effect of time plus expectation of chewing nicotine gum. With this in mind, as we describe in our results, the change in PHP ratings is reduced in the nicotine group compared to the placebo group. Note that this phrasing keeps the effect of placebo on pain as our reference from which to view the effect of nicotine on pain. However, you are correct that we need to ensure we emphasise that the change in pain in the nicotine group is reduced in comparison to the change seen after placebo.

      We have not included these extra statistics in our revised manuscript, but hope that they aid your understanding and interpretation of the included analyses. We have highlighted these nuances in the discussion.

      “However, we note that the observed effect of nicotine on pain was small in magnitude, and most prominent in comparison to the effect of placebo, where pain ratings increased after chewing, which brings into question whether this reduction in pain is meaningful in practice.”

      (8) I would not dare to state that nicotine can function as an acute analgesic. Acute analgesics need to work for everyone. The average effect here is close to zero.

      In light of your feedback, we have refined our language to avoid a sweeping assertion of universal analgesic effects and emphasize individual variability. Nicotine's role as a coping strategy for pain is acknowledged in the literature (Robinson et al., 2022), with the meta-analysis by Ditre et al. (2016) discussing its potential as an acute analgesic in humans, along with some evidence from animal research (Zhang et al., 2020). Our revised discussion underscores the need for further exploration into factors influencing nicotine's potential impact on pain. We have also specified the short-term nature of nicotine use in this context to distinguish acute effects from potential opposing effects after long-term use (Zhang et al., 2020).

      “Short-term nicotine use is thought to have acute analgesic properties in experimental settings, with a review reporting that nicotine increased pain thresholds and pain tolerance [49]. In addition, research in a rat model suggests analgesic effects on mechanical thresholds after short-term nicotine use (Zhang et al., 2020). However, previous research has not assessed the acute effects of nicotine on prolonged experimental pain models. The present study found that 4 mg of nicotine reduced heat pain ratings during prolonged heat pain compared to placebo for our human participants, but that prolonged pressure pain decreased irrespective of which gum was chewed. Our findings are thus partly consistent with the idea that nicotine may have acute analgesic properties [49], although further research is required to explore factors that may influence nicotine’s potential impact on a variety of prolonged pain models. We further advance the literature by reporting this effect in a model of prolonged heat pain, which better approximates the experience of clinical pain than short lasting models used to assess thresholds and tolerance [50]. However, we note that the observed effect of nicotine on pain was small in magnitude, and most prominent in comparison to the effect of placebo, where pain ratings increased after chewing, which brings into question whether this reduction in pain is meaningful in practice. Future research should examine whether effects on pain increase in magnitude with different nicotine administration regimens (i.e. dose and frequency).”

      (9) Figures 2E and 2F are not particularly intuitive. Usually, the colour green in "jet" colour coding is being used for "zero" values. I would suggest to cut off the blue and use only the range between green and red.

      We have chosen to retain the current colour scale for several reasons. In our analysis, green represents the middle of the frequency range (approx 10 Hz in this case), and if we were to use green as zero, it would effectively remove both blue and green from the plot, resulting in only red shades. Additionally, we have provided a clear colour scale for reference next to the plot, which allows readers to interpret the data accurately. Our intention is to maintain clarity and precision in representing the data, rather than conforming strictly to conventional practices in colour coding.

      We believe that the current representation effectively conveys the results of our study while allowing readers to interpret the data within the context provided. Thank you again for your suggestion, and we hope you understand our reasoning in this matter.

      (10) Did the authors do their analysis on the parietal ROI or on the pre-registered ROI?

      The analysis was conducted on the pre-registered sensorimotor ROI and on the global values. We have now also conducted the analysis with the regions suggested with the cluster based permutation analysis as requested by reviewer 2, comment 3.

      (11) Point 3.2 in the discussion. I would be very cautious to discuss smoking and chronic pain in the context of the manuscript. The authors can not provide any additional knowledge with their design targeting non-smokers, acute nicotine and experimental pain. The information might be interesting in the introduction in order to provide the reader with some context but is probably misleading in the discussion.

      We appreciate your perspective and agree with your caution regarding the discussion of smoking and chronic pain. While our study specifically targets non-smokers and focuses on acute nicotine effects in experimental pain, we understand the importance of contextual clarity. We have removed these points from the discussion to not mislead the reader.

      Previously we wrote, and have removed: “For those with chronic pain, smoking and nicotine use is reported as a coping strategy for pain [52]; abstinence can increase pain sensitivity [48,50], and pain is thus seen as a barrier to smoking cessation due to fear of worsening pain [51,52]. Therefore, continued understanding of the acute effects of nicotine on models of prolonged pain could improve understanding of the role of nicotine and smoking use in chronic pain [49,51,52].”

      (12) I very much appreciate section 3.3 of the discussion. I would not give up on PAF as a target to modulate pain. A modulation might not be possible in such a short period of experimental intervention. PAF might need longer and different interventions to gradually shift in order to attenuate the intensity of pain. As discussed by the authors themselves, I would also consider other targets for alpha analysis (as mentioned above not other electrodes or ROIs but separated sources.)

      Thank you for your comments on section 3.3. We appreciate your recognition of the potential significance of PAF as a target for pain modulation. Your insights align with our considerations that the experimental intervention duration or type might be a limiting factor in observing substantial shifts in PAF to attenuate pain intensity. We had mentioned the use of the exploratory electrode clusters in future work, but have now also mentioned that the use of ICA to identify separate ICA sources may provide an alternative approach. See responses to your previous ICA comment regarding separate sources.

      REFERENCES for responses to reviewer 2

      Chiang, A. K. I., Rennie, C. J., Robinson, P. A., Van Albada, S. J., & Kerr, C. C. (2011). Age trends and sex differences of alpha rhythms including split alpha peaks. Clinical Neurophysiology, 122(8), 1505-1517.

      Debnath, R., Buzzell, G. A., Morales, S., Bowers, M. E., Leach, S. C., & Fox, N. A. (2020). The Maryland analysis of developmental EEG (MADE) pipeline. Psychophysiology, 57(6), e13580.

      Ettinger, U., Williams, S. C., Patel, D., Michel, T. M., Nwaigwe, A., Caceres, A., ... & Kumari, V. (2009). Effects of acute nicotine on brain function in healthy smokers and non-smokers: estimation of inter-individual response heterogeneity. Neuroimage, 45(2), 549-561.

      Falco, A. M., & Bevins, R. A. (2015). Individual differences in the behavioral effects of nicotine: a review of the preclinical animal literature. Pharmacology Biochemistry and Behavior, 138, 80-90.

      Kayhan, E., Matthes, D., Haresign, I. M., Bánki, A., Michel, C., Langeloh, M., ... & Hoehl, S. (2022). DEEP: A dual EEG pipeline for developmental hyperscanning studies. Developmental cognitive neuroscience, 54, 101104.

      Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., ... & Vincent, J. L. (2019). Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. Annals of the American Thoracic Society, 16(1), 22-28.

      Little, T. D. (2013). Longitudinal structural equation modeling. Guilford Press.

      Pernet, C., Garrido, M., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., ... & Puce, A. (2018). Best practices in data analysis and sharing in neuroimaging using MEEG.

      Pernet, C., Garrido, M. I., Gramfort, A., Maurits, N., Michel, C. M., Pang, E., ... & Puce, A. (2020). Issues and recommendations from the OHBM COBIDAS MEEG committee for reproducible EEG and MEG research. Nature neuroscience, 23(12), 1473-1483.

      Pomerleau, O. F. (1995). Individual differences in sensitivity to nicotine: implications for genetic research on nicotine dependence. Behavior genetics, 25(2), 161-177.

      Robinson, C. L., Kim, R. S., Li, M., Ruan, Q. Z., Surapaneni, S., Jones, M., ... & Southerland, W. (2022). The Impact of Smoking on the Development and Severity of Chronic Pain. Current Pain and Headache Reports, 26(8), 575-581.

      Xia, J., Mazaheri, A., Segaert, K., Salmon, D. P., Harvey, D., Shapiro, K., ... & Olichney, J. M. (2020). Event-related potential and EEG oscillatory predictors of verbal memory in mild cognitive impairment. Brain communications, 2(2), fcaa213.

      VanderWeele, T. J. (2019). Principles of confounder selection. European journal of epidemiology, 34, 211-219.

      Valente, M. J., & MacKinnon, D. P. (2017). Comparing models of change to estimate the mediated effect in the pretest–posttest control group design. Structural Equation Modeling: A Multidisciplinary Journal, 24(3), 428-450.

      Vimolratana, O., Aneksan, B., Siripornpanich, V., Hiengkaew, V., Prathum, T., Jeungprasopsuk, W., ... & Klomjai, W. (2024). Effects of anodal tDCS on resting state eeg power and motor function in acute stroke: a randomized controlled trial. Journal of NeuroEngineering and Rehabilitation, 21(1), 1-15.

      Zhang, Y., Yang, J., Sevilla, A., Weller, R., Wu, J., Su, C., ... & Candiotti, K. A. (2020). The mechanism of chronic nicotine exposure and nicotine withdrawal on pain perception in an animal model. Neuroscience letters, 715, 134627.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      (1) Rationale and link to chronic pain. I am not sure I agree with the statement "The ability to identify those at greater risk of developing chronic pain is limited". I believe there is an abundance of literature associating risk factors with the different instances of chronic pain (e.g., Mills et al., 2019). The fact that the authors cite studies involving potential neuroimaging biomarkers leads me to believe that they perhaps did not intend to make such a broad statement, or that they wanted to focus on individual prediction instead of population risk.

      We thank the reviewer for the thought put into this comment. We did indeed wish to refer to individual prediction, but also realise that the focus on predicting pain might not be the most appropriate opening for this manuscript. Therefore, we have adjusted the below sentence to refer to the need to identify modifiable factors rather than the need to predict pain.

      “Identifying modifiable factors that influence pain sensitivity could be a key step in reducing the presence and burden of chronic pain (van der Miesen et al., 2019; Davis et al., 2020; Tracey et al., 2021).”

      (2) The statement "Individual peak alpha frequency (PAF) is an electro-physiological brain measure that shows promise as a biomarker of pain sensitivity, and thus may prove useful for predicting chronic pain development" is a non sequitur. PAF may very well be a biomarker of pain sensitivity, but the best measures of pain sensitivity we have (self-reported pain intensity ratings) in general are not in themselves predictive of the development of chronic pain. Conversely, features that are not related to pain sensitivity could be useful for predicting chronic pain (e.g., Tanguay-Sabourin et al., 2023).

      We agree that it is essential to acknowledge that self-reported pain intensity ratings alone are not definitive predictors of chronic pain development. To align with this, we have revised the sentence, removing the second clause to avoid overstatement. The adjusted sentence now reads, "Individual peak alpha frequency (PAF) is an electrophysiological brain measure that shows promise as a biomarker of pain sensitivity."

      (3) Finally, some of the statements in the discussion comparing a tonic heat pain model with chronic neuropathic pain might be an overstatement. Whereas it is true that some of the descriptors are similar, the time courses and mechanisms are vastly different.

      We appreciate this comment, and agree that it is difficult to compare the heat pain model used to clinical neuropathic pain. This was an oversight and with further understanding we have removed this comment from the introduction and the discussion:

      “In parallel, we saw no indication of a relationship between PAF and pain ratings during CPA. The introduction of the CPA model, specifically calibrated to a moderate pain threshold, provides further support for the notion that the relationship between PAF and pain is specific to certain pain types [17,28]. Prolonged heat pain was pre-dominantly described as moderate/severe shooting, sharp, and hot pain, whereas prolonged pressure pain was predominantly described as mild/moderate throbbing, cramping, and aching in the present study. It is possible that the PAF–pain relationship is specific to particular pain models and protocols [12,17].”

      Methodology

      (4) or the benefit of good science. However, I am compelled to highlight that I could not access the preregistered files, even though I waited for almost two weeks after requesting permission to do so. This was a problem on two levels: the main one is that I could not check the hypothesized effect sizes of the sample size estimation, which are not only central to my review, and in general negate all the benefits that should go with preregistration (i.e., avoiding p-hacking, publication bias, data dredging, HARKing, etc.). The second one is that I had to provide an email address to request access. This allows the authors to potentially identify the reviewers. Whereas I have no issues with this and I support transparent peer review practices (https://elifesciences.org/inside-elife/e3e90410/increasingtransparency-in-elife-s-review-process), I also note that this might condition other reviewers.

      We apologise for this. We had not realised that the pre-registration was under embargo, but we have now made it available.

      Interpretation of results

      (5) To be perfectly clear, I trust the results of this study more than some of the cited studies regarding nicotine and pain because it was preregistered, the sample size is considerably larger, and it seems carefully controlled. I just do not agree with the interpretation of the results, stated in the first paragraph of the Discussion. Quoting J. Cohen, "The primary product of a research inquiry is one or more measures of effect size, not P values" (Cohen, 1990). As I am sure the authors are aware, even tiny differences between conditions, treatments or groups will eventually be statistically significant given arbitrarily large sample sizes. What really matters then is the magnitude of these differences. In general, the authors hypothesize on why there were no differences on the pressure pain model, and why decreases in heat pain were not mediated by PAF, but do not seem to consider the possibility that the intervention just did not cause the intended effect on the nociceptive system, which would be a much more straightforward explanation for all observations.

      While acknowledging and agreeing with the concern that 'even tiny differences between conditions, treatments, or groups will eventually be statistically significant given arbitrarily large sample sizes,' it's crucial to clarify that our sample size of N=62 does not fall into the category of arbitrarily large. We carefully considered the observed outcomes in the pressure pain model and the lack of PAF mediation in heat pain, as dictated by our statistical approach and the obtained results.

      The suggestion of a straightforward explanation aligning with the intervention not causing the intended effect on the nociceptive system is a valid consideration. We did contemplate the possibility of a false positive, emphasising this in the limitations of our findings and the need for replication to draw stronger conclusions to follow up this initial study.

      (6) In this regard, I do not believe that an average *increase* of 0.05 / 10 (Nicotine post - pre) can be considered a "reduction of pain ratings", regardless of the contrast with placebo (average increase of 0.24 / 10). This tiny effect size is more relevant in the context of the considerable inter-individual variation, in which subjects scored the same heat pain model anywhere from 1 to 10, and the same pressure pain model anywhere from 1 to 8.5. In this regard, the minimum clinically or experimentally important differences (MID) in pain ratings varies from study to study and across painful conditions but is rarely below 1 / 10 in a VAS or NRS scale, see f. ex. (Olsen et al., 2017). It is not my intention to question whether nicotine can function as an acute analgesic in general (as stated in the Discussion), but instead, if it worked as such under these very specific experimental conditions. I also acknowledge that the authors note this issue in two lines in the Discussion, but I believe that this is not weighed properly.

      We appreciate your perspective on the interpretation of the effect size, and we understand the importance of considering it in the context of individual variation.

      As also discussed in response to comment 6 from Reviewer 2, we recognise the concern about interpreting the effect of nicotine on prolonged pain solely based on mean results, and in fact wish to discourage this approach. It's crucial to note that both PAF and pain are highly individual measures (i.e. high inter-individual variance), necessitating the use of random intercepts for participants in our analyses to acknowledge the inherent variability at baseline across participants. Including random intercepts rather than only considering the means helps address the heterogeneity in baseline levels among participants. We also recognise that displaying the mean PHP ratings for all participants in Table 2 could be misleading, firstly because these means do not have weight in an analysis that takes into account a random-effects intercept for participants, and secondly because two participants (one from each group) did not have post-gum PHP assessments and were not included in the mediation analysis due to list-wise deletion of missing data. Therefore, to reduce the potential for misinterpretation, we have added extra detail to display both the full sample and CPA mediation analysis (i.e. N=62) and the data used for PHP mediation analysis (i.e. n=60) in Table 2. We hope that the extra details added to this table will help the readers' interpretation of results.
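
      The point about random intercepts can be illustrated with a minimal sketch. The ratings below are invented for illustration and are not study data, and the calculation is a simplified stand-in for a mixed model rather than the analysis reported in the manuscript. It shows that the spread of raw ratings is dominated by baseline differences between participants, whereas the change within each participant is small and consistent.

```python
from statistics import mean, pstdev

# Hypothetical pre/post pain ratings (0-10 scale) for five participants.
# These numbers are invented for illustration; they are not study data.
pre = [2.0, 5.0, 8.0, 3.5, 9.0]
post = [1.6, 4.7, 7.5, 3.3, 8.4]

# Spread of all raw ratings: dominated by between-participant baselines.
raw_spread = pstdev(pre + post)

# Change within each participant: the participant's own baseline cancels,
# which is the role a per-participant intercept plays in a mixed model.
within_change = [b - a for a, b in zip(pre, post)]
mean_change = mean(within_change)
change_spread = pstdev(within_change)

print(f"SD of raw ratings:               {raw_spread:.2f}")
print(f"Mean within-participant change:  {mean_change:.2f}")
print(f"SD of within-participant change: {change_spread:.2f}")
```

      In this toy example the within-participant changes vary far less than the raw ratings, which is why group means alone can be a poor guide to an effect estimated with participant-level intercepts.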

      Moreover, we have made sure to refer to the comparison with the placebo group when discussing the reduction or decrease in pain seen in the nicotine group, for example:

      “2) nicotine reduced prolonged heat pain intensity but not prolonged pressure pain intensity compared to placebo gum;”

      “The nicotine group had a decrease in heat pain ratings compared to the placebo group and increased PAF speed across the scalp from pre to post-gum, driven by changes at central-parietal and right-frontal regions.”

      We have kept our original comment of whether this effect on pain is meaningful in practice to refer to the minimum clinically or experimentally important differences in pain ratings as highlighted by Olsen et al., 2017.

      “While acknowledging the modest effect size, it’s essential to consider the broader context of our study’s focus. Assessing the clinical relevance of pain reduction is pertinent in applications involving the use of any intervention for pain management [69]. However, from a mechanistic standpoint, particularly in understanding the implications of and relation to PAF, the specific magnitude of the pain effect becomes less pivotal. Nevertheless, future research should examine whether effects on pain increase in magnitude with different nicotine administration regimens (i.e. dose and frequency).”

      (7) In line with the topic of effect sizes, average effect sizes for PAF in the study cited in the manuscript range from around 1 Hz (Boord et al., 2008; Wydenkeller et al., 2009; Lim et al., 2016), to 2 Hz (Foulds et al., 1994), compared with changes of 0.06 Hz (Nicotine post - pre) or -0.01 Hz (Placebo post - pre). MIDs are not so clearly established for peak frequencies in EEG bands, but they should be certainly larger than some fractions of a Hertz (which is considerably below the reliability of the measurement).

      We appreciate your attention to these nuances. We acknowledge the differences in effect sizes between our study and those referenced in the manuscript. Given the current state of the literature, it's noteworthy that ‘MIDs’ for peak frequencies in EEG bands, particularly PAF changes, are not clearly established, other than a recent publication suggesting that even small changes in PAF are reliable and meaningful (Furman et al., 2021). In light of this, we have addressed the uncertainty around the existence and determination of MIDs in our revision, highlighting the need for further research in this area.

      In addition, our study employed a greater frequency resolution (0.2 Hz) compared to some of the referenced studies, with approximately 0.5 Hz resolution (Boord et al., 2008; Wydenkeller et al., 2009; Foulds et al., 1994). This improved resolution allows for a more precise measurement of changes in PAF. Considering this, it is plausible that studies with lower resolution might have conflated increases in PAF, and our higher resolution contributes to a more accurate representation of the observed changes.

      We have also incorporated this insight into the manuscript, emphasising the methodological advancements in our study and their potential impact on the interpretation of PAF changes. Thank you for your thoughtful feedback.

      “The ability to detect changes in PAF can be considerably impacted by the frequency resolution used during Fourier Transformations, an element that is overlooked in recent methodological studies on PAF calculation [16,95]. Changes in PAF within individuals might be obscured or conflated by lower frequency resolutions, which should be considered further in future research.”
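
      The link between frequency resolution and analysis window can be made concrete with a short sketch. It relies on the standard relation that the spacing between Fourier bins is the inverse of the window length. The centre-of-gravity estimator and the 8 to 12 Hz band are assumptions chosen for illustration (a common choice in the PAF literature) and are not confirmed here as the authors' pipeline.

```python
def window_length_for_resolution(resolution_hz):
    """Window length (s) needed for a given FFT bin spacing (Hz)."""
    return 1.0 / resolution_hz


def bins_in_band(resolution_hz, band=(8.0, 12.0)):
    """Number of FFT bins falling inside the band, endpoints included."""
    return round((band[1] - band[0]) / resolution_hz) + 1


def centre_of_gravity_paf(freqs, power, band=(8.0, 12.0)):
    """Power-weighted mean frequency within the band (assumed estimator)."""
    pairs = [(f, p) for f, p in zip(freqs, power) if band[0] <= f <= band[1]]
    total = sum(p for _, p in pairs)
    return sum(f * p for f, p in pairs) / total


for res in (0.2, 0.5):
    print(f"{res} Hz resolution: "
          f"{window_length_for_resolution(res):.0f} s window, "
          f"{bins_in_band(res)} bins in the alpha band")

# Toy symmetric spectrum centred on 10 Hz (illustrative values only).
toy_freqs = [8.0, 9.0, 10.0, 11.0, 12.0]
toy_power = [1.0, 2.0, 4.0, 2.0, 1.0]
print("Centre-of-gravity PAF:", centre_of_gravity_paf(toy_freqs, toy_power))
```

      A finer bin spacing places more samples inside the alpha band, so a weighted estimate of the peak is built from a denser description of the spectrum.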

      (8) The authors also ran alternative statistical models to analyze the data and did not find consistent results in terms of PHP ratings (PAF modulation was still statistically significantly different). The authors attribute this to the necessity of controlling for covariates. Now, considering the effect sizes, aren't these statistically significant differences just artifacts stemming from the inclusion of too many covariates (Simmons et al., 2011)? How much influence should be attributable to depression and anxiety symptoms, stress, sleep quality and past pain, considering that these are healthy volunteers? Should these contrasting differences call the authors to question the robustness of the findings (i.e., whether the same data subjected to different analysis provides the same results), particularly when the results do not align with the preregistered hypothesis (PAF modulation should occur on sensorimotor ROIs)?

      Thank you for your comments on our alternative statistical models. By including these covariates, we aim to provide a more nuanced understanding of the complexities within our data by considering their potential impact on the effects of interest. The decision to include covariates was preregistered (apologies again that this was not available) and made with consideration of balancing model complexity and avoiding potential confounding. Moreover, we hope that the insights gained from these analyses will offer valuable information about the behaviour of our data and aid future research in terms of power calculations, expected variance, and study design.

      (9) Beyond that, I believe in some cases that the authors overreach in an attempt to provide explanations for their results. While I agree that sex might be a relevant covariate, I cannot say whether the authors are confirming a pre-registered hypothesis regarding the gender-specific correlation of PAF and pain, or if this is just a post hoc subgroup analysis. Given the large number of analyses performed (considering the main document and the supplementary files), caution should be exercised on the selective interpretation of those that align with the researchers' hypotheses.

      We chose to explore the influence of sex on the correlation between PAF and pain, because this has also been investigated in previous publications of the relationship (Furman et al., 2020). We state that the assessment by sex is exploratory in our results on p.17: “in an exploratory analysis of separate correlations in males and females (Figure 5, plot C)”. For clarity regarding whether this was a pre-registered exploration or not, we have adjusted this to be: “in an exploratory analysis (not pre-registered) of separate correlations in males and females (Figure 5, plot C), akin to those conducted in previous research on this topic (Furman et al., 2020),”

      We have made sure to state this in the discussion also. Therefore, when we previously said on p.22:

      “Regarding the relationship between PAF and pain at baseline, the negative correlation between PAF and pain seen in previous work [7–11,15] was only observed here for male participants during the PHP model for global PAF.” We have now changed this to: “Regarding the relationship between PAF and pain at baseline, the negative correlation between PAF and pain seen in previous work [7–11,15] was only observed here for male participants during the PHP model for global PAF in an exploratory analysis.”

      Please also note that we altered the colour and shape of points on the correlation plot (Figure 5 in initial submission), the male brown was changed to a dark brown as we realised that the light brown colour was difficult to read. The shape was then changed for male points so that the two groups can be distinguished in grey-scale.

      Overall, your thoughtful feedback is instrumental in refining the interpretation of our findings, and we look forward to presenting a more comprehensive and nuanced discussion. Thank you for your comments.

      REFERENCES for responses to reviewer 3

      Arendt-Nielsen, L., & Yarnitsky, D. (2009). Experimental and clinical applications of quantitative sensory testing applied to skin, muscles and viscera. The Journal of Pain, 10(6), 556-572.

      Chowdhury, N. S., Skippen, P., Si, E., Chiang, A. K., Millard, S. K., Furman, A. J., ... & Seminowicz, D. A. (2023). The reliability of two prospective cortical biomarkers for pain: EEG peak alpha frequency and TMS corticomotor excitability. Journal of Neuroscience Methods, 385, 109766.

      Fishbain, D. A., Lewis, J. E., & Gao, J. (2013). Is There Significant Correlation between Self-Reported Low Back Pain Visual Analogue Scores and Low Back Pain Scores Determined by Pressure Pain Induction Matching?. Pain Practice, 13(5), 358-363.

      Furman, A. J., Prokhorenko, M., Keaser, M. L., Zhang, J., Chen, S., Mazaheri, A., & Seminowicz, D. A. (2021). Prolonged pain reliably slows peak alpha frequency by reducing fast alpha power. bioRxiv, 2021-07.

      Heitmann, H., Ávila, C. G., Nickel, M. M., Dinh, S. T., May, E. S., Tiemann, L., ... & Ploner, M. (2022). Longitudinal resting-state electroencephalography in patients with chronic pain undergoing interdisciplinary multimodal pain therapy. Pain, 163(9), e997.

      McLain, N. J., Yani, M. S., & Kutch, J. J. (2022). Analytic consistency and neural correlates of peak alpha frequency in the study of pain. Journal of Neuroscience Methods, 368, 109460.

      Ngernyam, N., Jensen, M. P., Arayawichanon, P., Auvichayapat, N., Tiamkao, S., Janjarasjitt, S., ... & Auvichayapat, P. (2015). The effects of transcranial direct current stimulation in patients with neuropathic pain from spinal cord injury. Clinical Neurophysiology, 126(2), 382-390.

      Parker, T., Huang, Y., Raghu, A. L., FitzGerald, J., Aziz, T. Z., & Green, A. L. (2021). Supraspinal effects of dorsal root ganglion stimulation in chronic pain patients. Neuromodulation: Technology at the Neural Interface, 24(4), 646-654.

      Petersen-Felix, S., & Arendt-Nielsen, L. (2002). From pain research to pain treatment: the role of human experimental pain models. Best Practice & Research Clinical Anaesthesiology, 16(4), 667-680.

      Sarnthein, J., Stern, J., Aufenberg, C., Rousson, V., & Jeanmonod, D. (2006). Increased EEG power and slowed dominant frequency in patients with neurogenic pain. Brain, 129(1), 55-64.

      Sato, G., Osumi, M., & Morioka, S. (2017). Effects of wheelchair propulsion on neuropathic pain and resting electroencephalography after spinal cord injury. Journal of Rehabilitation Medicine, 49(2), 136-143.

      Sufianov, A. A., Shapkin, A. G., Sufianova, G. Z., Elishev, V. G., Barashin, D. A., Berdichevskii, V. B., & Churkin, S. V. (2014). Functional and metabolic changes in the brain in neuropathic pain syndrome against the background of chronic epidural electrostimulation of the spinal cord. Bulletin of Experimental Biology and Medicine, 157(4), 462-465.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Amaral et al. present a study investigating the mesoscale modelling and dynamics of bolalipids.

      Strengths:

      The figures in this paper are exceptional. Both those to outline and introduce the lipid types, but also the quality and resolution of the plots. The data held within also appears to be outstanding and of significant (hopefully) general interest.

      We thank the reviewer for their kind words and the appreciation of our work.

      Weaknesses:

      In the introduction, I would like to have read more specifics on the biological role of bolalipids. Archaea are mentioned, but this kingdom is huge - there must be specific species that can be discussed where bolalipids are integral to archaeal life. The authors should go beyond ’extremophiles’. In short, they should unpack why the general audience should be interested in these lipids, within a subset of organisms that are often forgotten about.

      Following the reviewer’s advice, we have revised the introduction of the manuscript, in which we now discuss specific species (Sulfolobus acidocaldarius and Thermococcus kodakarensis) and how bolalipids are integral to archaeal life in these species. We explain that the ratio between bilayer lipids and bolalipids, and the number of cyclopentane rings contained within bolalipids, can change to adapt to the environment. The revised parts of the introduction read (p.1):

      “Like for bacteria and eukaryotes, archaea must keep their lipid membranes in a fluid state (homeoviscous adaptation). This is important even under extreme environmental conditions, such as hot and cold temperatures, or high and low pH values [7]. Because of this, many archaea adapt to changes in their environment by tuning the lipid composition of their membranes: altering the ratio between bola- and bilayer lipids in their membranes [8, 9] and/or by changing the number of cyclopentane rings in their lipid tails, which are believed to make lipid molecules more rigid [5]. For example, Thermococcus kodakarensis increases its tetraether bolalipid ratio from around 50% to over 80% when the temperature of the environment increases from 60 to 85 °C [10]. Along the same lines, the cell membrane of Sulfolobus acidocaldarius can contain over 90% of bolalipids with up to 8 cyclopentane rings at 70 °C and pH 2.5 [5, 11]. It is worth mentioning that in exceptional cases bacteria also synthesise bolalipids in response to high temperatures [12], highlighting that the study of bolalipid membranes is relevant not only for archaeal biology but also from a general membrane biophysics perspective.”

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to understand the biophysical properties of archaeal membranes made of bolalipids. Bacterial and eukaryotic membranes are made of lipids that self-assemble into bilayers. Archaea, instead, use bolalipids, lipids that have two headgroups and can span the entire bilayer. The authors wanted to determine if the unique characteristics of archaea, which are often extremophiles, are in part due to the fact that their membranes contain bolalipids.

      The authors develop a minimal computational model to compare the biophysics of bilayers made of lipids, bolalipids, and mixtures of the two. Their model enables them to determine essential parameters such as bilayer phase diagrams, mechanical moduli, and the bilayer behaviour upon cargo inclusion and remodelling.

      The authors demonstrate that bolalipid bilayers behave as binary mixtures, containing bolalipids organized either in a straight conformation, spanning the entire bilayer, or in a U-shaped one, confined to a single leaflet. This dynamic mixture allows bolalipid bilayers to be very sturdy but also provides remodelling. However, remodelling is energetically more expensive than with standard lipids. The authors speculate that this might be why lipids were more abundant in the evolutionary process.

      Strengths:

      This is a wonderful paper, a very fine piece of scholarship. It is interesting from the point of view of biology, biophysics, and material science. The authors mastered the modelling and analysis of these complex systems. The evidence for their findings is really strong and complete. The paper is written superbly, the language is precise and the reading experience is very pleasant. The plots are very well-thought-out.

      Weaknesses:

      I would not talk about weaknesses, because this is really a nice paper. If I really had to find one, I would have liked to see some clear predictions of the model expressed in such a way that experimentalists could design validation experiments.

      We thank the reviewer for their very kind assessment. We incorporated their recommendations regarding experimental validation in the discussion section, as follows (p.14):

      “Our model makes a number of predictions that could be tested by experiment either in cells or in vitro. First, it predicts that a small increase in the fraction of archaeal bilayer lipids should be sufficient to soften a bolalipid-rich membrane. While this could be tested in the future, so far only very few studies have yet reported experimental analysis of archaeal membrane mixtures [18, 50]. Second, we observed that membranes with moderate bolalipid molecular rigidity k<sub>bola</sub> exhibit curvature-dependent bending rigidity. To experimentally verify this, one could extrude membrane tethers from cells while controlling for membrane tension. Finally, to get to the core mechanism underlying our findings, it will be important to develop experimental methods that will allow the fraction of U-shaped bolalipid conformers per leaflet to be imaged and measured.”

      Reviewer #3 (Public review):

      Summary:

      The authors have studied the mechanics of bolalipid and archaeal mixed-lipid membranes via comprehensive molecular dynamics simulations. The Cooke-Deserno 3-bead-per-lipid model is extended to bolalipids with 6 beads. Phase diagrams, bending rigidity, mechanical stability of curved membranes, and cargo uptake are studied. Effects such as the formation of U-shaped bolalipids, pore formation in highly curved regions, and changes in membrane rigidity are studied and discussed. The main aim has been to show how the mixture of bolalipids and regular bilayer lipids in archaeal membrane models enhances the fluidity and stability of these membranes.

      Strengths:

      The authors have presented a wide range of simulation results for different membrane conditions and conformations. For the most part, the analyses and their results are presented clearly and concisely. Figures, supplementary information, and movies very well present what has been studied. The manuscript is well-written and is easy to follow.

      We thank the reviewer for the detailed assessment of our work and their constructive feedback.

      Major issues

      R3.Q1: The Cooke-Deserno model, while very powerful for biophysical analysis of membranes at the mesoscale, is very much void of chemical information. It is parametrized such that it is good in producing fluid membranes and predicting values for bending rigidity, compressibility, and even thermal expansion coefficient falling in the accepted range of values for bilayer membranes. But it still represents a generic membrane. Now, the authors have suggested a similar model for the archaeal bolalipids, which have chemically different lipids (the presence of cyclopentane rings for one), and there is no good justification for using the same pairwise interactions between their representative beads in the coarse-grained model. This does not necessarily diminish the worth of all the authors’ analyses. What is at risk here is the confusion between “what we observe this model of bolalipid or mixed membranes do” and “how real bolalipid-containing archaeal membranes behave at these mechanical and thermal conditions”.

      As the reviewer correctly notes, Cooke and Deserno used a minimal model, devoid of chemical detail, to represent fluid lipid membranes composed of bilayer lipids. Indeed archaeal lipids are chemically different compared to non-archaeal lipids, but just like non-archaeal lipids, they can be very different from one another. Given the chemical diversity of bolalipids between each other, instead of representing their complexity in a complicated model with many experimentally unconstrained parameters, we here defined a minimal model for bolalipids. The power of this minimal model is to represent the key physical/geometrical characteristics of archaeal membranes, namely the fact that lipid heads on two sides of the membrane are often connected, that bolalipids can exhibit a conformational change, and that bolalipids mix with some percentage of bilayer molecules. We then ask a general question: how do these unique geometrical characteristics of archaeal membranes influence their mechanics and reshaping? The reviewer is however right in pointing out that a model, regardless of its level of details (atomistic, coarse-grained, minimal), is still a model.

      Our approach of extending an established coarse-grained model for bilayer lipids to bolalipids is further supported by experimental observations, which report that archaeal bilayer lipids can form membranes of comparable bending rigidity to those of non-archaeal bilayer membranes [53]. Hence, different lipid linkages (archaeal vs. non-archaeal) give rise to fluid, deformable membranes of not too dissimilar rigidities, suggesting that both archaeal and non-archaeal bilayer lipids can be represented by a similar minimal coarse-grained model for the purpose of mesoscopic biophysical investigations. Since archaeal bolalipids have the same core chemical structure as two archaeal bilayer lipids joined by their tail ends, similarly we model a bolalipid by joining two bilayer lipids. Such an approach also efficiently enables us to compare bolalipid with bilayer membranes, and connect to the large body of knowledge on the physics of bilayer membranes.

      To conclude, our coarse-grained model is indeed intended to capture the main physical properties of bolalipid membranes, and not their chemical diversity.

      R3.Q2: Another more specific, major issue has to do with using the Hamm-Kozlov model for fitting the power spectrum of thermal undulations. The 1/q<sup>2</sup> term can very well be attributed to membrane tension. While a barostat is indeed used, have the authors made absolutely sure that the deviation from 1/q<sup>4</sup> behaviour does not correspond to lateral tension?

      To the casual observer, any 1/q<sup>2</sup> trend might point at membrane tension. However, the precise functional form is relevant as it determines whether the 1/q<sup>2</sup> dominates the 1/q<sup>4</sup> trend for small or large values of the wave number q in the fitted power spectrum.

      The first model (including lipid tilt) exhibits the functional form 1/(kq<sup>4</sup>) + 1/(k<sub>θ</sub>q<sup>2</sup>). In contrast, the second model (including membrane tension) exhibits the functional form 1/(kq<sup>4</sup> + Σq<sup>2</sup>). Here k and k<sub>θ</sub> are the bending and tilt moduli, which are assumed positive, and Σ is the membrane tension, which can be either positive or negative. Importantly, the two functional forms differ. For the first model (with tilt), the amplitude is proportional to q<sup>-4</sup> for small q and to q<sup>-2</sup> for large q. In contrast, for the second model (with positive tension), the amplitude is proportional to q<sup>-2</sup> for small q and to q<sup>-4</sup> for large q. If membrane tension were negative in the second model, the slope would cross from negative infinity for small q to -4 for large q. The functional dependencies are summarized in Author response image 1A.
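
      The opposite limiting behaviour of the two functional forms can be checked numerically. The sketch below is illustrative only: the moduli and tension values are arbitrary assumptions, the thermal prefactor is omitted, and the function names are hypothetical. It evaluates the local log-log slope of each spectrum at a small and a large wave number.

```python
import math


def tilt_spectrum(q, k, k_theta):
    """Tilt model: 1/(k q^4) + 1/(k_theta q^2), thermal prefactor omitted."""
    return 1.0 / (k * q**4) + 1.0 / (k_theta * q**2)


def tension_spectrum(q, k, sigma):
    """Tension model: 1/(k q^4 + sigma q^2), thermal prefactor omitted."""
    return 1.0 / (k * q**4 + sigma * q**2)


def loglog_slope(spectrum, q, eps=1e-4):
    """Central-difference estimate of d ln(spectrum) / d ln(q) at q."""
    lo, hi = q * (1.0 - eps), q * (1.0 + eps)
    return (math.log(spectrum(hi)) - math.log(spectrum(lo))) / (
        math.log(hi) - math.log(lo)
    )


# Arbitrary illustrative parameters (not fitted values from the study).
k, k_theta, sigma = 10.0, 10.0, 10.0

slopes = {}
for label, q in (("small q", 1e-2), ("large q", 1e2)):
    slopes[("tilt", label)] = loglog_slope(
        lambda x: tilt_spectrum(x, k, k_theta), q
    )
    slopes[("tension", label)] = loglog_slope(
        lambda x: tension_spectrum(x, k, sigma), q
    )

for (model, label), slope in slopes.items():
    print(f"{model:8s} {label}: slope = {slope:.3f}")
```

      With these values the tilt model moves from a slope near -4 to a slope near -2 as q grows, while the positive-tension model does the reverse, which is the distinction used to discriminate between the two fits.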

      For rigid bolalipid membranes, it is clearly visible that the slope of the power spectrum plotted against the wave number q decreases with increasing q (Author response image 1B). While the slope initially assumes a value close to 4, it gradually approaches 2 for larger values of q. We conclude that only the model including lipid tilt can fit the power spectrum of membrane fluctuations appropriately (solid-dashed line), whereas the model with tension fails to fit the data (dashed line). We note that the combined model containing both lipid tilt and membrane tension does not give a better fit (dotted line).

      To demonstrate that the tension model cannot fit the data, we included the best fits for both models for rigid bolalipid membranes in the new SI section 16 (p. S22) and show that only the tilt model leads to acceptable fits. We also measured the projected membrane tension - , where P<sub>x</sub>, P<sub>y</sub> are respectively the pressure in the x and y directions and L<sub>z</sub> is the dimension of the simulation box along the z axis. We found the projected membrane tension to be negligible, similar to the value that we indirectly measured by fitting a combined model with both tension and tilt, further confirming our conjecture.

      Author response image 1.

      (A) Schematic showing the decay of the power spectrum as a function of the wave number q in the tilt model (top), in the tension model with positive membrane tension (middle), and in the tension model with negative membrane tension (bottom). (B) Fitted power spectrum as a function of q for rigid bolalipid membranes (k<sub>bola</sub>=5k<sub>B</sub>T). The fit shows that while the model with tension (dashed line) cannot fit the data, the model with tilt nicely fits the spectrum (solid-dashed line). The combined model including both tension and tilt does not fit the spectrum any better (dotted line).

      R3.Q3: I got more worried when I noticed in the SI that the simulations had been done with combined ”fix langevin” and ”fix nph” LAMMPS commands. This combination does not result in a proper isothermal-isobaric ensemble. The importance of tilt terms for bolalipids is indeed very interesting, but I believe more care is needed to establish that.

      In what follows, we show that there is no reason to worry. First of all, we want to clarify that the physical setup we simulate is that of a membrane contained in a heat bath under negligible tension with correct diffusional dynamics. To achieve this physical setup, we use a Langevin thermostat combined with pressure control via an overdamped barostat, which we implement in LAMMPS by combining “fix langevin” and “fix nph”.

      In more detail: we simulated particles in an implicit solvent, for which we use a Langevin thermostat to get the right diffusional dynamics. To apply the theory of fitting fluctuation spectrums, the simulation box length needs to be (near) constant. However, simulating membranes at a fixed box size results in a non-zero average membrane tension, making it hard to measure bending rigidity. The reason is that membrane tension most strongly affects the largest wavelength modes, which are also the most decisive when determining mechanical membrane properties such as bending rigidity. To minimize the effect of tension, we perform our simulation with an overdamped barostat (𝜏<sub>baro</sub> = 10 𝜏<sub>langevin</sub>), which keeps the membrane near tensionless, as also done before [32]. In the revised manuscript, we have clarified the statement on the physical ensemble used (p.S2):

      “For simulating flat membrane patches of bolalipids, we combined the previously used Langevin thermostat with relaxation time of 1𝜏 with a Nosé–Hoover barostat with relaxation time of 10𝜏. In LAMMPS this amounts to combining the commands ’fix langevin’ with ’fix nph’. We configured the barostat to set lateral pressure P<sub>xy</sub> to zero by re-scaling the simulation box in the x-y plane. We compare this setup to a fixed box length setup, and an NPT ensemble setup, in SI section 17.”

      To connect our results with statistical mechanics ensemble theory we tested alternative setups. Similar setups, including the formal isothermal-isobaric ensemble, where N,P,T are kept constant using Nose-Hoover style equations for thermostating and barostating with modern corrections [34], which the reviewer refers to, result in very similar fluctuation spectrums. Consequently, our measurements of bending and tilt modulus hold true regardless of the integration scheme. However, such a setup does not correctly capture implicit solvent and diffusional dynamics.

      In even more detail: we tested our setup (implemented via ”fix langevin”+”fix nph”) against an isothermal-isobaric ensemble (implemented via ”fix npt”). We measured the mean and standard deviation of the volume and found that they match for a reference LJ gas.

      To be completely sure, and to please the reviewer, we have performed additional verifications in the new SI section 17, which we summarize in the following. We simulated three representative membranes with different integration schemes: ”fix npt”, ”fix langevin”+”fix nph”, and ”fix langevin” (Langevin dynamics with the projected area fixed at the average value obtained from a ”langevin+nph” run). We checked that the ”fix nph” barostat is merely equilibrating the membrane to a tensionless configuration, after which the projected membrane area (A<sub>p</sub> = L<sub>x</sub>L<sub>y</sub>) is practically constant. Consequently, the different schemes resulted in minor changes in the longest wavelength modes that we tracked down to small changes in the negligible tension. The resulting measurements of bending modulus change by less than 10%, and our main text conclusions do not change. Author response image 2 compares the fluctuation spectrums for the different integration schemes.

      Author response image 2.

      Height fluctuation spectrum, for a bilayer membrane at T<sub>eff</sub> =1.1, simulated with Langevin dynamics (pink, ‘langevin‘), our setup (purple, ‘nph+langevin‘), and under an isothermal-isobaric ensemble (blue, ‘npt‘); fits are shown as dotted lines.

      R3.Q4: This issue is reinforced when considering Figure 3B. These results suggest that increasing the fraction of regular lipids increases the tilt modulus, with the maximum value achieved for a normal Cooke-Deserno bilayer void of bolalipids. But this is contradictory. For these bilayers, we don’t need the tilt modulus in the first place.

      We understand the concern why this might be counter-intuitive, and we thank the reviewer for pointing it out. We first want to stress that the tilt modulus can also be measured for bilayer membranes even if it is not needed to fit the fluctuation spectrum. If we measure the tilt modulus for a bilayer membrane, we obtain a value similar to the previously measured one [36]. Importantly, here we also report measurements for the tilt modulus for bolalipid membranes.

      To understand the seemingly contradictory behaviour of the tilt modulus, it is insightful to rewrite the expression for the fluctuation spectrum as done in Eq. (1):

      where l<sub>𝜃</sub> is a characteristic length scale related to tilt, which we call the tilt persistence length. From the last equation it is easy to see that the tilt modulus 𝜅<sub>𝜃</sub> becomes relevant for the fluctuation spectrum if the tilt persistence length l<sub>𝜃</sub> is not negligible. In other words, we have to consider the tilt modulus 𝜅<sub>𝜃</sub> as relevant if it is sufficiently small compared to the bending rigidity 𝜅.
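
      For orientation, the rewriting referred to above can be sketched as follows. This is a hedged reconstruction that assumes the standard tilt-augmented height spectrum; the prefactor (k<sub>B</sub>T per projected area A<sub>p</sub>, which depends on the Fourier normalization) and the identification of the tilt persistence length with the square root of 𝜅/𝜅<sub>𝜃</sub> are assumptions, and the manuscript's Eq. (1) may use a different convention.

```latex
\langle |h_q|^2 \rangle
  = \frac{k_B T}{A_p}\left[\frac{1}{\kappa q^4} + \frac{1}{\kappa_\theta q^2}\right]
  = \frac{k_B T}{A_p\,\kappa q^4}\left(1 + l_\theta^2 q^2\right),
\qquad l_\theta = \sqrt{\kappa/\kappa_\theta}.
```

      In this form the tilt correction is of order (l<sub>𝜃</sub>q)<sup>2</sup>, so it only matters for modes whose wavelength is comparable to or shorter than 2𝜋l<sub>𝜃</sub>, consistent with the statement that a small 𝜅<sub>𝜃</sub> relative to 𝜅 makes tilt relevant.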

      However, this is not only counter-intuitive, but also difficult to communicate graphically. Following the reviewer’s excellent suggestion, and to make the interpretation more accessible, we converted the tilt modulus in the main text and its figures to the more directly interpretable tilt persistence length l<sub>𝜃</sub>, as this is small when tilt is irrelevant (for bilayer lipids and flexible bolalipids) and large otherwise (for rigid bolalipids). This includes changes to the main text on p.6 and p.8, and to the insets in Figs. 2C and 3B. For completeness, we also report the tilt modulus 𝜅<sub>𝜃</sub> in the SI.

      R3.Q5: Also, from the SI, I gathered that the authors have neglected the longest wavelength mode because it is not equilibrated. If this is indeed the case, it is a dangerous thing to do, because with a small membrane patch, this mode can very well change the general trend of the power spectrum. As a lot of other analyses in the manuscript rely on these measurements, I believe more elaboration is in order.

      We thank the reviewer for the careful examination of our supplementary material. For each fluctuation spectrum measurement, we ran multiple replicas. We observed that the largest wavelength modes were not fully equilibrated. In the simulations the first mode of the fluctuation spectrum is probed at different amplitudes and phases. We thus expected the potential systematic error would show up clearly when comparing spectrums of the different replicas. As we saw no correlation in these systematic offsets between replicas, we concluded that the simulations are sufficiently equilibrated and we could safely exclude the first mode of the fluctuation spectrum from our analysis.

      To show without doubt that this procedure does not randomly bias our results, we also ran simulations for three representative membranes until all modes were equilibrated. On the modes previously equilibrated, the resulting spectrums agree with our previous shorter simulations. On the largest wavelength modes that were previously not fully equilibrated, we noticed a small deviation from theory, specifically for flexible membranes (small bending modulus). These small deviations can be explained by including a negligible negative tension. Importantly, however, the resulting bending modulus 𝜅 stays nearly the same. We note that the small negative tension disappears when we halve the timestep (see Author response image 3). This verification is shown in SI section 17.

      R3.Q6: The authors have found that ”there is a strong dependency of the bending rigidity on the membrane mean curvature of stiffer bolalipids.” The effect is negative, with the membrane becoming less stiff at higher mean curvatures. Why is that? I would assume that with more flexible bolalipids, the possibility of reorganization into U-shaped chains should affect the bending rigidity more (as Figure 2E suggests). While for a stiff bolalipid, not much would change if you increase the mean curvature. This should be either a tilt effect, or have to do with asymmetry between the leaflets. But on the other hand, the tilt modulus is shown to decrease with increasing bolalipid rigidity. The authors get back to this issue only on page 10, when they consider U-shaped lipids in the inner and outer leaflets and write, ”this suggested that an additional membrane-curving mechanism must be involved.” But then again, in the Discussion, the authors write, ”It is striking that membranes made from stiffer bolalipids showed a curvature-dependent bending modulus, which is a clear signature that bolalipid membranes exhibit plastic behaviour during membrane reshaping,” adding to the confusion.

      Author response image 3.

      Height fluctuation spectrum, for a bilayer membrane at T<sub>eff</sub> = 1.1, as simulated in the main text (grey, for 60×10<sup>3</sup>τ), for longer duration (1.44×10<sup>6</sup>τ) (pink), and with the longer duration and a halved timestep of 0.005τ (purple); fits are shown as dotted lines (tension and tilt) or dash-dot lines (tilt only).

      We thank the reviewer for asking this important question. Membrane bending rigidity in bolalipid membranes decreases dramatically once a small fraction of U-shapes is allowed to form, but then plateaus once this U-shape fraction reaches 20%. In a curved bolalipid membrane, U-shapes must accumulate in the outer leaflet to accommodate the area difference. Together, the non-linear dependence of the bending rigidity on U-shape fraction and the promotion of U-shapes by curvature explain why, in a membrane made of moderately stiff bolalipids (k<sub>bola</sub> = 1k<sub>B</sub>T), which contains very few U-shapes in the flat state, the bending rigidity of the membrane decreases as curvature increases. By contrast, in a membrane made of flexible bolalipid molecules (k<sub>bola</sub> = 0), where many U-shapes are present in the flat membrane, the bending rigidity does not change with curvature.

      Bending rigidity 𝜅 in flat membranes composed of bolalipids decreases dramatically once a small fraction of U-shapes is allowed to form, but plateaus once more than 20% of U-shaped bolalipids are present. In detail, our data show that with increasing bolalipid molecular rigidity k<sub>bola</sub>, the number of U-shaped bolalipids decreases (Fig. 2B) and the membrane rigidity 𝜅 increases (Fig. 2C). Thus, the correlation suggests that U-shaped bolalipids soften the membrane in a non-linear way, where most of the change in membrane bending rigidity happens for a U-shaped bolalipid fraction < 20% (Figure S11).

      Separately, membrane curvature affects the area difference between curved membrane leaflets and thus drives U-shape accumulation. To be specific, a cylindrical membrane with area A, mean curvature H and thickness h has an outer leaflet with area A(1 + Hh) and an inner leaflet with smaller area A(1 − Hh). This relative area change can be large, in our simulations up to Hh = 25%. For pure bolalipid membranes, straight bolalipids occupy the same space in each leaflet. An area difference can then be achieved only by having a different number of U-shaped bolalipids in each leaflet, which can result in a different U-shape fraction between leaflets and thus ’asymmetry between leaflets’. Figure S10 confirms a U-shape head fraction asymmetry that increases with curvature, for both flexible (k<sub>bola</sub> = 0) and moderately stiff bolalipids (k<sub>bola</sub> = 1k<sub>B</sub>T).
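
      As a small worked example of the leaflet area expressions A(1 + Hh) and A(1 − Hh), the sketch below evaluates both for one set of numbers. The function name and the input values are hypothetical and chosen only so that Hh = 0.25, the largest value quoted above.

```python
def leaflet_areas(area, mean_curvature, thickness):
    """Return (outer, inner) leaflet areas A(1 + Hh) and A(1 - Hh)."""
    hh = mean_curvature * thickness  # dimensionless product H*h
    return area * (1.0 + hh), area * (1.0 - hh)

# Hypothetical numbers in arbitrary length units, giving Hh = 0.25.
outer, inner = leaflet_areas(area=100.0, mean_curvature=0.05, thickness=5.0)
print(outer, inner, outer - inner)
```

      With Hh = 0.25 the outer leaflet is 25% larger and the inner leaflet 25% smaller than the mid-plane area, so the two leaflets differ by half of the mid-plane area, which has to be absorbed by different U-shape populations.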

      Together, these two effects result in membrane softening under curvature for the moderately stiff bolalipids, but constant rigidity for flexible bolalipids (Fig. 2F). In detail: for membranes composed of moderately stiff bolalipid molecules (k<sub>bola</sub> = 1k<sub>B</sub>T), the U-shape bolalipid head fraction only increases in the outer leaflet, going from 10 to 20% (Figure S10). This is in the high sensitivity region where the bending rigidity is expected to change the most (Figure S11). We hypothesize that the molecular rigidity of a U-shaped bolalipid creates compression on the outer leaflet that stabilizes the membrane curvature and thus causes membrane softening. We suspect that for membranes composed of rigid bolalipids (k<sub>bola</sub> > 1k<sub>B</sub>T), the effect is likely not present due to the absence of U-shape formation even under strong bending.

      By contrast, for membranes composed of flexible bolalipids (k<sub>bola</sub> = 0), the U-shaped bolalipid head fraction changes relatively little from its value for flat membranes (from 50% to 60% and 40% for the outer and inner leaflet, respectively; Figure S10). This is in the region where the membrane bending rigidity is expected to respond weakly to U-shape fraction (Figure S11). Additionally, the change is symmetric, so presumably the outer leaflet becomes softer as the inner leaflet becomes stiffer, thus creating opposing effects and only weakly affecting the membrane bending rigidity as a whole. We note that the distinction between the U-shape head fraction that we plot (Figure S10) and U-shape fraction (Figure S11) matters little for this analysis.

      We have added this deduction and its plots to SI section 8, and revised the corresponding statement in the main text accordingly (p.7):

      “Changing membrane curvature alters the area differently in the two membrane leaflets. To adapt to the area difference, we thus expect the fraction of U-shaped bolalipids to change as the membrane curvature changes. Moreover, the results of Fig. 2B and Fig. 2C showed that the U-shaped bolalipid fraction and the membrane bending rigidity are correlated. As a result, we predict that the fraction of straight versus U-shaped bolalipids in a membrane will change in response to membrane bending, in a way that makes the bending rigidity of a bolalipid membrane curvature dependent.”

      R3.Q7: This issue is repeated when the authors study nanoparticle uptake. They write: ”to reconcile these seemingly conflicting observations we reason that the bending rigidity, similar to Figure 2F, is not constant but softens upon increasing membrane curvature, due to dynamic change in the ratio between bolalipids in straight and U-shaped conformation. Hence, bolalipid membranes show stroking plastic behaviour as they soften during reshaping.” But the softening effect that they refer to, as shown in Figure 4B, occurs for very stiff bolalipids, for which not much switching to U-shaped conformation should occur.

      We thank the reviewer for locating a particularly dense sentence. We changed the text to explicitly refer to the range k<sub>bola</sub> ∈ [0,2] k<sub>B</sub>T for which there is significant change in U-shape fraction (p.8):

      “To reconcile these seemingly conflicting observations we reason that the bending rigidity κ, similar to Fig. 2F, is not constant but softens in the range k<sub>bola</sub> ∈ [0,2] k<sub>B</sub>T, upon increasing membrane curvature. This is due to the dynamic change in the ratio between bolalipids in straight and U-shaped conformation.”

      As for Fig. 4B, for k<sub>bola</sub> > 2k<sub>B</sub>T, pores form, which explains the plateau in adsorption energy.

      R3.Q8: Another major issue is with what the authors refer to as the ”effective temperature”. While plotting phase diagrams for kT/eps value is absolutely valid, I’m not a fan of calling this effective temperature. It is a dimensionless quantity that scales linearly with temperature, but is not a temperature. It is usually called a ”reduced temperature”. Then the authors refer to their findings as studying the stability of archaeal membranes at high temperatures. I have to disagree because eps is not the only potential parameter in the simulations (there are at least space exclusion and angle-bending stiffnesses) so one cannot identify changing eps with changing the global simulation temperature. This only works when you have one potential parameter, like an LJ gas.

      We had indeed considered this before and found that it makes little difference in our setup. To show thoroughly that the distinction matters very little, and in response to the reviewer’s question, we computed our phase diagrams by scaling the temperature T explicitly (rather than the lipid tail interactions, T<sub>eff</sub> = k<sub>B</sub>T/ϵ<sub>p</sub>). We added these results to SI section 14 and found no significant difference when comparing scaling tail interactions (Figure S15A) with scaling temperature explicitly (Figure S15B).

      We also computed Fig. 2A-C for scaling interactions (Figure S17A) and scaling temperature explicitly (Figure S17B). We found a slightly increased U-shaped bolalipid fraction for low k<sub>bola</sub> when comparing scaling interactions (Figure S17A) with temperature scaling (Figure S17B). The reason is that the U-shaped fraction depends on temperature: at higher temperature, bolalipids can more easily transition into the U-shape. Most importantly, however, we found no qualitative changes in the liquid region or the mechanical membrane properties when we compared the different scaling variants.

      The reason why both scaling variants match so well can be understood easily. All pair potentials, including volume exclusion interactions between head beads and other membrane beads, were also scaled in the same manner as tail-to-tail interactions, as described in the SI. In contrast, the energy scales for maintaining the lipid bonds, the bilayer lipid angles and the bolalipid angles are relatively large compared to the energy scales involved in tail-to-tail interactions. This separation of energy scales guarantees that there will be little effect when increasing global temperature. Regarding nomenclature, we take the reviewer’s advice and have added ’reduced temperature’ as an alias for T<sub>eff</sub> in the main text.

      In the revised version of the manuscript, we mention these observations in SI section 14 and point towards these results in the main text (p.4):

      “This interaction strength governs the membrane phase behaviour and can be interpreted as the effective temperature or reduced temperature T<sub>eff</sub> = k<sub>B</sub>T /ϵ<sub>p</sub>. As the distinction between scaling interactions (T<sub>eff</sub>) or temperature (T) is not important for our analysis (see Supplemental Information (SI) section 14), for simplicity we refer to T<sub>eff</sub> as temperature in the following.”
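
      The two scaling variants compared above can be contrasted in a short sketch. It assumes only the definition T<sub>eff</sub> = k<sub>B</sub>T/ϵ<sub>p</sub>; the function names, the target value 1.2, and the bonded stiffness `k_bond` are hypothetical illustrations, not values from the manuscript. Both variants reach the same reduced temperature, but only explicit temperature scaling changes the ratio of bonded energy scales to k<sub>B</sub>T, which is the difference the reviewer pointed to.

```python
def reduced_temperature(kT, eps_pair):
    """T_eff = k_B T / eps_p."""
    return kT / eps_pair

def scale_interactions(t_eff, kT=1.0):
    """Variant A: keep k_B T fixed and choose the pair strength to reach t_eff."""
    return kT, kT / t_eff

def scale_temperature(t_eff, eps_pair=1.0):
    """Variant B: keep the pair strength fixed and choose k_B T to reach t_eff."""
    return t_eff * eps_pair, eps_pair

k_bond = 30.0  # hypothetical bonded stiffness in the same energy units
for name, variant in (("interactions", scale_interactions),
                      ("temperature", scale_temperature)):
    kT, eps = variant(1.2)
    # Same T_eff in both cases; k_bond / kT differs between the variants.
    print(name, reduced_temperature(kT, eps), k_bond / kT)
```

      When the bonded energy scales are much larger than the pair interaction scale, as stated above, this change in k_bond / kT is expected to have little effect on the results.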

      Minor issues

      R3.Q9: As the authors have noted, the fact that the membrane curvature can change the ratio of U-shaped to straight bolalipids would render the curvature elasticity non-linear (though the term ”plastic” should not be used, as this is still structurally reversible when the stress is removed. Technically, it is hypoelastic behaviour, possibly with hysteresis.) With this in mind, when the authors use essentially linear elastic models for fluctuation analysis, they should make a comparison of maximum curvatures occurring in simulations with a range that causes significant changes in bolalipid conformational ratios.

      We thank the reviewer for their suggestion on calling the non-linear behaviour of the curvature elasticity hypoelastic. We have edited the main text accordingly (p.8):

      “In an elastic material, the strain modulus holds constant and deformation is reversible. For bolalipid membranes at k<sub>bola</sub> = 1k<sub>B</sub>T, however, the bending modulus decreases when deformation increases, rendering bolalipid membranes hypoelastic.”

      Moreover, regarding the maximum curvatures occurring in the fluctuation simulations: we first note that the ensemble average of the mean curvature H from the fluctuation measurements is indicated as a vertical line in Fig. 2F. As the average value is nearly zero, the membrane can be considered flat to a good approximation. To investigate the question in more detail, we extended the SI with a careful analysis of the maximum membrane curvature and the validity of the Monge gauge approximation (SI section 15).

      In short, we found that the involved membrane curvatures are small and therefore are unlikely to trigger any significant changes of the bending modulus. Moreover, since we are dealing with two bolalipid conformations, we also tested the homogeneity of the membrane. In our simulations of flat membrane patches we did not observe clustering or phase separation between the two bolalipid conformations beyond the [2,3]σ range. Furthermore, we get good agreement between our fluctuation measurement and the cylinder simulations in Fig. 2F. We now mention this verification in the revised version of the manuscript (p.8):

      “Fortunately, this dependency on curvature does not invalidate our fluctuation results, where the curvature is small enough that its effect on the bending modulus is negligible (SI section 15).”

      Last but not least, simulating bending/unbending cycles of an arc-shaped membrane (frozen endpoints) shows agreement with cylinder membrane simulations, and no hysteresis at the rates of deformation employed (cf. M. Amaral’s thesis [54], soon to be out of the embargo period).

      R3.Q10: The Introduction section of the manuscript is written with a biochemical approach, with very minor attention to the simulation works on this system. Some molecular dynamics works are only cited as existing previous work, without mentioning what has already been studied in archaeal membranes. While some information, like the binding of ESCRT proteins to archaeal membranes, though interesting, helps little to place the study within the discipline. The Introduction should be revised to show what has already been studied with simulations (as the authors mention in the Discussion) and how the presented research complements it.

      The present research covers archaeal membranes, for the first time, with a single coarse-grained model capable of assuming both bolalipid in-membrane conformations, and sweeps through temperature, membrane composition, and molecular rigidity. The work shows the first curvature-dependent bending modulus for pure bolalipid membranes. It also systematically investigates the bending modulus and Gaussian modulus, and tests the model in an all-encompassing budding simulation that incorporates topology changes. Existing atomistic or coarse-grained MD simulations (MARTINI or similar force fields) are limited to small patches of membrane, with no study of large-scale deformations or topology changes; in addition, they rely on force fields that were parametrized for bilayer membranes.

      To give a comprehensive overview of the field, we revised the introduction section of the manuscript, in which we now discuss previous computational work investigating membrane diffusivity, U-shaped lipid fraction, and bending rigidity (p.3):

      “By contrast, only a few studies have investigated bolalipid membranes applying computational or theoretical tools [24, 25]. Specifically, the pore closure time in bolalipid membranes, and the role of cyclopentane rings for membrane properties has been investigated using all-atom simulations, showing decreased lateral mobility, reduced permeability to water, and increased lipid packing [26–28]. Moreover, using coarse-grained simulations, it was suggested that bolalipid membranes are thicker [29], exhibit a gel-to-liquid phase transition at higher temperature [30], and exhibit a reduced diffusivity [31]. However, little research has been devoted to investigating mechanics and reshaping of bolalipid membranes at the mesoscale despite the obvious importance of this question from evolutionary, biophysics, and biotechnological perspectives and although different membrane physics is expected to manifest.”

      Following the reviewer’s advice and to keep the introduction concise and focused on bolalipid membranes, we have removed the paragraph on ESCRT-III proteins in the revised manuscript.

      R3.Q11: The authors have been a bit loose with using the term ”stability”. I’d like to see the distinction in each case, as in ”chemical/thermal/mechanical/conformational stability”.

      We have clarified the type of stability throughout the manuscript where applicable. In all other instances, if not clear from context, we simply mean that the membrane persists as a membrane. At our coarse-grained level, this means the membrane does not disassemble into a gas phase.

      R3.Q12: In the original Cooke-Deserno model, a so-called ”poorman’s angle-bending term” is used, which is essentially a bond-stretching term between the first and third particle. However, I notice the authors using the full harmonic angle-bending potential. This should be mentioned.

      This is made clear in the SI (Eq. (S3)). Cooke and Deserno mention the harmonic angle potential as a valid alternative in their original publication. We now also added this detail to the main text (p.3):

      “The angle formed by the chain of three beads is kept near 180° via an angular potential with strength k<sub>0</sub>, instead of the approximation by a bond between end beads of the original model [32].”

      R3.Q13: The analysis of energy of U-shaped lipids with the linear model E = c<sub>0</sub> + c<sub>1</sub>k<sub>bola</sub> is indeed very interesting. I am curious, can this also be corroborated with mean energy measurements? The minor issue is calling the source of the favorability of U-shaped lipids ”entropic”, while clearly an energetic contribution is found. The two conformations, for example, might differ in the interactions with the neighbouring lipids.

      We were also curious and thank the reviewer for suggesting mean energy measurements. From these measurements, we concluded that there must be either an entropic contribution to the free energy or an intermolecular interaction energy favouring U-shaped bolalipids. We have now included these measurements in SI section 6 (p.S5):

      “By splitting the average potential energy between an internal contribution (bonds, angles and pair interactions between particles in the same molecule) and an external contribution (pair interactions between a molecule and its neighbours), we determined the transition energy from straight to U-shaped bolalipids in detail. We found that this transition lowers the internal potential energy of the bolalipid while increasing its interaction energy. In total, we obtained an energy barrier for the transition of ΔE<sub>s→u</sub> = 0.79±0.01k<sub>B</sub>T. Since the fit indicates, however, that the U-shaped bolalipid conformation is preferred over the straight conformation, we conclude that there must be either an entropic contribution to the free energy or an intermolecular interaction energy favouring U-shaped bolalipids.”

      We refer to these measurements in the main text (p.6):

      “For the fit it appears that c<sub>0</sub> < 0, which implies that bolalipids in U-shape conformation are slightly favoured over straight bolalipids at k<sub>bola</sub> = 0 (explored in SI section 6).”
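
      To illustrate why the sign of c<sub>0</sub> matters, the sketch below combines the linear model E = c<sub>0</sub> + c<sub>1</sub>k<sub>bola</sub> with a simple two-state Boltzmann population. The two-state form is an assumption made here for illustration and may not be how the fit in the manuscript was set up; the coefficients `C0` and `C1` are hypothetical and all energies are in units of k<sub>B</sub>T.

```python
import math

def u_shape_energy(k_bola, c0, c1):
    """Linear model E = c0 + c1 * k_bola (energies in units of k_B T)."""
    return c0 + c1 * k_bola

def u_shape_fraction(k_bola, c0, c1):
    """Assumed two-state Boltzmann population of the U-shaped conformation."""
    return 1.0 / (1.0 + math.exp(u_shape_energy(k_bola, c0, c1)))

C0, C1 = -0.2, 1.5  # hypothetical coefficients, not taken from the manuscript
for k in (0.0, 1.0, 2.0):
    print(k, round(u_shape_fraction(k, C0, C1), 3))
```

      In such a model, any c<sub>0</sub> < 0 gives a U-shape fraction above one half at k<sub>bola</sub> = 0, and the fraction falls as k<sub>bola</sub> increases whenever c<sub>1</sub> > 0.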

      R3.Q14: The authors write in the Discussion, ”In any case, our results indicate that membrane remodelling, such as membrane fission during membrane traffic, is much more difficult in bolalipid membranes [34].” Firstly, I’m not sure if studying the dependence of budding behaviour on adhesion energy with nanoparticles is enough to make claims about membrane fission. Secondly, why is the 2015 paper by Markus Deserno cited here?

      We thank the reviewer for giving us the opportunity to clarify. We make an energetic argument on membrane fission based on the observed difference in the ratio of the Gaussian modulus κ̄ to the bending modulus 𝜅, −κ̄/𝜅.

      Splitting a spherical membrane vesicle into two spherical vesicles (fission) increases the bending energy by 8𝜋𝜅 and decreases the energy related to the Gaussian bending modulus κ̄ (which is negative) by −4𝜋κ̄. The second part of the argument is given for example in the review by Markus Deserno (p.23, right column), which is why we cite the paper here. Together, this gives an energy barrier, required for membrane fission in the considered geometry, of ∆E<sub>fission</sub> = 8𝜋𝜅 + 4𝜋κ̄. We found that −κ̄/𝜅 is around 0.5 for bolalipid membranes and around 1 for bilayer membranes. Since 𝜅 was typically larger in bolalipid membranes, we thus expect the energy barrier for fission ∆E<sub>fission</sub> to be larger for bolalipid membranes. We therefore predict that membrane remodelling, such as membrane fission during membrane trafficking, is harder in bolalipid membranes. We explain our reasoning in the discussion of the revised manuscript (p.13):

      “Membrane remodelling, such as the fission of one spherical vesicle into two, increases the bending energy by 8πκ but decreases the energy related to the Gaussian modulus by −4πκ̄ [39], giving rise to a fission energy barrier of ∆E<sub>fission</sub> = 8πκ + 4πκ̄. Our results indicated that while in bolalipid membranes 𝜅 is larger, −κ̄/κ is smaller compared to bilayer membranes. Our results thus predict a larger energy barrier for membrane fission ∆E<sub>fission</sub> in bolalipid membranes compared to bilayer membranes.”
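
      The energetic argument can be made concrete with a few lines of arithmetic. The sketch assumes the standard Helfrich result that each closed spherical vesicle carries 8𝜋𝜅 of bending energy plus 4𝜋 times the Gaussian modulus, so that one sphere splitting into two costs 8𝜋𝜅 plus 4𝜋 times the (negative) Gaussian modulus. It also assumes that the ratios of about 0.5 and about 1 quoted above refer to the magnitude of the Gaussian modulus relative to 𝜅. The function name and the choice 𝜅 = 1 are hypothetical.

```python
import math

def fission_barrier(kappa, kappa_bar):
    """Energy change for one sphere -> two spheres: 8*pi*kappa + 4*pi*kappa_bar."""
    return 8.0 * math.pi * kappa + 4.0 * math.pi * kappa_bar

kappa = 1.0  # same bending modulus for both cases, to isolate the effect of the ratio
for label, ratio in (("bolalipid", 0.5), ("bilayer", 1.0)):
    kappa_bar = -ratio * kappa  # Gaussian modulus taken as negative
    print(label, fission_barrier(kappa, kappa_bar) / (math.pi * kappa))
```

      At equal 𝜅 the barrier is about 6𝜋𝜅 for a ratio of 0.5 and about 4𝜋𝜅 for a ratio of 1; a larger 𝜅 in bolalipid membranes would widen this gap further.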

      R3.Q15: In the SI, where the measurement of the diffusion coefficient is discussed, the expression for D is missing the power 2 of displacement.

      We thank the reviewer for spotting this oversight. We corrected it in the revised version of the SI (p.S5).
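
      For readers unfamiliar with the expression in question, a minimal sketch of a mean-squared-displacement estimate is given below. It assumes the usual Einstein relation for two-dimensional lateral diffusion, D = ⟨|r(t) − r(0)|<sup>2</sup>⟩ / (4t); the exact expression used in the SI is not reproduced here, and the function name and sample coordinates are hypothetical.

```python
def lateral_diffusion_coefficient(start, end, elapsed):
    """Estimate D = <|r(t) - r(0)|^2> / (4 t) for 2D lateral motion."""
    squared = [
        (x1 - x0) ** 2 + (y1 - y0) ** 2  # note the power 2 on the displacement
        for (x0, y0), (x1, y1) in zip(start, end)
    ]
    msd = sum(squared) / len(squared)
    return msd / (4.0 * elapsed)

# Two hypothetical particles observed at time 0 and after 2 time units.
start = [(0.0, 0.0), (1.0, 1.0)]
end = [(3.0, 4.0), (1.0, 3.0)]
print(lateral_diffusion_coefficient(start, end, elapsed=2.0))
```

      In practice the average would run over many lipids and time origins, and D would be read from the slope of the mean-squared displacement in the linear regime rather than from a single time point.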

      R3.Q16: Where cargo uptake is discussed, the term ”adsorption energy” is used. I think the more appropriate term would be ”adhesion energy”.

      For the sake of simplicity, we changed the term to adhesion energy (caption of Fig. 4, and p.10). We do not have a strong opinion on this, but we believe that adsorption energy would be equally correct as we describe the adsorption of many lipid head beads to a nanoparticle.

      R3.Q17: Typos:

      Page 1, paragraph 2: Adaption → Adaptation. Page 10, paragraph 1: Stroking → Striking.

      We thank the reviewer for spotting these typos which we have corrected in the revised version of the manuscript.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      A few thoughts (likely out of the scope of this paper but possibly to consider upon revision):

      R1.Q1: Do bolalipids always have the same headgroup? I don’t recall reading this in the introduction/discussion. R1 and R2 are in Figure 1, but I don’t know whether there are standard types. Could this be expanded upon? Is the model able to take these differences into account?

      We thank the reviewer for raising this important question. As in bacteria and eukaryotes, lipids in archaea can carry a large variety of head groups, which gives rise to a correspondingly large variety of lipids. Most archaeal lipids have head groups that contain either phosphate groups or sugar residues. Typically, archaeal bolalipids are asymmetric and contain a phosphatidyl and a sugar moiety at the two ends of the lipid molecule. Within the membrane the lipid is oriented such that the phosphatidyl moiety points towards the interior of the cell whereas the sugar moiety points towards the outside of the cell as it occupies more space [5].

      In our computational model, however, we consider symmetric bolalipids for the sake of simplicity and to decouple the role of ”connected geometry” from other effects. In principle, we could investigate the effect of lipid asymmetry by increasing the size of one of the lipid head beads. However, this investigation exceeds the scope of the present study and therefore requires future work.

      In the revised version of the manuscript, we now clarify that bolalipids can have different headgroups (p.1 and the caption of Fig. 1):

      “The hydrophilic heads can be composed of different functional groups with phosphatidyl and sugar being the most relevant moieties. For bolalipids the two head groups at either end of the molecule are typically distinct (Fig. 1A right) [5].”

      “The hydrophilic head of a bolalipid can be composed of different functional groups represented by R1 and R2 (right).”

      We also explicitly state that we neglect lipid head group asymmetry for the sake of simplicity (p.4):

      “To decouple the effect of the connected geometry of the bolalipids from that of lipid asymmetry, we assume both head beads of a bolalipid to share the same properties.”

      R1.Q2: Is it possible to compare the mesoscale models to either Coarse-grained or even all-atom lipid models? Have simulations previously been performed for bolalipids at those levels of description?

      A few studies have previously investigated bolalipid membranes in simulations, using either all-atom or coarse-grained models. However, none of these studies investigated how bolalipids respond to membrane deformations. Therefore, it is currently not possible to directly compare our results to studies in the literature. Recapitulating our predictions experimentally is certainly something that could and should be done in the future. In reply to this reviewer and reviewer 3, we discuss the current state of modelling bolalipid membranes in simulations in the revised version of the manuscript (p.3):

      “By contrast, only a few studies have investigated bolalipid membranes applying computational or theoretical tools [24, 25]. Specifically, the pore closure time in bolalipid membranes, and the role of cyclopentane rings for membrane properties has been investigated using all-atom simulations, showing decreased lateral mobility, reduced permeability to water, and increased lipid packing [26–28]. Moreover, using coarse-grained simulations, it was suggested that bolalipid membranes are thicker [29], exhibit a gel-to-liquid phase transition at higher temperature [30], and exhibit a reduced diffusivity [31]. However, little research has been devoted to investigating mechanics and reshaping of bolalipid membranes at the mesoscale despite the obvious importance of this question from evolutionary, biophysics, and biotechnological perspectives and although different membrane physics is expected to manifest.”

      We note, however, that in the discussion section we do compare membrane diffusivity, U-shaped lipid fraction, and bending rigidity with the behaviour and values previously measured in simulations. In general, we find good agreement between our results and previously reported behaviour and values (p.13):

      “While flexible bolalipid membranes are liquid under the same conditions as bilayer membranes, we found that stiff bolalipids form membranes that operate in the liquid regime at higher temperatures. These results agree well with previous molecular dynamics simulations that suggested that bolalipid membranes are more ordered and have a reduced diffusivity compared to bilayer membranes [24, 29]. In our simulations, this is due to the fact that completely flexible bolalipid molecules adopt both straight (transmembrane) as well as the U-shaped (loop) conformation with approximately the same frequency. In contrast, stiff bolalipids typically only take on the straight conformation when assembled in a membrane. These results agree with the previous coarse-grained molecular dynamics simulations using the MARTINI force field which showed that the ratio of straight to U-shaped bolalipids increased upon stiffening the linker between the lipid tails [29].

      [...]

      When we determined the bending rigidity of bolalipid membranes by measuring their response to thermal fluctuations, we found that membranes made from flexible bolalipids are only slightly more rigid than bilayer membranes. This result is consistent with previous atomistic simulations, which showed that the membrane rigidity was similar for membranes composed of bilayer lipids and flexible synthetic bolalipids [45].”

      R1.Q3: How would membrane proteins alter the behaviour of bolalipids? Either those integral to the membrane or those binding peripherally?

      The reviewer asks an important question. However, the question is difficult to answer due to its scope and the gaps in the current literature. Important examples of integral or peripheral membrane proteins that alter the behaviour of bolalipids and archaeal bolalipid membranes are involved in cell homeostasis, cell division, membrane trafficking, and lipid synthesis.

      The cells of many archaeal species are enclosed in a paracrystalline protein layer called the S-layer, which is attached to the lipid membrane [4, 55]. The main function of the S-layer is to keep the cell’s shape and to protect it against osmotic stress. Due to the embedding of the S-layer in the membrane at specific locations, it is to be expected that the membrane properties are influenced by the S-layer. Furthermore, archaea execute cell division by locally reshaping the membrane using FtsZ and ESCRT-III proteins [56]. While Asgard archaeal genomes encode proteins with homology to those regulating aspects of eukaryotic membrane remodelling and trafficking [57], they have yet to be observed undergoing a process like endocytosis [58]. In addition, it has been speculated that the proteins that drive the synthesis of two diether lipids into a tetraether lipid are either membrane associated or integral membrane proteins [59].

      However, to the best of our knowledge, it is not known how membrane proteins specifically alter the behaviour of bolalipids. Future work is needed to answer this question. Following the advice of reviewer 3 and to keep the introduction concise and focused on bolalipid membranes, we do not mention these observations in the revised manuscript.

      R1.Q4: Is there a mechanism in cells to convert or switch bolalipids from a straight to a u-shaped description? Does this happen spontaneously or are there enzymes responsible for this?

      We thank the reviewer for bringing up this important point. Despite the relevance of the question, little is currently known about the mechanisms that make bolalipids transition between a straight and a U-shaped configuration, mainly because there is to date no established experimental method to probe this transition.

      Besides our own results, most of what we know comes from coarse-grained molecular dynamics simulations, which showed that bolalipids can spontaneously transition between the straight and U-shaped configuration [29]. In addition, by using comparative genomic analysis, it has been predicted that many archaeal species contain flippases, i.e., membrane proteins that are able, upon the consumption of energy, to transfer (flip-flop) bilayer lipids between the two membrane leaflets [43]. Moreover, it has been shown that Halobacterium salinarum (an archaeon with a bilayer lipid membrane) [44] contains scramblases, which are membrane proteins that passively transfer bilayer lipids from one membrane leaflet to the other. It is therefore tempting to speculate that similar proteins might exist for bolalipids which could facilitate the straight to U-shaped transition.

      In addition, it has been reported that vesicles composed of bolalipid membranes can undergo fusion with enveloped influenza viruses [17]. In this context, it has been suggested that the influenza fusion protein hemagglutinin may locally induce U-shaped bolalipids to facilitate membrane fusion. However, these hints are far from proof of a mechanism that can drive the straight to U-shaped bolalipid transition, and further work is needed to investigate this question in detail.

      In the revised version of the manuscript, we now discuss what is known about potential mechanisms to facilitate the straight to U-shaped transition in the discussion section (p.13):

      “While previous coarse-grained simulations predicted that bolalipids spontaneously transition between the straight and U-shaped conformations [29], how this happens in archaeal membranes and whether membrane proteins are involved in this conformational transition needs to be clarified in the future. Experimental studies suggest that archaeal membranes contain flippases and scramblases for the transitioning of bilayer lipids between membrane leaflets [43, 44], raising the possibility that similar proteins could also facilitate conformational transitions in bolalipids. In addition, it has been suggested that the viral fusion protein hemagglutinin could cause a transition from straight to U-shaped bolalipid conformation during the fusion of bolalipid vesicles with influenza viruses [17]. However, future investigation is required.”

      R1.Q5: Ideally, coordinates and any parameter files required to run the molecular simulations should be included for reproducibility.

      We fully share the reviewer’s concern about reproducibility. As part of the data availability section of the original submission, we included a link to a code repository (available at: https://doi.org/10.5281/zenodo.13934991 [51]) that allows initializing and simulating flat membrane patches, with user control of the parameters explored in this paper (𝜔, T<sub>eff</sub>, k<sub>bola</sub>, f<sup>bi</sup>).

      Reviewer #2 (Recommendations for the authors):

      This is a great paper and I congratulate the authors for writing such a fine piece of scholarship. The only nitty-gritty feedback that I have is summarized in the following three points:

      R2.Q1: In the introduction the authors talk about archaea adapting their membrane to retain membrane fluidity. However, homeoviscous adaptation is also fundamental in bacteria and eukaryotes.

      The reviewer is correct: like those of archaea, the membranes of bacteria and eukaryotes must balance flexibility and stability. Moreover, the cell membranes in all three domains of life need to maintain membrane fluidity and provide mobility to the embedded lipids and membrane proteins (homeoviscous adaptation). The general idea is that these organisms change the ratio of different lipids to change membrane properties and thereby optimally adapt to their environments [10]. Importantly, however, there are differences in how homeoviscous adaptation is achieved across the different domains of life. In reply to this reviewer and reviewer 3, we now discuss the underlying mechanisms in the revised parts of the introduction (p.1):

      “Like for bacteria and eukaryotes, archaea must keep their lipid membranes in a fluid state (homeoviscous adaptation). This is important even under extreme environmental conditions, such as hot and cold temperatures, or high and low pH values [7]. Because of this, many archaea adapt to changes in their environment by tuning the lipid composition of their membranes: altering the ratio between bola- and bilayer lipids in their membranes [8, 9] and/or by changing the number of cyclopentane rings in their lipid tails, which are believed to make lipid molecules more rigid [5]. For example, Thermococcus kodakarensis increases its tetraether bolalipid ratio from around 50% to over 80% when the temperature of the environment increases from 60 to 85 °C [10]. Along the same lines, the cell membrane of Sulfolobus acidocaldarius can contain over 90% of bolalipids with up to 8 cyclopentane rings at 70 °C and pH 2.5 [5, 11]. It is worth mentioning that in exceptional cases bacteria also synthesise bolalipids in response to high temperatures [12], highlighting that the study of bolalipid membranes is relevant not only for archaeal biology but also from a general membrane biophysics perspective.”

      R2.Q2: Uncertainties in Gaussian rigidity modulus estimates are not properly reported.

      The large uncertainties in the Gaussian rigidity modulus were due to how they were calculated. In short, the Gaussian rigidity modulus is determined in cap folding simulations [41] (SI section 9) from the measured values of the dimensionless parameter 𝜉, which is related to the folding probability, the bending modulus 𝜅, the membrane line tension, and the cap radius R. In our case, the main source of uncertainty in the Gaussian rigidity modulus is the uncertainty in the measured bending rigidity 𝜅. Previously, to obtain 𝜅, we fitted the fluctuation spectra of the different seeds separately and only then averaged the obtained values. In the revised version of the manuscript, we now first pool the fluctuation spectra of the different simulation seeds and then fit all spectra at the same time. This new approach results in smaller uncertainties for both the bending rigidity 𝜅 and the Gaussian rigidity modulus.
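      The difference between the two fitting strategies described above can be illustrated with a minimal sketch. It is not the authors' analysis code: it assumes a tensionless Helfrich spectrum of the form ⟨|h_q|²⟩ = k_BT / (A 𝜅 q⁴) with the slope fixed to −4, reduced units with k_BT = 1, and synthetic spectra with log-normal noise. All function names and parameter values are hypothetical.

```python
import numpy as np

KBT = 1.0  # assumption: reduced units in which k_B T = 1


def helfrich_spectrum(q, kappa, area, kbt=KBT):
    """Tensionless Helfrich spectrum <|h_q|^2> = kBT / (area * kappa * q^4)."""
    return kbt / (area * kappa * q**4)


def fit_kappa(q, spectrum, area, kbt=KBT):
    """Least-squares fit in log space with the slope fixed to -4.

    With a fixed slope, the best-fit intercept is the mean of
    log(S) + 4 log(q), and kappa follows from that intercept.
    """
    intercept = np.mean(np.log(spectrum) + 4.0 * np.log(q))
    return kbt / (area * np.exp(intercept))


def fit_per_seed_then_average(q_list, s_list, area):
    """Earlier strategy: fit each seed separately, then average the fits."""
    kappas = np.array([fit_kappa(q, s, area) for q, s in zip(q_list, s_list)])
    sem = kappas.std(ddof=1) / np.sqrt(len(kappas))
    return kappas.mean(), sem


def fit_pooled(q_list, s_list, area):
    """Revised strategy: pool all spectra first, then perform a single fit."""
    q_all = np.concatenate(q_list)
    s_all = np.concatenate(s_list)
    return fit_kappa(q_all, s_all, area)


def make_synthetic_seeds(kappa_true=10.0, area=400.0, n_seeds=5,
                         n_modes=8, noise=0.2, rng_seed=0):
    """Hypothetical test data: one noisy spectrum per simulation seed."""
    rng = np.random.default_rng(rng_seed)
    side = np.sqrt(area)
    q = 2.0 * np.pi * np.arange(1, n_modes + 1) / side
    q_list, s_list = [], []
    for _ in range(n_seeds):
        scatter = np.exp(noise * rng.standard_normal(n_modes))
        q_list.append(q)
        s_list.append(helfrich_spectrum(q, kappa_true, area) * scatter)
    return q_list, s_list, area


if __name__ == "__main__":
    q_list, s_list, area = make_synthetic_seeds()
    print("per-seed mean and SEM:", fit_per_seed_then_average(q_list, s_list, area))
    print("pooled fit:", fit_pooled(q_list, s_list, area))
```

      In this sketch the pooled route performs one fit on all modes from all seeds, whereas the per-seed route fits each noisy spectrum on its own before averaging. A real analysis would likely restrict the fit to low-q modes and may use a different functional form.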

      As a consistency check, in addition to the simulations that we previously performed at T<sub>eff</sub> = 1.3, we have repeated the cap folding and line tension simulations at T<sub>eff</sub> = 1.2, resulting in similar values for the Gaussian rigidity modulus. In the revised version of the manuscript, we report the newly calculated values and uncertainties for the Gaussian rigidity modulus at T<sub>eff</sub> = 1.2 in the main text (p.8):

      “At T<sub>eff</sub> = 1.2, we obtained a Gaussian rigidity modulus of 4.30 ± 0.22 k<sub>B</sub>T and thus a ratio of Gaussian rigidity to bending modulus of 0.89 ± 0.04 for bilayer membranes, similar to what has been reported previously [41]. For flexible bolalipid membranes, we got a slightly smaller value of the Gaussian rigidity modulus, 5.04 ± 0.37 k<sub>B</sub>T. Due to the larger bending modulus, however, flexible bolalipid membranes show a significantly smaller ratio of 0.64 ± 0.04 (k<sub>bola</sub> = 0). At larger temperature (T<sub>eff</sub> = 1.3), the ratio can be even smaller, 0.45 ± 0.07 (see SI section 9).”

      In addition, we report the values at T<sub>eff</sub> = 1.3 and T<sub>eff</sub> = 1.2 in the SI (p.S15, Table S4).

      We have also adapted the discussion of the Gaussian bending modulus accordingly (p.13):

      “Another marked difference between bilayer and flexible bolalipid membranes is the ratio of the Gaussian rigidity to the bending modulus. Instead of being around 1 as for bilayer membranes [41], it is around 1/2 and therefore only half of that of bilayer lipids.”

      Reviewer #3 (Recommendations for the authors):

      While I think the bulk of the work presented is useful, some of the issues that I raised in my review are indeed major. Without properly addressing them, it is hard to accept the conclusions of the manuscript. I hope the authors can address them by revising their analysis.

      We thank the reviewer for their constructive feedback, which helped us to improve the manuscript. We have addressed all points raised by the reviewer in our detailed point-by-point response to the reviewer (see above). We hope the reviewer will now find it easier to accept our conclusions.

      (1) R. Phillips, J. Kondev, J. Theriot, and H. Garcia, Physical biology of the cell (Garland Science, New York, 2012).

      (2) H. T. McMahon and J. L. Gallop, Membrane curvature and mechanisms of dynamic cell membrane remodelling, Nature 438, 590 (2005).

      (3) S. B. Gould, Membranes and evolution, Curr. Biol. 28, R381 (2018).

      (4) S.-V. Albers and B. H. Meyer, The archaeal cell envelope, Nat. Rev. Microbiol. 9, 414 (2011).

      (5) P. M. Oger and A. Cario, Adaptation of the membrane in Archaea, Biophys. Chem. 183, 42 (2013).

      (6) K. Rastädter, D. J. Wurm, O. Spadiut, and J. Quehenberger, The Cell Membrane of Sulfolobus spp.—Homeoviscous Adaption and Biotechnological Applications, International Journal of Molecular Sciences 21, 3935 (2020).

      (7) P. L.-G. Chong, Archaebacterial bipolar tetraether lipids: Physico-chemical and membrane properties, Chem. Phys. Lipids 163, 253 (2010).

      (8) M. Tourte, P. Schaeffer, V. Grossi, and P. M. Oger, Functionalized Membrane Domains: An Ancestral Feature of Archaea?, Front. Microbiol. 11, 526 (2020).

      (9) Y. H. Kim, G. Leriche, K. Diraviyam, T. Koyanagi, K. Gao, D. Onofrei, J. Patterson, A. Guha, N. Gianneschi, G. P. Holland, M. K. Gilson, M. Mayer, D. Sept, and J. Yang, Entropic effects enable life at extreme temperatures, Sci. Adv. 5, eaaw4783 (2019).

      (10) M. F. Siliakus, J. van der Oost, and S. W. M. Kengen, Adaptations of archaeal and bacterial membranes to variations in temperature, pH and pressure, Extremophiles 21, 651 (2017).

      (11) D. W. Grogan, Phenotypic characterization of the archaebacterial genus sulfolobus: comparison of five wild-type strains, J. Bacteriol. 171, 6710 (1989).

      (12) D. X. Sahonero-Canavesi, M. F. Siliakus, A. Abdala Asbun, M. Koenen, F. von Meijenfeldt, S. Boeren, N. J. Bale, J. C. Engelman, K. Fiege, L. Strack van Schijndel, J. S. Sinninghe Damsté, and L. Villanueva, Disentangling the lipid divide: Identification of key enzymes for the biosynthesis of membrane-spanning and ether lipids in Bacteria, Sci. Adv. 8, eabq8652 (2022).

      (13) M. van Wolferen, A. A. Pulschen, B. Baum, S. Gribaldo, and S.-V. Albers, The cell biology of archaea, Nat. Microbiol. 10.1038/s41564-022-01215-8 (2022).

      (14) U. Bakowsky, U. Rothe, E. Antonopoulos, T. Martini, L. Henkel, and H.-J. Freisleben, Monomolecular organization of the main tetraether lipid from Thermoplasma acidophilum at the water–air interface, Chem. Phys. Lipids 105, 31 (2000).

      (15) C. Jeworrek, F. Evers, M. Erlkamp, S. Grobelny, M. Tolan, P. L.-G. Chong, and R. Winter, Structure and Phase Behavior of Archaeal Lipid Monolayers, Langmuir 27, 13113 (2011).

      (16) D. P. Brownholland, G. S. Longo, A. V. Struts, M. J. Justice, I. Szleifer, H. I. Petrache, M. F. Brown, and D. H. Thompson, Phase Separation in Binary Mixtures of Bipolar and Monopolar Lipid Dispersions Revealed by 2H NMR Spectroscopy, Small Angle X-Ray Scattering, and Molecular Theory, Biophysical Journal 97, 2700 (2009).

      (17) A. Bhattacharya, I. D. Falk, F. R. Moss, T. M. Weiss, K. N. Tran, N. Z. Burns, and S. G. Boxer, Structure–function relationships in pure archaeal bipolar tetraether lipids, Chem. Sci. 15, 14273 (2024).

      (18) V. Vitkova, D. Mitkova, V. Yordanova, P. Pohl, U. Bakowsky, G. Staneva, and O. Batishchev, Elasticity and phase behaviour of biomimetic membrane systems containing tetraether archaeal lipids, Colloids Surf. A Physicochem. Eng. Asp. 601, 124974 (2020).

      (19) E. Chang, Unusual thermal stability of liposomes made from bipolar tetraether lipids, Biochem. Biophys. Res. Commun. 202, 673 (1994).

      (20) O. V. Batishchev, A. S. Alekseeva, D. S. Tretiakova, T. R. Galimzyanov, A. Y. Chernyadyev, N. R. Onishchenko, P. E. Volynsky, and I. A. Boldyrev, Cyclopentane rings in hydrophobic chains of a phospholipid enhance the bilayer stability to electric breakdown, Soft Matter 16, 3216 (2020).

      (21) U. Seifert, Configurations of fluid membranes and vesicles, Adv. Phys. 46, 13 (1997).

      (22) H. Noguchi, Membrane Simulation Models from Nanometer to Micrometer Scale, J. Phys. Soc. Jpn. 78, 041007 (2009).

      (23) F. Frey and T. Idema, More than just a barrier: using physical models to couple membrane shape to cell function, Soft Matter 17, 3533 (2021).

      (24) C. Huguet, S. Fietz, A. Rosell-Melé, X. Daura, and L. Costenaro, Molecular dynamics simulation study of the effect of glycerol dialkyl glycerol tetraether hydroxylation on membrane thermostability, Biochimica et Biophysica Acta (BBA) - Biomembranes 1859, 966 (2017).

      (25) T. R. Galimzyanov, P. I. Kuzmin, P. Pohl, and S. A. Akimov, Elastic deformations of bolalipid membranes, Soft Matter 12, 2357 (2016).

      (26) T. R. Galimzyanov, P. E. Volynsky, and O. V. Batishchev, Continuum elasticity and molecular dynamics of a pore in archaeal bolalipid membranes, Soft Matter 21, 687 (2025).

      (27) A. O. Chugunov, P. E. Volynsky, N. A. Krylov, I. A. Boldyrev, and R. G. Efremov, Liquid but Durable: Molecular Dynamics Simulations Explain the Unique Properties of Archaeal-Like Membranes, Sci. Rep. 4, 7462 (2015).

      (28) L. F. Pineda De Castro, M. Dopson, and R. Friedman, Biological Membranes in Extreme Conditions: Simulations of Anionic Archaeal, PLoS One 11, e0155287 (2016).

      (29) M. Bulacu, X. Périole, and S. J. Marrink, In Silico Design of Robust Bolalipid Membranes, Biomacromolecules 13, 196 (2012).

      (30) C. H. Davis, H. Nie, and N. V. Dokholyan, Insights into thermophilic archaebacterial membrane stability from simplified models of lipid membranes, Phys. Rev. E 75, 051921 (2007).

      (31) S. Dey and J. Saha, Minimal Coarse-Grained Modeling toward Implicit Solvent Simulation of Generic Bolaamphiphiles, J. Phys. Chem. B 124, 2938 (2020).

      (32) I. R. Cooke and M. Deserno, Solvent-free model for self-assembling fluid bilayer membranes: Stabilization of the fluid phase based on broad attractive tail potentials, J. Chem. Phys. 123, 224710 (2005).

      (33) P. L.-G. Chong, U. Ayesa, V. Prakash Daswani, and E. C. Hur, On Physical Properties of Tetraether Lipid Membranes: Effects of Cyclopentane Rings, Archaea 2012, 1 (2012).

      (34) A. P. Thompson, H. M. Aktulga, R. Berger, D. S. Bolintineanu, W. M. Brown, P. S. Crozier, P. J. in ’t Veld, A. Kohlmeyer, S. G. Moore, T. D. Nguyen, R. Shan, M. J. Stevens, J. Tranchida, C. Trott, and S. J. Plimpton, LAMMPS - a flexible simulation tool for particle-based materials modeling at the atomic, meso, and continuum scales, Comput. Phys. Commun. 271, 108171 (2022).

      (35) A. Stukowski, Visualization and analysis of atomistic simulation data with ovito–the open visualization tool, Modelling and Simulation in Materials Science and Engineering 18, 015012 (2009).

      (36) E. R. May, A. Narang, and D. I. Kopelevich, Role of molecular tilt in thermal fluctuations of lipid membranes, Physical Review E 76, 021913 (2007).

      (37) W. Helfrich, Elastic Properties of Lipid Bilayers: Theory and Possible Experiments, Z. Naturforsch. C 28, 693 (1973).

      (38) M. Hamm and M. Kozlov, Elastic energy of tilt and bending of fluid membranes, Eur. Phys. J. E 3, 323 (2000).

      (39) M. Deserno, Fluid lipid membranes: From differential geometry to curvature stresses, Chemistry and Physics of Lipids 185, 11 (2015).

      (40) V. A. Harmandaris and M. Deserno, A novel method for measuring the bending rigidity of model lipid membranes by simulating tethers, The Journal of Chemical Physics 125, 204905 (2006).

      (41) M. Hu, J. J. Briguglio, and M. Deserno, Determining the Gaussian Curvature Modulus of Lipid Membranes in Simulations, Biophys. J. 102, 1403 (2012).

      (42) M. Deserno, Elastic deformation of a fluid membrane upon colloid binding, Phys. Rev. E 69, 031903 (2004), arXiv: cond-mat/0303656.

      (43) K. S. Makarova, M. Y. Galperin, and E. V. Koonin, Comparative genomic analysis of evolutionarily conserved but functionally uncharacterized membrane proteins in archaea: Prediction of novel components of secretion, membrane remodeling and glycosylation systems, Biochimie 118, 302 (2015).

      (44) A. Verchère, W.-L. Ou, B. Ploier, T. Morizumi, M. A. Goren, P. Bütikofer, O. P. Ernst, G. Khelashvili, and A. K. Menon, Light-independent phospholipid scramblase activity of bacteriorhodopsin from Halobacterium salinarum, Sci. Rep. 7, 9522 (2017).

      (45) T. B. H. Schroeder, G. Leriche, T. Koyanagi, M. A. Johnson, K. N. Haengel, O. M. Eggenberger, C. L. Wang, Y. H. Kim, K. Diraviyam, D. Sept, J. Yang, and M. Mayer, Effects of lipid tethering in extremophile-inspired membranes on H(+)/OH(-) flux at room temperature, Biophys. J. 110, 2430 (2016).

      (46) R. Xu, A. Dehghan, A.-C. Shi, and J. Zhou, Elastic property of membranes self-assembled from diblock and triblock copolymers, Chem. Phys. Lipids 221, 83 (2019).

      (47) Z. Dogic and S. Fraden, Ordered phases of filamentous viruses, Curr. Opin. Colloid Interface Sci. 11, 47 (2006).

      (48) E. Barry and Z. Dogic, Entropy driven self-assembly of nonamphiphilic colloidal membranes, Proc. Natl. Acad. Sci. U.S.A. 107, 10348 (2010).

      (49) A. J. Balchunas, R. A. Cabanas, M. J. Zakhary, T. Gibaud, S. Fraden, P. Sharma, M. F. Hagan, and Z. Dogic, Equation of state of colloidal membranes, Soft Matter 15, 6791 (2019).

      (50) M. Saracco, P. Schaeffer, M. Tourte, S.-V. Albers, Y. Louis, J. Peters, B. Demé, S. Fontanay, and P. M. Oger, Bilayer-Forming Lipids Enhance Archaeal Monolayer Membrane Stability, Int. J. Mol. Sci. 26, 3045 (2025).

      (51) M. Amaral, archaeal_membranes: code and examples (2024), available at https://doi.org/10.5281/zenodo.13934991.

      (52) M. F. Ergüder and M. Deserno, Identifying systematic errors in a power spectral analysis of simulated lipid membranes, The Journal of Chemical Physics 154, 214103 (2021).

      (53) J. Genova, N. Ulrih, V. Kralj-Iglič, A. Iglič, and I. Bivas, Bending Elasticity Modulus of Giant Vesicles Composed of Aeropyrum Pernix K1 Archaeal Lipid, Life 5, 1101 (2015).

      (54) M. Amaral, Archaeal Membranes: In Silico Modelling and Design, Ph.D. thesis, Institute of Science and Technology Austria (2024).

      (55) M. Pohlschroder, F. Pfeiffer, S. Schulze, and M. F. A. Halim, Archaeal cell surface biogenesis, FEMS Microbiol. Rev. 42, 694 (2018).

      (56) K. S. Makarova, N. Yutin, S. D. Bell, and E. V. Koonin, Evolution of diverse cell division and vesicle formation systems in Archaea, Nat. Rev. Microbiol. 8, 731 (2010).

      (57) C. W. Stairs and T. J. Ettema, The Archaeal Roots of the Eukaryotic Dynamic Actin Cytoskeleton, Curr. Biol. 30, R521 (2020).

      (58) B. Baum and D. A. Baum, The merger that made us, BMC Biol. 18, 72 (2020).

      (59) Z. Zeng, H. Chen, H. Yang, Y. Chen, W. Yang, X. Feng, H. Pei, and P. V. Welander, Identification of a protein responsible for the synthesis of archaeal membrane-spanning GDGT lipids, Nat. Commun. 13, 1545 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Debeuf et al. introduce a new, fast method for the selection of suitable T cell clones to generate TCR transgenic mice, a method claimed to outperform traditional hybridoma-based approaches. Clone selection is based on the assessment of the expansion and phenotype of cells specific for a known epitope following immune stimulation. The analysis is facilitated by a new software tool for TCR repertoire and function analysis termed DALI. This work also introduces a potentially invaluable TCR transgenic mouse line specific for SARS-CoV-2.

      Strengths:

      The newly introduced method proved successful in the quick generation of a TCR transgenic mouse line. Clone selection is based on more comprehensive phenotypical information than traditional methods, providing the opportunity for a more rational T cell clone selection.

      The study provides a software tool for TCR repertoire analysis and its linkage with function.

      The findings entail general practical implications in the preclinical study of a potentially very broad range of infectious diseases or vaccination.

      A novel SARS-CoV-2 spike-specific TCR transgenic mouse line was generated.

      Weaknesses:

      The authors attempt to compare their novel method with a more conventional approach to developing TCR transgenic mice. In this reviewer's opinion, this comparison appears imperfect in several ways:

      (1) Work presenting the "traditional" method was inadequate to justify the selection of a suitable clone. It is therefore not surprising that it yielded negative results. More evidence would have been necessary to select clone 47 for further development of the TCR transgenic line, especially considering the significant time and investment required to create such a line.

      Based on Supplementary Figure 1A only, we understand the concern of the reviewer. However, the data presented in Supplementary Figure 1A were collected during the first rough screening of clones, where only the production of IL-2 and IFN-γ was measured as a readout for activation. Thereafter, a large selection of responsive clones was further grown and co-cultured with a dose-titration of the antigenic peptide pool. In this second co-culture, flow cytometry readouts such as CD69 expression were also included (as shown in Supplementary Figure 1B). Finally, a narrower selection of responder clones was co-cultured with the different individual peptides to unravel the specificity of the TCR of the clone. In conclusion, the clone was tested at least three times in three distinct set-ups with multiple different readouts.

      However, a good evaluation of a clone in an in vitro setting does not necessarily translate into optimal functioning of the cells in a biological context. For instance, some clones survive better in an in vitro setting than others or already have a more activated profile before stimulation.

      (2) The comparison is somewhat unfair, because the methods start at different points: while the traditional method was attempted using a pool of peptides whose immunogenicity does not appear to have been established, the new method starts by utilising tetramers to select T cells specific for a well-established epitope.

      Given the costs and time involved, only a single clone could be tested for either method, intrinsically making a proper comparison unfeasible. Even for their new method, the authors' ability to demonstrate that the selected clone is ideal is limited unless they made different clones with varying profiles to show that a particular profile was superior to others.

      In my view, there was no absolute need to compare this method with existing ones, as the proposed method holds intrinsic value.

      We acknowledge the importance of the well-established hybridoma technology and in no way intended to compare these methods head-to-head, nor do we want to question the validity of the classical methods. We also showed the failed CORSET8 mouse in order to highlight the parts of the TCR generating process that could be rationalized. We again want to emphasize that we do not want to compare methods in any way and recognise that we started from two different bases in terms of clone selection (peptide pool stimulation versus tetramer staining). While the tetramer staining that was employed in the generation of CORSET8 mice allowed us to enrich the samples for specific responder clones, this enrichment step is not an absolute requirement for the implementation of the presented method or for the successful generation of a TCR Tg mouse model. An alternative approach could be to use the described method to select for activated and expanded clones upon immunisation and test their reactivity in subsequent steps using peptide stimulation before selecting a receptor. In conclusion, we merely wish to present a novel roadmap for others to use for the generation of their TCR Tg mouse to aid in the selection of the most preferable clone for their purposes.

      (3) While having more data to decide on clone selection is certainly beneficial, given the additional cost, it remains unclear whether knowing the expression profiles of different proteins in Figure 2 aids in selecting a candidate. Is a cell expressing more CD69 preferable to a cell expressing less of this marker? Would either have been effective? Are there any transcriptional differences between clonotype 1 and 2 (red colour in Figure 2G) that justify selecting clone 1, or was the decision to select the latter merely based on their different frequency? If all major clones (i.e. by clonotype count) present similar expression profiles, would it have been necessary to know much more about their expression profiles? Would TCR sequencing and an enumeration of clones have sufficed, and been a more cost-effective approach?

      The method we present in the paper serves as a proof-of-concept, to be adapted to the researcher’s own needs. We agree with the reviewer that, for our purposes with the CORSET8 mice, TCRseq in combination with an enumeration of the clones could also have sufficed and would have lowered the cost of sequencing. However, we wish to present a roadmap for others to use for the generation of their TCR Tg mouse. Importantly, the cellular phenotype and activation state can be taken into consideration, which might be essential for some projects.

      Nonetheless, we do see clear interclonal differences in the expression of “activation” genes, where clone 1 is clearly one of the well-activated and interferon-producing clones (as shown in Author response image 1). As such, researchers could expand these types of analyses to probe for specific phenotypes or characteristics.
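      As an illustration of what a plain enumeration of clones involves, the following minimal sketch counts clonotypes from paired TCR sequences. It is not part of the DALI tool or the authors' pipeline: it assumes that a clonotype is defined by the pair of alpha- and beta-chain CDR3 amino-acid sequences, and the field names and toy sequences are hypothetical.

```python
from collections import Counter


def rank_clonotypes(cells):
    """Count cells per clonotype and rank clonotypes by size.

    Each cell is a dict with hypothetical keys "tra_cdr3" and "trb_cdr3".
    Returns a list of (clonotype, cell_count, fraction_of_cells) tuples,
    largest clonotype first.
    """
    counts = Counter((cell["tra_cdr3"], cell["trb_cdr3"]) for cell in cells)
    total = sum(counts.values())
    return [(clonotype, n, n / total) for clonotype, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up CDR3 strings, for illustration only.
    toy_cells = [
        {"barcode": "cell1", "tra_cdr3": "CAAAAF", "trb_cdr3": "CASSAF"},
        {"barcode": "cell2", "tra_cdr3": "CAAAAF", "trb_cdr3": "CASSAF"},
        {"barcode": "cell3", "tra_cdr3": "CAAAAF", "trb_cdr3": "CASSAF"},
        {"barcode": "cell4", "tra_cdr3": "CAGGGF", "trb_cdr3": "CASTTF"},
    ]
    for clonotype, n, frac in rank_clonotypes(toy_cells):
        print(clonotype, n, f"{frac:.2f}")
```

      Such a ranking captures expansion only. Taking phenotype or activation state into account would require joining these counts with per-cell expression data.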

      Author response image 1.
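The enumeration of clones discussed in this response can be illustrated with a minimal sketch. It only shows what ranking clonotypes by cell count might look like; the clonotype labels and the `rank_clonotypes` helper are hypothetical and are not taken from the manuscript or from the DALI tool.

```python
from collections import Counter

# Hypothetical per-cell clonotype labels, one entry per sequenced T cell.
cell_clonotypes = [
    "clonotype1", "clonotype2", "clonotype1",
    "clonotype3", "clonotype1", "clonotype2",
]

def rank_clonotypes(labels):
    """Return (clonotype, cell_count, frequency) tuples, largest clone first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]

ranked = rank_clonotypes(cell_clonotypes)
for name, n, freq in ranked:
    print(f"{name}: {n} cells ({freq:.0%})")
```

A count-based ranking like this captures clonal size only; the phenotype and activation state that the response emphasises would need the paired transcriptomic data.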

      (4) Lastly, it appears that several of the experiments presented were conducted only once. This information should have been explicitly stated in the figure legends.

      To control for interexperimental variation, every experiment represented in the manuscript has been performed at least two times. We have added the additional information regarding the experimental repetitions and groups in the figure legends.

      Reviewer #2 (Public Review):

      Summary:

      The authors seek to use single-cell sequencing approaches to identify TCRs specific for the SARS CoV2 spike protein, select a candidate TCR for cloning, and use it to construct a TCR transgenic mouse. The argument is that this process is less cumbersome than the classical approach, which involves the identification of antigen-reactive T cells in vitro and the construction of T cell hybridomas prior to TCR cloning. TCRs identified by single-cell sequencing that are already paired to transcriptomic data would more rapidly identify TCRs that are likely to contribute to a functional response. The authors successfully identify TCRs that have expanded in response to SARS CoV2 spike protein immunization, bind to MHC tetramers, and express genes associated with functional response. They then select a TCR for cloning and construction of a transgenic mouse in order to test the response of resulting T cells in vivo following immunization with spike protein or coronavirus infection.

      Strengths:

      (1) The study provides proof of principle for the identification and characterization of TCRs based on single-cell sequencing data.

      (2) The authors employ a recently developed software tool (DALI) that assists in linking transcriptomic data to individual clones.

      (3) The authors successfully generate a TCR transgenic animal derived from the most promising T cell clone (CORSET8) using the TCR sequencing approach.

      (4) The authors provide initial evidence that CORSET8 T cells undergo activation and proliferation in vivo in response to immunization or infection.

      (5) Procedures are well-described and readily reproducible.

      Weaknesses:

      (1) The purpose of presenting a failed attempt to generate TCR transgenic mice using a traditional TCR hybridoma method is unclear. The reasons for the failure are uncertain, and the inclusion of this data does not really provide information on the likely success rate of the hybridoma vs single cell approach for TCR identification, as only a single example is provided for either.

      We refer to comments 2 and 3 of reviewer 1 for an answer to this point.

      (2) There is little information provided regarding the functional differentiation of the CORSET8 T cells following challenge in vivo, including expression of molecules associated with effector function, cytokine production, killing activity, and formation of memory. The study would be strengthened by some evidence that CORSET8 T cells are successfully recapitulating the functional features of the endogenous immune response (beyond simply proliferating and expressing CD44). This information is important to evaluate whether the presented sequencing-based identification and selection of TCRs is likely to result in T-cell responses that replicate the criteria for selecting the TCR in the first place.

      We agree with the reviewer that the data in the initial manuscript included only a limited in vivo functional validation of the CORSET8 T cells. Therefore, we extended these in vivo readouts and measured IFN-g production, CD69 and T-bet expression (as measures of activation) and Ki-67 expression (as an alternative readout to CTV for proliferation). In the single cell data, we saw that these markers were more pronounced in the selected clone compared to other clones. We could confirm these findings in vivo, and found a stronger induction of IFN-g, CD69, T-bet and Ki-67 in CORSET8 T cells compared to endogenous CD45.2 cells and even Spike-Tetramer+ CD45.2 endogenous cells. We added these data in Figure 4.

      (3) While I find the argument reasonable that the approach presented here has a lot of likely advantages over traditional approaches for generating TCR transgenic animals, the use of TCR sequencing data to identify TCRs for study in a variety of areas, including cancer immunotherapy and autoimmunity, is in broad use. While much of this work opts for alternative methods of TCR expression in primary T cells (i.e. CRISPR or retroviral approaches), the process of generating a TCR transgenic mouse from a cloned TCR is not in itself novel. It would be helpful if the authors could provide a more extensive discussion explaining the novelty of their approach for TCR identification in comparison to other more modern approaches, rather than only hybridoma generation.

      By integrating the recent technological advances in single cell sequencing into the generation of TCR Tg mice, possibilities arise to rationalize clone selection regarding clonal size, lineage/phenotype and functional characteristics. Often, the selection process based on hybridoma selection yields multiple epitope-specific clones that upregulate CD69 or IL-2, and only minimal functional and phenotypic parameters are checked before prioritizing one clone to proceed with. In our experience, TCR clones selected in this way are sometimes unable to compete with endogenous polyclonal T cell clones in vivo. Taking all these caveats into account, the novelty we present here is that the researcher is fully able to select clones based on several layers of information without the need for extensive or repeated screening. Moreover, the selection of the TCR Tg clone can be done via the interactive and easily interpretable DALI tool. Owing to the browser-based interactive GUI, immunologists with limited coding experience can effectively analyse their complex datasets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Regarding Supplementary Figure 1A was the experiment conducted more than once? Clone 47 seems minimally superior to the other clones. Incorporating a positive control, such as the response of the OT-I hybridoma to SIINFEKL, could have provided a benchmark to gauge the strength of the observed responses.

      Also, what was the concentration of the peptide used to restimulate the T cells in vitro? High peptide concentrations can lead to non-specific responses. Ideally, a titration should have been performed, perhaps in a subsequent experiment that only tested those clones that responded well initially. Given the resources required to create and maintain a transgenic mouse line, proceeding with the chosen clone based on the data presented seems to carry considerable risk.

      The experiment has been performed three times. The data presented in Supplementary Figure 1A were collected during the first rough screening of clones, where only the production of IL-2 and IFN-γ was measured as a readout for activation. Thereafter, a large selection of responsive clones was further grown and co-cultured with a dose-titration of the antigenic peptide pool. In this second co-culture, flow cytometry readouts such as CD69 expression were also included (as shown in Supplementary Figure 1B). Finally, a narrower selection of responder clones was co-cultured with the different individual peptides to unravel the specificity of the TCR of the clone. In conclusion, the clone was tested at least three times in three distinct set-ups with multiple different readouts.

      In Supplementary Figure 1C, no response to stimulation was detected. Ideally, this figure should have included a positive control, such as PMA/Ionomycin or aCD3/CD28 stimulation.

      We agree with the reviewer that this experiment should have included a positive control to validate the non-specific responsiveness of the clone and the technical feasibility of the experiment. Unfortunately, the initial CORSET8 line is frozen and is thus not easily available to repeat the experiment.

      Can the authors clarify their gating strategy in the legend of Supplementary Figure 1D?

      Plotted cells are non-debris > single cells > viable cells > CD45+. We have added the information to the legend of Supplementary Figure 1D.

      In Figure 2, the figure legend should provide more detail on which cells were sorted for the single-cell RNA sequencing analysis. The materials and methods section explains that cells were stained for CD44. Were activated cells then sorted (either tetramer-positive or -negative), plus naïve CD8 T cells from a naïve mouse?

      Supplementary Figure 2 contains the detailed gating strategy during the sort for the single cell experiment. We have added additional red gates to the plots to clarify which samples were sent for sequencing. This has been adapted in the figure legends of both Figure 2 and Supplementary Figure 2. 

      In Figure 3, Rag1 sufficient transgenic mice display similar numbers of CD4 and CD8 T cells as WT mice in the spleen. Typically, transgenic mice present skewed frequencies of T cells towards the type generated (CD8 in this case), which the authors only found in the thymus of CORSET8 mice. Could this be discussed?

      The comment of the reviewer is valid, as there is indeed a skewing towards CD8 T cells in the thymi of the CORSET8 mice. We looked back into the data of the experiments and noticed that poor resolution of some markers might have resulted in improper results. We have repeated this experiment and added another T cell marker (TCRbeta) next to the already included CD3e marker. By including both markers, we were able to show that the skewing towards the CD8 T cell phenotype is also present in the spleen.

      How many repetitions were performed for the experiments in Figures 3D and 3E? How many mice were analyzed for Figure 3E? Please provide this information in the figure legend. Also, include a proper quantification and statistical analysis of the data shown.

      New quantification graphs with statistical analysis have been added to Figure 3E. The accompanying figure legend has been adapted. The co-culture displayed in Figure 3D is a representative experiment of two repetitions.

      Figure 4C includes 3-4 mice per group. This experiment should have been replicated, and this information should be indicated in the figure legend.

      We apologise for omitting this information from the figure legend. The experiment presented in Figure 4A-C has been repeated twice, yielding results following the same trend. We were unable to pool the data as two different proliferation dyes were used in the separate experiments (CFSE and CTV). Furthermore, in the in vivo BSL3 experiments represented in Figure 4E-H, we always included the Spike/CpG group as a positive control. We have added the additional information regarding the experimental repetitions and groups in the figure legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      We thank Reviewer 1 for their helpful comments and hope that the changes made to the revised manuscript have addressed their points.

      This study presents a novel application of the inverted encoding (i.e., decoding) approach to detect the correlates of crossmodal integration in the human EEG (electrophysiological) signal. The method is successfully applied to data from a group of 41 participants, performing a spatial localization task on auditory, visual, and audiovisual events. The analyses clearly show a behavioural superiority for audio-visual localization. Like previous studies, the results when using traditional univariate ERP analyses were inconclusive, showing once more the need for alternative, more sophisticated approaches. Instead, the principal approach of this study, harnessing the multivariate nature of the signal, captured clear signs of super-additive responses, considered by many as the hallmark of multisensory integration. Unfortunately, the manuscript lacks many important details in the descriptions of the methodology and analytical pipeline. Although some of these details can eventually be retrieved from the scripts that accompany this paper, the main text should be self-contained and sufficient to gain a clear understanding of what was done. (A list of some of these is included in the comments to the authors). Nevertheless, I believe the main weakness of this work is that the positive results obtained and reported in the results section are conditioned upon eye movements. When artifacts due to eye movements are removed, then the outcomes are no longer significant. 

      Therefore, whether the authors finally achieved the aims and showed that this method of analysis is truly a reliable way to assess crossmodal integration, does not stand on firm ground. The worst-case scenario is that the results are entirely accounted for by patterns of eye movements in the different conditions. In the best-case scenario, the method might truly work, but further experiments (and/or analyses) would be required to confirm the claims in a conclusive fashion.

      One first step toward this goal would be, perhaps, to facilitate the understanding of results in context by reporting both the uncorrected and corrected analyses in the main results section. Second, one could try to support the argument given in the discussion, pointing out the origin of the super-additive effects in posterior electrode sites, by also modelling frontal electrode clusters and showing they aren't informative as to the effect of interest.

      We performed several additional analyses to address concerns that our main result was caused by different eye movement patterns between conditions. We re-ran our key analyses using activity exclusively from frontal electrodes, which revealed poorer decoding performance than that from posterior electrodes. If eye movements were driving the non-linear enhancement in the audiovisual condition, we would expect stronger decoding using sensors closer to the source, i.e., the extraocular muscles. We also computed the correlations between average eye position and stimulus position for each condition to evaluate whether participants made larger eye movements in the audiovisual condition, which might have contributed to better decoding results. Though we did find evidence for eye movements toward stimuli, the degree of movement did not significantly differ between conditions.

      Furthermore, we note that the analysis using a stricter eye movement criterion, acknowledged in the Discussion section of the original manuscript, resulted in very similar results to the original analysis. There was significantly better decoding in the AV condition (as measured by d') than the MLE prediction, but this difference did not survive cluster correction. The most likely explanation for this is that the strict eye movement criterion combined with our conservative measure of (mass-based) cluster correction led to reduced power to detect true differences between conditions. Taken together with the additional analyses described in the revised manuscript and supplementary materials, the results show that eye movements are unlikely to account for differences between the multisensory and unisensory conditions. Instead, our decoding results likely reflect nonlinear neural integration between audio and visual sensory information.

      “Any experimental design that varies stimulus location needs to consider the potential contribution of eye movements. We computed correlations between participants’ average eye position and each stimulus position between the three sensory conditions (auditory, visual and audiovisual; Figure S1) and found evidence that participants made eye movements toward stimuli. A re-analysis of the data with a very strict eye-movement criterion (i.e., removing trials with eye movements >1.875º) revealed that the super-additive enhancement in decoding accuracy no longer survived cluster correction, suggesting that our results may be impacted by the consistent motor activity of saccades towards presented stimuli. Further investigation, however, suggests this is unlikely. Though the correlations were significantly different from 0, they were not significantly different from each other. If consistent saccades to audiovisual stimuli were responsible for the nonlinear multisensory benefit we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Interestingly, eye movements corresponded more to stimulus location in the auditory and audiovisual conditions than in the visual condition, indicating that it was the presence of a sound, rather than a visual stimulus, that drove small eye movements. This could indicate that participants inadvertently moved their eyes when localising the origin of sounds. We also re-ran our analyses using the activity measured from the frontal electrodes alone (Figure S2). If the source of the nonlinear decoding accuracy in the audiovisual condition was due to muscular activity produced by eye movements, there should be better decoding accuracy from sensors closer to the source. 
Instead, we found that decoding accuracy of stimulus location from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 3).” 

      The univariate ERP analyses use an outdated contrast, AV <> A + V, to capture multisensory integration. A number of authors have pointed out the potential problem of double baseline subtraction when using this contrast, and have recommended a number of solutions, experimental and analytical. See for example: [1] and [2].

      (1) Teder-Salejarvi, W. A., McDonald, J. J., Di Russo, F., & Hillyard, S. A. (2002). Cognitive Brain Research, 14, 106-114. 

      (2) Talsma, D., & Woldorff, M. G. (2005). Journal of cognitive neuroscience, 17(7), 1098-1114.

      We thank the reviewer for raising this point. Comparing ERPs across different sensory conditions requires careful analytic choices to discern genuine sensory interactions within the signal. The AV <> (A + V) contrast has often been used to detect multisensory integration, though any non-signal related activity (i.e. anticipatory waves; Talsma & Woldorff, 2005) or pre-processing manipulation (e.g. baseline subtraction; Teder-Sälejärvi et al., 2002) will be doubled in (A + V) but not in AV. Critically, we did not apply a baseline correction during preprocessing and thus our results are not at risk of double-baseline subtraction in (A + V). Additionally, we temporally jittered the presentation of our stimuli to mitigate the potential influence of consistent overlapping ERP waves (Talsma & Woldorff, 2005).
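The doubling argument in this response can be made concrete with a toy calculation. The sketch below assumes a purely additive model in which every recorded waveform carries one copy of a shared component (for example a baseline offset or an anticipatory wave); all amplitudes are hypothetical and none come from the study.

```python
# Hypothetical toy ERP amplitudes at three time points (arbitrary units).
auditory_signal = [1.0, 2.0, 0.5]
visual_signal = [0.5, 1.0, 1.5]
common = [0.3, 0.3, 0.3]  # shared component present once in every recording

def add(x, y):
    """Element-wise sum of two equal-length waveforms."""
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    """Element-wise difference of two equal-length waveforms."""
    return [a - b for a, b in zip(x, y)]

# Each recorded condition contains the shared component exactly once.
erp_a = add(auditory_signal, common)
erp_v = add(visual_signal, common)
erp_av = add(add(auditory_signal, visual_signal), common)

# The sum A + V contains the shared component twice, so the contrast is
# non-zero even though the sensory signals combine purely additively.
contrast = sub(erp_av, add(erp_a, erp_v))
print(contrast)
```

In this toy case the contrast equals minus the shared component at every time point, which is the spurious sub-additive effect the cited papers warn about.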

      The results section should provide the neurometric curve/s used to extract the slopes of the sensitivity plot (Figure 2B). 

      We thank the reviewer for raising this point of clarification. The sensitivity plots for Figures 2B and 2C were extracted from the behavioural performance of the behavioural and EEG tasks, respectively. The sensitivity plot for Figure 2B was extracted from individual psychometric curves, whereas the d’ values for Figure 2C were calculated from the behavioural data for the EEG task. This information has been clarified in the manuscript.

      “Figure 1. Behavioural performance is improved for audiovisual stimuli. A) Average accuracy of responses across participants in the behavioural session at each stimulus location for each stimulus condition, fitted to a psychometric curve. Steeper curves indicate greater sensitivity in identifying stimulus location. B) Average sensitivity across participants in the behavioural task, estimated from psychometric curves, for each stimulus condition. The red cross indicates estimated performance assuming optimal (MLE) integration of unisensory cues. C) Average behavioural sensitivity across participants in the EEG session for each stimulus condition. Error bars indicate ±1 SEM.”

      The encoding model was fitted for each electrode individually; I wonder if important information contained as combinations of (individually non-significant) electrodes was then lost in this process and if the authors consider that this is relevant. 

      Although the encoding model was fitted for each electrode individually for the topographic maps (Figure 4B), in all other analyses the encoding model was fitted across a selection of electrodes (see final inset of Figure 3). As this electrode set was used for all other neural analyses, our model would allow for the detection of important information contained in the neural patterns across electrodes. This information has been clarified in the manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2). As the model was fitted for multiple electrodes, subtle patterns of neural information contained within combinations of sensors could be detected.”

      Neurobehavioral correlations could benefit from outlier rejection and the use of robust correlation statistics. 

      We thank the reviewer for raising this issue. Note, however, that the correlations we report are resistant to the influence of outliers because we used Spearman’s rho [1] (as opposed to Pearson’s). This information has been communicated in the manuscript.

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069

      “Neurobehavioural correlations. As behavioural and neural data violated assumptions of normality, we calculated rank-order correlations (Spearman’s rho) between the average decoding sensitivity for each participant from 150-250 ms poststimulus onset and behavioural performance on the EEG task. As Spearman’s rho is resistant to outliers (Wilcox, 2016), we did not perform outlier rejection.”

      “Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069”
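To illustrate why a rank-order correlation is less affected by an extreme value than Pearson's coefficient, here is a minimal sketch using only the standard library. The data are hypothetical (the last "participant" is a deliberate outlier), and the helper functions are illustrative rather than the authors' analysis code.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical neural and behavioural sensitivities; the last value is extreme.
neural = [0.10, 0.12, 0.15, 0.18, 0.20, 0.22]
behaviour = [0.50, 0.55, 0.60, 0.70, 0.75, 5.00]

rho = spearman(neural, behaviour)
r = pearson(neural, behaviour)
print(f"Spearman rho = {rho:.2f}, Pearson r = {r:.2f}")
```

Because the toy relationship is monotonic, the ranks are unchanged by the size of the extreme value, whereas Pearson's r is pulled well below 1 by it.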

      Many details that are important for the reader to evaluate the evidence and to understand the methods and analyses aren't given; this is a non-exhaustive list:  

      We thank the reviewer for highlighting these missing details. We have updated the manuscript where necessary to ensure the methods and analyses are fully detailed and replicable.

      - specific parameters of the stimuli and performance levels. Just saying "similarly difficult" or "marginally higher volume" is not enough to understand exactly what was done.  

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”

      - were stimulus parameters adjusted individually or as a group? Which method was followed?

      To clarify, stimulus parameters (frequency, size, luminance, volume, location, etc.) were manipulated throughout pilot testing only. Parameters were adjusted to achieve similar pilot behavioural results between the auditory and visual conditions. For the experiment proper, parameters remained constant for both tasks and were the same for all participants.

      “During pilot testing, stimulus features (size, luminance, volume, frequency etc.) were manipulated to make visual and auditory stimuli similarly difficult to spatially localize. These values were held constant in the main experiment.”

      - specify which response buttons were used.

      “Participants were presented with two consecutive stimuli and tasked with indicating, via button press, whether the first (‘1’ number-pad key) or second (‘2’ number-pad key) interval contained the more leftward stimulus.”

      “At the end of each sequence, participants were tasked with indicating, via button press, whether more presentations appeared on the right (‘right’ arrow key) or the left (‘left’ arrow key) of the display.”

      - no information is given as to how many trials per condition remained on average, for analysis.  

      The average number of remaining trials per condition after eye-movement analysis is now included in the Methods section of the revised manuscript.

      “We removed trials with substantial eye movements (>3.75º away from fixation) from the analyses. After the removal of eye movements, on average 2365 (SD = 56.94), 2346 (SD = 152.87) and 2350 (SD = 132.47) trials remained for auditory, visual and audiovisual conditions, respectively, from the original 2400 per condition.”
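The trial-rejection step quoted above can be sketched as a simple filter. This is a minimal illustration, assuming the 3.75 threshold is in degrees of visual angle and that each trial stores eye-position samples relative to fixation; the trial structure and the `keep_trial` helper are hypothetical.

```python
from collections import Counter

FIXATION_LIMIT_DEG = 3.75  # threshold from the quoted text; units assumed to be degrees

# Hypothetical trials: condition label plus horizontal eye position samples
# (degrees relative to fixation) recorded during the trial.
trials = [
    {"condition": "auditory", "eye_x": [0.1, -0.4, 0.8]},
    {"condition": "visual", "eye_x": [0.2, 4.1, 1.0]},
    {"condition": "audiovisual", "eye_x": [-3.9, 0.0, 0.3]},
    {"condition": "audiovisual", "eye_x": [0.5, 0.6, -0.2]},
]

def keep_trial(trial, limit=FIXATION_LIMIT_DEG):
    """Keep a trial only if gaze never strays beyond the limit in either direction."""
    return max(abs(sample) for sample in trial["eye_x"]) <= limit

clean_trials = [t for t in trials if keep_trial(t)]
remaining = Counter(t["condition"] for t in clean_trials)
print(dict(remaining))
```

Counting the surviving trials per condition, as in the last two lines, gives the kind of per-condition totals reported in the quoted passage.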

      - no information is given on the specifics of participant exclusion criteria. (even if the attrition rate was surprisingly high, for such an easy task).  

      The behavioural session also served as a screening task. Although the task instructions were straightforward, perceptual discrimination was not easy due to the ambiguity of the stimuli. Auditory localization is not very precise, and the visual stimuli were brief, dim, and diffuse. The behavioural results reflect the difficulty of the task. The attrition rate was high because participants who scored below 60% correct in any condition were deemed unable to accurately perform the task, were not invited to complete the subsequent EEG session, and were omitted from the analyses. We have included the specific criteria in the manuscript.

      “Participants were first required to complete a behavioural session with above 60% accuracy in all conditions to qualify for the EEG session (see Behavioural session for details).”

      - EEG pre-processing: what filter was used? How was artifact rejection done? (no parameters are reported); How were bad channels interpolated?  

      We used a 0.25 Hz high-pass filter to remove baseline drifts, but no low-pass filter. In line with recent studies on the undesirable influence of EEG preprocessing on ERPs [1], we opted to avoid channel interpolation and artifact rejection. This was erroneously reported in the manuscript and has now been clarified. For the sake of clarity, here we demonstrate that a reanalysis of data using channel interpolation and artifact rejection returned the same pattern of results.

      (1) Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13, 2372. https://doi.org/10.1038/s41598-023-27528-0

      - specific electrode locations must be given or shown in a plot (just "primarily represented in posterior electrodes" is not sufficiently informative).  

      A diagram of the electrodes used in all analyses is included within Figure 3, and we have drawn readers’ attention to this in the revised manuscript.

      “Thus, for all subsequent analyses we only included signals from the central-temporal, parietal-occipital, occipital and inion sensors for computing the inverse model (see final inset of Figure 2).” 

      - ERP analysis: which channels were used? What is the specific cluster correction method?

      We used a conservative mass-based cluster correction from Pernet et al. (2015) - this information has been clarified in the manuscript.

      “A conservative mass-based cluster correction was applied to account for spurious differences across time (Pernet et al., 2015).” 

      “Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85-93. https://doi.org/10.1016/j.jneumeth.2014.08.003”
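For readers unfamiliar with mass-based cluster correction, the general idea can be sketched as follows. This is a simplified illustration, not the authors' pipeline or the exact procedure of Pernet et al. (2015): it uses sign-flip permutations as a stand-in for the resampling scheme, ignores the sign of the effect when forming clusters, and runs on simulated data with a hypothetical cluster-forming threshold.

```python
import random
from statistics import mean, stdev

def t_values(diffs):
    """One-sample t statistic per time point; diffs[p][t] is participant p's condition difference."""
    n = len(diffs)
    return [mean(col) / (stdev(col) / n ** 0.5) for col in zip(*diffs)]

def cluster_masses(tvals, threshold):
    """Sum |t| over each run of contiguous time points whose |t| exceeds the threshold."""
    masses, current = [], 0.0
    for t in tvals:
        if abs(t) > threshold:
            current += abs(t)
        elif current:
            masses.append(current)
            current = 0.0
    if current:
        masses.append(current)
    return masses

def max_mass_null(diffs, threshold, n_perm=200, seed=0):
    """Null distribution of the largest cluster mass under random sign flips per participant."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        flipped = []
        for row in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * v for v in row])
        null.append(max(cluster_masses(t_values(flipped), threshold), default=0.0))
    return null

# Simulated data: 12 participants, 20 time points, true effect at time points 8-12.
data_rng = random.Random(1)
diffs = [
    [data_rng.gauss(0, 1) + (1.5 if 8 <= t < 13 else 0.0) for t in range(20)]
    for _ in range(12)
]
threshold = 2.2  # hypothetical cluster-forming threshold, roughly t(11) at p = .05 two-tailed

observed = cluster_masses(t_values(diffs), threshold)
null = max_mass_null(diffs, threshold)
# Corrected p value per observed cluster: share of permutations with an equal or larger maximum mass.
p_values = [sum(m >= mass for m in null) / len(null) for mass in observed]
print(list(zip(observed, p_values)))
```

Because each cluster is compared against the distribution of the maximum cluster mass, short or weak clusters are penalised, which is consistent with the response's remark that the correction is conservative.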

      - results: descriptive stats on performance must be given (instead of saying "participants performed well").  

      The mean and standard deviation of participants’ performance for each condition in the behavioural and EEG experiments are now explicitly mentioned in the manuscript.

      “A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly higher sensitivity for the audiovisual stimuli (M = .04, SD = .02) than for the auditory stimuli alone (M = .03, SD = .01; Z = -3.09, p = .002), and than for the visual stimuli alone (M = .02, SD = .01; Z = -5.28, p = 1.288e-7; Figure 1B). Sensitivity for auditory stimuli was also significantly higher than sensitivity for visual stimuli (Z = 2.02, p = .044).” 

      “We found a similar pattern of results to those in the behavioural session; sensitivity for audiovisual stimuli (M = .85, SD = .33) was significantly higher than for auditory (M = .69, SD = .41; Z = -2.27, p = .023) and visual stimuli alone (M = .61, SD = .29; Z = -3.52, p = 4.345e-4), but not significantly different from the MLE prediction (Z = -1.07, p = .285).” 

      - sensitivity in the behavioural and EEG sessions is said to be different, but no comparison is given. It is not even the same stimulus set across the two tasks...  

      This relationship was noted as a potential explanation for the higher sensitivities obtained in the EEG task, and was not intended to stand up to statistical scrutiny. We agree it makes little sense to compare statistically between the EEG and behavioural results as they were obtained from different tasks. We would like to clarify, however, that the stimuli used in the two tasks were the same, with the exception that in the EEG task the stimuli were presented from 5 locations versus 8 in the behavioural task. To avoid potential confusion, we have removed the offending sentence from the manuscript:

      Reviewer 2:

      Their measure of neural responses is derived from the decoder responses, and this takes account of the reliability of the sensory representations - the d' statistics - which is an excellent thing. It also means if I understand their analysis correctly (it could bear clarifying - see below), that they can generate from it a prediction of the performance expected if an optimal decision is made combining the neural signals from the individual modalities. I believe this is the familiar root sum of squares d' calculation (or very similar). Their decoding of the audiovisual responses comfortably exceeds this prediction and forms part of the evidence for their claims. 

      Yet, superadditivity - including that in evidence in the principle of inverse effectiveness - more typically quantifies the excess over the sum of proportions correct in each modality. Their MLE d' statistic can already predict this form of superadditivity. Therefore, the superadditivity they report here is not the same form of superadditivity that is usually referred to in behavioural studies. It is in fact a stiffer definition. What their analysis tests is that decoding performance exceeds what would be expected from an optimally weighted linear integration of the unisensory information. As this is not the common definition, it is difficult to relate to the behavioural superadditivity (of percentage correct) reported in much of the literature. This distinction is not at all clear from the manuscript. 

      But the real puzzle is here: The behavioural data for this task do not exceed the optimal statistical decision predicted by signal detection theory (the MLE d'). Yet, the EEG data would suggest that the neural processing is exceeding it. So why, if the neural processing is there to yield better performance, is it not reflected in the behaviour? I cannot explain this, but it strikes me that the behaviour and neural signals are for some reason not reflecting the same processing. 

      Be explicit and discuss this mismatch they observe between behaviour and neural responses. 

      Thank you, we agree that it is worth expanding on the observed disconnect between MSI in behaviour and neural signals. We have included an additional paragraph in the Discussion of the revised manuscript. Despite the mismatch, we believe the behavioural and neural responses still reflect the same underlying processing, but at different levels of sensitivity. The behavioural result likely reflects a coarse down-sampling of the precision in location representation, and is thus less likely to reflect subtle MSI enhancements.

      “An interesting aspect of our results is the apparent mismatch between the behavioural and neural responses. While the behavioural results meet the optimal statistical threshold predicted by MLE, the decoding analyses suggest that the neural response exceeds it. Though non-linear neural responses and statistically optimal behavioural responses are reliable phenomena in multisensory integration (Alais & Burr, 2004; Ernst & Banks, 2002; Stanford & Stein, 2007), the question remains – if neural super-additivity exists to improve behavioural performance, why is it not reflected in behavioural responses? A possible explanation for this neurobehavioural discrepancy is the large difference in timing between sensory processing and behavioural responses. A motor response would typically occur some time after the neural response to a sensory stimulus (e.g., 70-200 ms), with subsequent neural processes between perception and action that introduce noise (Heekeren et al., 2008) and may obscure super-additive perceptual sensitivity. In the current experiment, participants reported either the distribution of 20 serially presented stimuli (EEG session) or compared the positions of two stimuli (behavioural session), whereas the decoder attempts to recover the location of every presented stimulus. While stimulus location could be represented with higher fidelity in multisensory relative to unisensory conditions, this would not necessarily result in better performance on a binary behavioural task in which multiple temporally separated stimuli are compared. One must also consider the inherent differences in how super-additivity is measured at the neural and behavioural levels. Neural super-additivity should manifest in responses to each individual stimulus. In contrast, behavioural super-additivity is often reported as proportion correct, which can only emerge between conditions after being averaged across multiple trials. The former is a biological phenomenon, while the latter is an analytical construct. In our experiment, we recorded neural responses for every presentation of a stimulus, but behavioural responses were only obtained after multiple stimulus presentations. Thus, the failure to find super-additivity in behavioural responses might be due to their operationalisation, with between-condition comparisons lacking sufficient sensitivity to detect super-additive sensory improvements. Future work should focus on experimental designs that can reveal super-additive responses in behaviour.”

      Re-work the introduction to explain more clearly the relationship between the behavioural superadditivities they review, the MLE model, and the superadditivity it actually tests. 

      We agree it is worth discussing how super-additivity is operationalised across neural and behavioural measures. However, we do not believe the behavioural studies we reviewed claimed super-additive behavioural enhancements. While MLE is often used as a behavioural marker of successful integration, it is not necessarily used as evidence for super-additivity within the behavioural response, as it relies on linear operations. 

      “It is important to consider the differences in how super-additivity is classified between neural and behavioural measures. At the level of single neurons, superadditivity is defined as a non-linear response enhancement, with the multisensory response exceeding the sum of the unisensory responses. In behaviour, meanwhile, it has been observed that the performance improvement from combining two senses is close to what is expected from optimal integration of information across the senses (Alais & Burr, 2004; Stanford & Stein, 2007). Critically, behavioural enhancement of this kind does not require non-linearity in the neural response, but can arise from a reliability-weighted average of sensory information. In short, behavioural performance that conforms to MLE is not necessarily indicative of neural super-additivity, and the MLE model can be considered a linear baseline for multisensory integration.”

      Regarding the auditory stimulus, this reviewer notes that interaural time differences are unlikely to survive free field presentation.

      Despite the free field presentation, in both the pilot test and the study proper participants were able to localize auditory stimuli significantly above chance. 

      "However, other studies have found super-additive enhancements to the amplitude of sensory event-related potentials (ERPs) for audiovisual stimuli (Molholm et al., 2002; Talsma et al., 2007), especially when considering the influence of stimulus intensity (Senkowski et al., 2011)." - this makes it obvious that there are some studies which show superadditivity. It would have been good to provide a little more depth here - as to what distinguished those studies that reported positive effects from those that did not.

      We have provided further detail on how super-additivity appears to manifest in neural measures.

      “In EEG, meanwhile, the evoked response to an audiovisual stimulus typically conforms to a sub-additive principle (Cappe et al., 2010; Fort et al., 2002; Giard & Peronnet, 1999; Murray et al., 2016; Puce et al., 2007; Stekelenburg & Vroomen, 2007; Teder-Sälejärvi et al., 2002; Vroomen & Stekelenburg, 2010). However, when the principle of inverse effectiveness is considered and relatively weak stimuli are presented together, there has been some evidence for super-additive responses (Senkowski et al., 2011).”

      “While behavioural outcomes for multisensory stimuli can be predicted by MLE, and single neuron responses follow the principles of inverse effectiveness and super-additivity, among others (Rideaux et al., 2021), how audiovisual super-additivity manifests within populations of neurons is comparatively unclear given the mixed findings from relevant fMRI and EEG studies. This uncertainty may be due to biophysical limitations of human neuroimaging techniques, but it may also be related to the analytic approaches used to study these recordings. For instance, super-additive responses to audiovisual stimuli in EEG studies are often reported from very small electrode clusters (Molholm et al., 2002; Senkowski et al., 2011; Talsma et al., 2007), suggesting that neural super-additivity in humans may be highly specific. However, information encoded by the brain can be represented as increased activity in some areas, accompanied by decreased activity in others, so simplifying complex neural responses to the average rise and fall of activity in specific sensors may obscure relevant multivariate patterns of activity evoked by a stimulus.”

      P9. "(25-75 W, 6 Ω)." This is not important, but it is a strange way to cite the power handling of a loudspeaker. 

      “The loudspeakers had a power handling capacity of 25-75 W and a nominal impedance of 6 Ω.” 

      I am struggling to understand the auditory stimulus: 

      "Auditory stimuli were 100 ms clicks". Is this a 100-ms long train of clicks? A single pulse which is 100ms long would not sound like a click, but two clicks once filtered by the loudspeaker. Perhaps they mean 100us. 

      "..with a flat 850 Hz tone embedded within a decay envelope". Does this mean the tone is gated - i.e. turns on and off slowly? Or is it constant?

      We thank the reviewer for catching this. ‘Click’ may not be the most apt way of defining the auditory stimulus. It was a 100 ms square wave tone with decay, i.e., with an onset at maximal volume before fading gradually. Given that the length of the stimulus was 100 ms, the decay occurs quickly and provides a more ‘click-like’ percept than a pure tone. We have provided a representation of the sound below for further clarification. This represents the amplitude from the L and R speakers for maximally-left and maximally-right stimuli. We have added this clarification in the revised manuscript. 

      Author response image 1.

      “Auditory stimuli were 100 ms, 850 Hz tones with a decay function (sample rate = 44,100 Hz; volume = 60 dBA SPL, as measured at the ears).”

      P10. "Stimulus modality was either auditory, visual, or audiovisual. Trials were blocked with short (~2 min) breaks between conditions".

      Presumably the blocks were randomised across participants.

      Condition order was not randomised across participants, but counterbalanced. This has been clarified in the manuscript.

      “Stimulus modality was auditory, visual or audiovisual, presented in separate blocks with short breaks (~2 min) between conditions (see Figure 6A for an example trial). The order of conditions was counterbalanced across participants.” 

      P15. Feels like there is a step not described here: "The d' of the auditory and visual conditions can be used to estimate the predicted 'optimal' sensitivity of audiovisual signals as calculated through MLE." Do they mean sqrt[ (d'A)^2 + (d'V)^2] ? If it is so simple then it may as well be made explicit here. A quick calculation from eyeballing Figures 2B and 2C suggests this is the case.

      We thank the reviewer for raising this point of clarification. Yes, the ‘optimal’ audiovisual sensitivity was calculated as the hypotenuse of the auditory and visual sensitivities. This calculation has been made explicit in the revised manuscript.

      The d’ from the auditory and visual conditions can be used to estimate the predicted ‘optimal’ sensitivity to audiovisual signals as calculated through the following formula:

      $$d'_{AV} = \sqrt{(d'_{A})^2 + (d'_{V})^2}$$
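
      The root-sum-of-squares prediction that the reviewer proposes and the authors confirm (the "hypotenuse" of the auditory and visual sensitivities) can be sketched as follows. This is a minimal illustration only: the function name and the input values are hypothetical and do not come from the manuscript.

```python
import math


def mle_predicted_dprime(d_auditory: float, d_visual: float) -> float:
    """Combine two unisensory d' values as sqrt(d'A^2 + d'V^2)."""
    return math.hypot(d_auditory, d_visual)


# Hypothetical unisensory sensitivities, chosen only to keep the arithmetic easy.
predicted = mle_predicted_dprime(3.0, 4.0)
print(f"MLE-predicted audiovisual d': {predicted:.2f}")  # prints 5.00
```

      Under this reading, an observed audiovisual d' reliably above the predicted value would count as exceeding the linear MLE baseline.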

      "The perceived source location of auditory stimuli was manipulated via changes to interaural intensity and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992)." The stimuli were delivered by a pair of loudspeakers, and the incident sound at each ear would be a product of both speakers. And - if there were a time delay between the two speakers, then both ears could potentially receive separate pulses one after the other at different delays. Did they record this audio stimulus with manikin? If not, it would be very difficult to know what it was at the ears. I don't doubt that if they altered the relative volume of the loudspeakers then some directionality would be perceived but I cannot see how the interaural level and timing differences could be matched - as if the sound were from a single source. I doubt that this invalidates their results, but to present this as if it provided matched spatial and timing cues is wrong, and I cannot work out how they can attribute an azimuthal location to the sound. For replication purposes, it would be useful to know how far apart the loudspeakers were and what the timing and level differences actually were.

      The behavioural tasks each had evenly distributed ‘source locations’ on the horizontal azimuth of the computer display (8 for the behavioural session, 5 for the EEG session). We manipulated the perceived location of auditory stimuli through interaural time delays and interaural level differences. By first measuring the forward (z) and horizontal (x) distance of each source location to each ear, the method worked by calculating what the time-course of a sound wave should be at the location of the ear given the sound wave at the source. Then, for each source location, we can calculate the time delay between speakers given the vectors of x and z, the speed of sound and the width of the head.  As the intensity of sound drops inversely with the square of the distance, we can divide the sound wave by the distance for each source location to provide the interaural level difference. Though we did not record the auditory stimulus with a manikin, our behavioural analyses show that participants were able to detect the directions of auditory stimuli from our manipulations, even to a degree that significantly exceeded the localisation accuracy for visual stimuli (for the behavioural session task). This information has been clarified in the manuscript.

      “Auditory stimuli were played through two loudspeakers placed either side of the display (80 cm apart for the behavioural session, 58 cm apart for the EEG session).” 

      “The perceived source location of auditory stimuli was manipulated via changes to interaural level and timing (Whitworth & Jeffress, 1961; Wightman & Kistler, 1992). The precise timing of when each speaker delivered an auditory stimulus was calculated from the following formula:

      where x and z are the horizontal and forward distances in metres between the ears and the source of the sound on the display, respectively, r is the head radius, and s is the speed of sound. We used a constant approximate head radius of 8 cm for all participants. r was added to x for the left speaker and subtracted for the right speaker to produce the interaural time difference. For ±15° source locations, interaural timing difference was 1.7 ms. To simulate the decrease in sound intensity as a function of distance, we calculated interaural level differences for the left and right speakers by dividing the sounds by the left and right distance vectors. Finally, we resampled the sound using linear interpolation based on the calculations of the interaural level and timing differences. This process was used to calculate the soundwaves played by the left and right speakers for each of the possible stimulus locations on the display. The maximum interaural level difference between speakers was 0.14 A for ±15° auditory locations, and 0.07 A for ±7.5°.”
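
      The geometry described in this passage can be illustrated with a short sketch. The timing formula itself is not reproduced in this response, so the path-length expression below is an assumption based on the verbal description (r added to x for the left channel and subtracted for the right, travel time equal to distance divided by the speed of sound, amplitude divided by distance). The function names and the speed-of-sound value are hypothetical, and the sketch is not expected to reproduce the values reported by the authors.

```python
import math

HEAD_RADIUS_M = 0.08        # constant head radius stated in the text
SPEED_OF_SOUND_M_S = 343.0  # assumed value; the response does not state one


def path_lengths(x: float, z: float, r: float = HEAD_RADIUS_M):
    """Assumed distances for the left and right channels.

    x is the horizontal and z the forward distance (metres) to the source;
    r is added to x for the left channel and subtracted for the right.
    """
    left = math.hypot(x + r, z)
    right = math.hypot(x - r, z)
    return left, right


def timing_and_gain(x: float, z: float,
                    r: float = HEAD_RADIUS_M,
                    s: float = SPEED_OF_SOUND_M_S):
    """Return ((left_delay, right_delay), (left_gain, right_gain))."""
    left, right = path_lengths(x, z, r)
    delays = (left / s, right / s)       # travel time in seconds
    gains = (1.0 / left, 1.0 / right)    # amplitude scaled by 1 / distance
    return delays, gains


# Hypothetical source 10 cm to one side at a forward distance of 60 cm.
delays, gains = timing_and_gain(x=0.10, z=0.60)
time_difference_s = delays[0] - delays[1]
level_ratio = gains[0] / gains[1]
```

      With a centred source (x = 0) the two channels receive identical delays and gains in this sketch, and the differences grow as the source moves off-centre.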

      I am confused about this statement: "A quantification of the behavioural sensitivity (i.e., steepness of the curves) revealed significantly greater sensitivity for the audiovisual stimuli than for the auditory stimuli alone (Z = -3.09, p = .002)," It is not clear from the methods how they attributed sound source angle to the sounds. Conceivably they know the angle of the loudspeakers, and this would provide an outer bound on the perceived location of the sound for extreme interaural level differences (although free field interaural timing cues can create a wider sound field). 

      Our analysis of behavioural sensitivity was dependent on the set ‘source locations’ that were used to calculate the position of auditory and audiovisual stimuli.  In the behavioural task, participants judged the position of the target stimulus relative to a central stimulus. Thus, for each source location, we recorded how often participants correctly discriminated between presentations. The quoted analysis acknowledges that participants were more sensitive to audiovisual stimuli than auditory stimuli in the context of this task. A full explanation of how source location was implemented for auditory stimuli has been clarified in the manuscript. 

      It would be very nice to see some of the "channel" activity - to get a feel for the representation used by the decoder. 

      We have included responses for the five channels as a Supplemental Figure.

      Figure 6 appears to show that there is some agreement between behaviour and neural responses - for the audiovisual case alone. The positive correlation of behavioural and decoding sensitivity appears to be driven by one outlier - who could not perform the audiovisual task (and indeed presumably any of them). Furthermore, if we were to simply Bonferroni-correct for the three comparisons, this would become non-significant. It is also puzzling why the unisensory behaviour and EEG do not correlate - which seems to again suggest a poor correspondence between them. Opposite to the claim made.

      We understand the reviewer’s concern here. We would like to note, however, that each correlation used unique data sets – that is, the behavioural and neural data for each separate condition. In this case, we believe a Bonferroni correction for multiple comparisons is too conservative, as no data set was compared more than once. Neither the behavioural nor the neural data were normally distributed, and both contained outliers. Rather than reduce power through outlier rejection, we opted to test correlations using Spearman’s rho, which is resistant to outliers (1). It is also worth noting that, without outlier rejection, the audiovisual correlation (p = .003) would survive a Bonferroni correction for 3 comparisons. The nonsignificant correlation in the auditory and visual conditions might be due to the weaker responses elicited by unisensory stimuli, with the reduced signal-to-noise ratio obscuring potential correlations. Audiovisual stimuli elicited more precise responses both behaviourally and neurally, increasing the power to detect a correlation. 

      (1) Wilcox, R.R. (2016), Comparing dependent robust correlations. British Journal of Mathematical & Statistical Psychology, 69(3), 215-224. https://doi.org/10.1111/bmsp.12069
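
      The two statistical points in this reply, that a rank-based correlation is insensitive to extreme values and that p = .003 survives a Bonferroni correction for 3 comparisons, can be illustrated with a small standard-library sketch. The function names and the example data are hypothetical; this is not the authors' analysis code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(a), average_ranks(b))


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


# Hypothetical monotonic data with one extreme value in each variable.
behaviour = [1.0, 2.0, 3.0, 4.0, 100.0]
decoding = [2.0, 4.0, 5.0, 9.0, 1000.0]
rho = spearman_rho(behaviour, decoding)

# The p-value quoted in the reply, corrected for 3 comparisons.
adjusted_p = bonferroni(0.003, 3)
survives = adjusted_p < 0.05
```

      Because only the ordering of the values enters the calculation, the extreme point in the example data does not dominate rho the way it would dominate a correlation computed on the raw values.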

      “We also found a significant positive correlation between participants’ behavioural judgements in the EEG session and decoding sensitivity for audiovisual stimuli. This result suggests that participants who were better at identifying stimulus location also had more reliably distinct patterns of neural activity. The lack of neurobehavioural correlation in the unisensory conditions might suggest a poor correspondence between the different tasks, perhaps indicative of the differences between behavioural and neural measures explained previously. However, multisensory stimuli have consistently been found to elicit stronger neural responses than unisensory stimuli (Meredith & Stein, 1983; Puce et al., 2007; Senkowski et al., 2011; Vroomen & Stekelenburg, 2010), which has been associated with behavioural performance (Frens & Van Opstal, 1998; Wang et al., 2008). Thus, the weaker signal-to-noise ratio in unisensory conditions may prevent correlations from being detected.”

      Further changes:

      (1)   To improve clarity, we shifted the Methods section to after the Discussion. This change included updating the figure numbers to match the new order (Figure 1 becomes Figure 6, Figure 2 becomes Figure 1, and so on).

      (2)   We also resolved an error on Figure 2 (previously Figure 3). The final graph (Difference between AV and A + V) displayed incorrect values on the Y axis.

      This has now been remedied.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to thank the editor and all the reviewers for their time and thoughtful consideration of our manuscript. We appreciate the valuable comments. Our provisional response to the “public review” has been published, and we have now corrected factual errors and improved the clarity of the writing based on the “recommendations for the authors.” We believe these corrections will improve the quality and accuracy of our manuscript.

      Specific responses to the reviewers' recommendations for the authors are as follows:

      Reviewer #1 (Recommendations For The Authors):

      1) Is the Slack current amplitude dependent on the Nav subtype? Differences in Slack current amplitude might explain the sensitization of Slack to quinidine.

      We appreciate the reviewer for raising this point. We examined Slack current amplitudes upon co-expression of Slack with specific NaV subtypes in HEK293 cells. The results show no significant differences in Slack current amplitudes upon co-expression of Slack with different NaV channel subtypes (Author response image 1), suggesting that whole-cell Slack current amplitudes cannot explain the varied ability of NaV subtypes to sensitize Slack to quinidine blockade.

      Author response image 1.

      The amplitudes of Slack currents upon co-expression of Slack with specific NaV subtypes in HEK293 cells. ns, p > 0.05, one-way ANOVA followed by Bonferroni’s post hoc test.

      2) Is the open probability changed by the presence of Nav1.6 and/or by the other Nav subtypes? Changes in open probability might explain the Nav1.6 induced sensitization of Slack to quinidine block.

      We appreciate the reviewer for raising this point. To investigate the effect of different NaV channel subtypes on Slack open probability, we will perform the single-channel recordings in future studies.

      3) Could the authors elaborate more on the coupling between INaT mediated sensitization of Slack to block by quinidine and the Nav1.6 N-and C-tail induced sensitization?

      We appreciate the reviewer for raising this point. We fully agree on the importance of investigating the detailed mechanism underlying the sensitization of Slack to quinidine blockade. To address this question, we plan to employ structural biological methods, such as cryo-electron microscopy (cryo-EM).

      4) Line 85: The authors use an outdated nomenclature of AMPAR subtypes. I would suggest changing to GluA1, GluA2, GluA3 and GluA4.

      We appreciate the reviewer’s suggestion. We have changed the term “GluR” to “GluA” in the revised manuscript.

      The authors do not explain the rationale by using the different homomeric AMPAR subtypes. Most often the AMPARs express as heteromeric receptors decorated by auxiliary subunits. Also, is the GluA2 the edited version?

      We thank the reviewer for raising this point. While AMPARs are often expressed as heteromeric receptors with auxiliary subunits, we focused on the homomeric AMPAR subtypes for initial screening. Through our investigation, we found no significant effects on sensitizing Slack to quinidine blockade. Additionally, the GluA2 used in our study is unedited.

      5) Line 144: I expect a reduction in current amplitude caused by blocking INaT and INaP is tested at +100mV?

      We thank the reviewer for raising this point. The reduction in current amplitude was indeed tested at +100 mV and we have included this information in the revised manuscript.

      6) Line 157 and line 162: Reference to Supplementary table S3 should be Table S2.

      We thank the reviewer for pointing this out. The reference to "Table S3" has been corrected to "Table S2" in the revised manuscript.

      7) How many times did the authors repeat the co-immunoprecipitation? Some of the bands are very weak, and repeats are necessary for all blots.

      We thank the reviewer for raising this concern. We performed the co-immunoprecipitation experiments three times independently.

      8) Line 288: The authors are showing the chimeric construct in Figures 7A and B but are referring to the full length Nav1.6 in the main text line 288.

      We apologize for the confusion. We have clarified in the revised manuscript that we used NaV1.5/6NC in our study.

      9) Figure 1 line 23: 1 uM quinidine must be 30 uM quinidine?

      We thank the reviewer for catching this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      10) Figure 2 line 53: I expect IC50 is measured at +100mV? Same question for line 60 in same figure text.

      We thank the reviewer for pointing this out. We have now included this information in the revised manuscript.

      11) Figure 4B color coding is confusing.

      We apologize for the confusion. We would like to clarify that Fig. 4B illustrates the domain architecture of the human NaV channel pore-forming α subunit, and we have changed the color from dark blue to black in the revised figure.

      12) Figure S6: Text for figure S6E and S6F has been swapped (line 96 to 106).

      We thank the reviewer for raising this point. We have rectified the swapped captions for Fig. S6E and Fig. S6F in the revised manuscript.

      13) Methods section line 652: Kainite acid should be changed to kainic acid

      We thank the reviewer for catching this typo. The term “kainite acid” has been corrected to “kainic acid” in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) Discuss limitations about the use of non-neuronal cells or cultured primary neurons rather than a more intact system.

      We thank the reviewer for raising this point. We have discussed the limitations of using non-neuronal cells or cultured primary neurons rather than a more intact system (line 344 to line 348).

      2) Riluzole is not a selective drug, so the limitations of this drug should be discussed.

      We thank the reviewer for raising this point. We have discussed the limitations of riluzole in the revised manuscript (line 360 to line 364).

      3) Remove the term in vivo.

      We thank the reviewer for raising this point. In our experiments, although we did not conduct experiments directly in living organisms, our results demonstrated the coimmunoprecipitation of NaV1.6 with Slack in homogenates from mouse cortical and hippocampal tissues (Fig. 3C). This result may support that the interaction between Slack and NaV1.6 occurs in vivo.

      4) Figure 1

      ①C Why does Nav1.2 have a small inward current before the large inward current in the inset? The slope of the rising phase of the larger sodium current seems greater than Nav1.6 or Nav1.5. Was this examined?

      We apologize for the confusion. We would like to clarify that the small inward current can be attributed to the membrane capacitance current (slow capacitance or C-slow). The larger inward current is mediated by NaV1.2. Additionally, we did not compare the slope of the rising phase of the sodium currents across NaV subtypes but primarily focused on the current amplitudes.

      ②D-E

      For Nav1.5 the sodium current is very large compared to Nav1.6. Is it possible the greater effect of quinidine for Nav1.6 is due to the lesser sodium current of Nav1.6?

      We thank the reviewer for raising this point. We would like to clarify that our results indicate that transient sodium currents contribute to the sensitization of Slack to quinidine blockade (Fig. 2C,E). Therefore, it is unlikely that the greater effect observed for NaV1.6 in sensitizing Slack is due to its lower sodium currents.

      ③The differences between WT and KO in G -H are hard to appreciate. Could quantification be shown? The text uses words like "block" but this is not clear from the figure. It seems that the replacement of Na+ with Li+ did not block the outward current or effect of quinidine.

      We apologize for the confusion. We would like to clarify the methods used in this experiment. The lithium ion (Li+) is a much weaker activator of the sodium-activated potassium channel Slack than the sodium ion (Na+) (1, 2).

      1. Zhang Z, Rosenhouse-Dantsker A, Tang QY, Noskov S, Logothetis DE. The RCK2 domain uses a coordination site present in Kir channels to confer sodium sensitivity to Slo2.2 channels. J Neurosci. Jun 2 2010;30(22):7554-62. doi:10.1523/JNEUROSCI.0525-10.2010

      2. Kaczmarek LK. Slack, Slick and Sodium-Activated Potassium Channels. ISRN Neurosci. Apr 18 2013;2013:354262. doi:10.1155/2013/354262

      Therefore, we replaced Na+ with Li+ in the bath solution to measure the current amplitudes of sodium-activated potassium currents (IKNa) (3).

      3. Budelli G, Hage TA, Wei A, et al. Na+-activated K+ channels express a large delayed outward current in neurons during normal physiology. Nat Neurosci. Jun 2009;12(6):745-50. doi:10.1038/nn.2313

      The following equation was used for quantification:

      Furthermore, the remaining IKNa after application of 3 μM quinidine in the bath solution was measured as the following:

      The quantification results were presented in Fig. 1K. The term "block" used in the text referred to the inhibitory effect of quinidine on IKNa.

      ④In K, for the WT, why is the effect of quinidine only striking for the largest currents?

      We thank the reviewer for raising this point. After conducting an analysis, we found no correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (p = 0.6294) (Author response image 2). Therefore, the effect of quinidine is not solely limited to targeting the larger currents.

      Author response image 2.

      The correlation between the inhibitory effect of quinidine and the amplitudes of baseline IKNa in WT neurons (data from manuscript Fig. 1K). r = 0.1555, p=0.6294, Pearson correlation analysis.

      5) Figure 2

      ①A. The argument could be better made if the same concentration of quinidine were used for Slack and Slack + Nav1.6. It is recognized a greater sensitivity to quinidine is to be shown but as presented the figure is a bit confusing.

      We apologize for the confusion. We would like to clarify that the presented concentrations of quinidine were chosen to be near the IC50 values for Slack and Slack+NaV1.6.

      ②C. Can the authors add the effect of quinidine to the condition where the prepulse potential was - 90?

      We apologize for the confusion. We would like to clarify that the condition with a prepulse potential of -90 mV is the same as the condition in Fig. 1. We changed only one experimental condition: the prepulse potential was changed from -90 mV to -40 mV.

      6) Figure 3.

      ①line 80 should be coronal not coronary

      We thank the reviewer for catching this error. We have corrected the term “coronary” to “coronal” in the caption of Figure 3.

      ②A. Clarify these 6 panels.

      We thank the reviewer for raising this point. We have clarified the captions of Fig. 3A in the revised manuscript.

      ③Please enlarge fonts in D.

      We thank the reviewer for the suggestion. We have enlarged the fonts in Fig. 3D in the revised manuscript.

      ④F. The variances should be checked with a test to determine if they are significantly different because they look different - if so, data can be transformed and if transformed data have variances that are equivalent a t-test can be used on the transformed data. Otherwise, Mann-Whitney should be used.

      We thank the reviewer for pointing this out. We have reanalyzed the data in Fig. 3F using the Mann-Whitney test after identifying that the variances of the two groups differed.
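
      For readers unfamiliar with the test, the Mann-Whitney U statistic depends only on how the values of the two groups are ordered relative to one another, which is why it does not require equal variances. A minimal sketch of the statistic follows; the function name and the data are hypothetical and unrelated to Fig. 3F, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: count of (a, b) pairs with a > b, ties as 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical, fully separated groups.
low = [1.0, 2.0, 3.0]
high = [4.0, 5.0, 6.0]
u_low = mann_whitney_u(low, high)    # no value in `low` exceeds any in `high`
u_high = mann_whitney_u(high, low)   # every value in `high` exceeds every in `low`
```

      The two U values always sum to the number of pairs (here 3 × 3 = 9), so a U near 0 or near the maximum indicates strong separation between the groups.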

      7) Figure 7. The images need more clarity. They are very hard to see. Text is also hard to see.

      We apologize for the lack of clarity in the images and text. We would like to provide a concise summary of the key findings shown in this figure.

      Figure 7 illustrates an innovative intervention for treating SlackG269S-induced seizures in mice by disrupting the Slack-NaV1.6 interaction. Our results showed that blocking NaV1.6-mediated sodium influx significantly reduced Slack current amplitudes (Fig. 2D,G), suggesting that the Slack-NaV1.6 interaction contributes to the current amplitudes of epilepsy-related Slack mutant variants, aggravating the gain-of-function phenotype. Additionally, Slack’s C-terminus is involved in the Slack-NaV1.6 interaction (Fig. 5D). We hypothesized that overexpressing Slack’s C-terminus would disrupt the Slack-NaV1.6 interaction (by competing with Slack) and thereby reduce the current amplitudes of epilepsy-related Slack mutant variants.

      In HEK293 cells, overexpression of Slack’s C-terminus indeed significantly reduced the current amplitudes of epilepsy-related SlackG288S and SlackR398Q upon co-expression with NaV1.5/6NC (Fig. 7A,B). Subsequently, we evaluated this intervention in an in vivo epilepsy model by introducing the Slack G269S variant into C57BL/6N mice using AAV injection, mimicking the human Slack mutation G288S that we previously identified (Fig. 7C-G).

      ②It is not clear how data were obtained because injection of kainic acid does not lead to a convulsive seizure every 10 min for several hours, which is what appears to be shown. Individual seizures are just at the beginning and then they merge at the start of status epilepticus. After the onset of status epilepticus the animals twitch, have varied movements, sometime rear and fall, but there is not a return to normal behavior. Therefore one can not call them individual seizures. In some strains of mice, however, individual convulsive seizures do occur (even if the EEG shows status epilepticus is occurring) but there are rarely more than 5 over several hours and the graph has many more. Please explain.

      We apologize for the confusion. Regarding data acquisition after kainic acid injection, we started timing immediately after the intraperitoneal injection of kainic acid and recorded the seizure score of each mouse at ten-minute intervals, following the methodology described in previous studies4.

      4. Huang Z, Walker MC, Shah MM. Loss of dendritic HCN1 subunits enhances cortical excitability and epileptogenesis. J Neurosci. Sep 2 2009;29(35):10979-88. doi:10.1523/JNEUROSCI.1531-09.2009

      The seizure scores were determined using a modified Racine, Pinel, and Rovner scale5,6: (1) facial movements; (2) head nodding; (3) forelimb clonus; (4) dorsal extension (rearing); (5) loss of balance and falling; (6) repeated rearing and falling; (7) violent jumping and running; (8) stage 7 with periods of tonus; (9) death.

      5. Pinel JP, Rovner LI. Electrode placement and kindling-induced experimental epilepsy. Exp Neurol. Jan 15 1978;58(2):335-46. doi:10.1016/0014-4886(78)90145-0

      6. Racine RJ. Modification of seizure activity by electrical stimulation. II. Motor seizure. Electroencephalogr Clin Neurophysiol. Mar 1972;32(3):281-94. doi:10.1016/0013-4694(72)90177-0

      8) The graphical abstract is quite complicated and somewhat hard to follow. Please simplify and clarify. One aspect of the abstract to clarify is the direction of what is first and second and third (etc.) because arrows point to many directions.

      We thank the reviewer for raising this point. In the revised manuscript, we have numbered the three components of the graphical abstract:

      1. Pathological phenotype: Increased Slack currents.

      2. Two types of interventions:

      2a. Disruption of the Slack-NaV1.6 interaction.

      2b. NaV1.6-mediated sensitization of Slack to quinidine blockade.

      3. Therapeutic effects: Reduced Slack currents.

      Reviewer #3 (Recommendations For The Authors):

      1) A reference to homozygous knockout is made in the abstract; however, only heterozygous mice are mentioned in the methods section. The genotype of the mice needs to be made clear in the manuscript. Furthermore, at what age were these mice used in the study. Since homozygous knockout of NaV1.6 is lethal at a very young age (<4 wks), it would be important to clarify that point as well.

      We thank the reviewer for pointing this out. In the revised manuscript, we have included information about the source of the primary cortical neurons used in our study. These neurons were obtained from postnatal homozygous NaV1.6 knockout C3HeB/FeJ mice and their wild-type littermate controls.

      2) Coimmunoprecipitation studies in Fig. 3C are not convincing. There appears to be a signal in the control lane. Furthermore, it appears that brightness levels were adjusted of that image, thereby removing completely the background.

      We thank the reviewer for pointing this out. We have replaced Fig. 3C with an unadjusted version in the revised manuscript.

      3) In Fig. 1B, the authors indicate that 30 microM of quinidine was used, while the corresponding figure legend suggest that 1 microM. Please clarify.

      We apologize for this error. We have corrected the concentration value in the caption of Figure 1 from "1 μΜ" to "30 μΜ" in the revised manuscript.

      4) How long were the cells exposed to quinidine before the functional measurement were performed?

      We thank the reviewer for pointing this out. The cells were exposed to the bath solution with quinidine for about one minute before applying step pulses.

      5) In Fig. 6B-D, it is not clear to what extent co-expression of Slack mutants and NaV1.6 increases sodium-activated potassium current.

      We thank the reviewer for pointing this out. We notice that the current amplitudes of Slack mutants exhibit a considerable degree of variation, ranging from less than 1 nA to over 20 nA (n = 5-8). To accurately measure the effects of NaV1.6 on increasing current amplitudes of Slack mutants, we plan to apply tetrodotoxin in the bath solution to block NaV1.6 sodium currents upon coexpression of Slack mutants with NaV1.6.

      6) In Fig.7A and B, it appears that some recordings had no sodium-activated potassium currents. Why were these included in analysis? How was transfection efficacy assessed?

      We apologize for the confusion. We would like to clarify that all recordings included in analysis indeed exhibited outward sodium-activated potassium currents. The current density data in Fig. 7A-B are listed in Author response table 1 (in pA/pF):

      Author response table 1.

      Regarding the assessment of transfection efficacy, we estimated it approximately by using fluorescent proteins as reporters, which were co-expressed with the relevant proteins via the self-cleaving 2A peptide.

      7) Greater detail needs to be provided for the generation of NaV1.5 and NaV1.6 chimeras. Specifically, what AA residues were changed between sodium channel isoforms?

      We thank the reviewer for pointing this out. In the revised manuscript, we have included the specific amino acid residues that were changed between NaV1.5 and NaV1.6 to generate the chimeric constructs.

      8) In line 481, the authors refer to Fig. S2d instead of Fig. S6D. This should be corrected. Furthermore, the unusual shift in sodium current kinetics that the authors observe might be due in part to junction potential. Did the authors take that into consideration?

      We apologize for this error. The reference to "Fig. S2d" has been corrected to "Fig. S6D" in the revised manuscript.

      Regarding the unusual shift observed in the sodium current kinetics, we agree with the reviewer's suggestion that the junction potential may contribute to this phenomenon. During patch-clamp recordings, we ensured that the junction potential was properly compensated by the amplifier. Additionally, the replacement of CsF in the pipette solution may have contributed to the observed shift, as CsF in the pipette solution has been reported to shift the voltage dependence of activation and fast/slow inactivation of NaV channels towards more negative potentials7.

      7. Korngreen A. Advanced patch-clamp analysis for neuroscientists. Neuromethods. Humana Press; 2016:xii, 350 pages.

      9) Legends for Fig.S6E and S6F are flipped. Please correct.

      We apologize for this error. We have rectified the flipped captions for figure S6E and S6F in the revised manuscript.

      10) Variance should be provided for the IC50 values and kinetic parameters of the sodium channels in the supplemental tables.

      We thank the reviewer for raising this point. We have included the 95% confidence interval (95%CI) for the IC50 values and kinetic parameters in the revised supplementary tables.
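      For readers less familiar with how IC50 values of this kind arise: a concentration-response relation is commonly described by a Hill-type function. The generic form below is given only as an illustration under that assumption and may differ from the exact equation used in the manuscript; the symbols (current in the presence and absence of drug, and the Hill coefficient n_H) are chosen here for the example.

```latex
\frac{I_{\mathrm{drug}}}{I_{\mathrm{control}}}
  = \frac{1}{1 + \left( \dfrac{[\mathrm{quinidine}]}{\mathrm{IC}_{50}} \right)^{n_{H}}}
```

      In a fit of this form, the 95% confidence interval of the IC50 reflects how tightly the data constrain the concentration at which half of the current is blocked.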

      Additionally, we have corrected some equations in the methods section:

      1. Line 500 and line 503: We have corrected equation (1) by adding the Hill coefficient as a parameter.

      2. Line 514: We have revised equation (4) from to

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their thoughtful comments on the manuscript and to the editors for their assessment.

      We thank the reviewers for their positive feedback and appreciate that they consider our method a valid addition to previously established systems for generating recombinant RNA viruses.

      To strengthen this point, we have now included additional validation by the rescue of recombinant Chikungunya and Dengue virus from viral RNA directly, using the CLEVER protocol. This strengthens the potential of this method as a reverse genetics platform for positive-stranded viruses in general.

      The supportive data has been amended in the Results section, taken into account in Materials and Methods, and the corresponding supplementary figure (Figure S4) has been added.

      One key point raised by one of the reviewers, a comparison with different systems, could not be addressed in this manuscript because our lab does not perform BAC cloning at all. We currently do not have the necessary expertise to conduct an unbiased side-by-side comparison.

      All other comments were addressed in detail, either by including additional data or through specific clarification in the revised text. We are grateful for the careful review and constructive criticisms raised by the reviewers and feel that the corrections and additions have significantly improved the manuscript.

      We have revised the latest version posted May 30, 2023 on bioRxiv (https://doi.org/10.1101/2023.05.11.540343).

      Reviewer #1:

      Public Review:

      In this manuscript, Kipfer et al describe a method for a fast and accurate SARS-CoV2 rescue and mutagenesis. This work is based on a published method termed ISA (infectious subgenomic amplicons), in which partially overlapping DNA fragments covering the entire viral genome and additional 5' and 3' sequences are transfected into mammalian cell lines. These DNA fragments recombine in the cells, express the full length viral genomic RNA and launch replication and rescue of infectious virus.

      CLEVER, the method described here significantly improves on the ISA method to generate infectious SARS-CoV2, making it widely useful to the virology community.

      Specifically, the strengths of this method are:

      1) The successful use of various cell lines and transfection methods.

      2) Generation of a four-fragment system, which significantly improves the method efficiency due to lower number of required recombination events.

      3) Flexibility in choice of overlapping sequences, making this system more versatile.

      4) The authors demonstrated how this system can be used to introduce point mutations as well as insertion of a tag and deletion of a viral gene.

      5) Fast-tracking generation of infectious virus directly from RNA of clinical isolates by RT-PCR, without the need for cloning the fragments or using synthetic sequences.

      One weakness of the latter point, which is also pointed out by the authors, is that the direct rescue of clinical isolates was not tested for sequence fidelity.

      The manuscript clearly presents the findings, and the proof-of-concept experiments are well designed.

      Overall, this is a very useful method for SARS-CoV2 research. Importantly, it can be applicable to many other viruses, speeding up the response to newly emerging viruses that threaten the public health.

      We thank the reviewer for this positive feedback and the summary of the main points. Nevertheless, we would like to comment on point 5): “the direct rescue of clinical isolates was not tested for sequence fidelity”

      This impression by the reviewer suggests that the data was not sufficient on this point. However, the sequence fidelity after direct rescue from RNA was indeed tested in this study, even on a clonal level (please see: Table S2, or raw NGS data SRX20303605 - SRX20303607). For higher clarity, we added the following sentence to the manuscript: “Indeed, a slight increase of unintentional mutations was observed when sequencing clonal virus populations rescued from RNA directly”.

      Recommendations for the authors:

      Minor Points:

      1) On page 8, the authors write: "levels correlated very well with the viral phenotype". This sentence is not clear. Please clarify what you mean by "viral phenotype". Do you mean CPE on Vero cells?

      We corrected the sentence to: “(…) staining intensity and patterns correlated very well with the wild-type phenotype.”

      2) Page 9: "sequences were analyzed with a cut-off of 10%". Cutoff of what? Please clarify.

      The sentence was rephrased to: “(…) mutations with a relative abundance of >10% in the entire virus population were analyzed”
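      The rephrased cut-off amounts to a simple filter on variant calls. The sketch below only illustrates that idea; the record layout (`pos`, `alt_reads`, `depth`) and the example numbers are hypothetical and do not come from the study's pipeline.

```python
def filter_by_abundance(variants, min_fraction=0.10):
    """Keep variants whose relative abundance exceeds min_fraction.

    Each variant is a dict with hypothetical keys 'pos', 'alt_reads'
    (reads supporting the mutation) and 'depth' (total coverage).
    """
    kept = []
    for v in variants:
        if v["depth"] == 0:
            continue  # no coverage: abundance is undefined, skip the site
        if v["alt_reads"] / v["depth"] > min_fraction:
            kept.append(v)
    return kept


# Made-up calls for illustration only.
calls = [
    {"pos": 1001, "alt_reads": 450, "depth": 500},  # 90% of reads
    {"pos": 2002, "alt_reads": 30, "depth": 600},   # 5% of reads
    {"pos": 3003, "alt_reads": 80, "depth": 400},   # 20% of reads
]
selected = filter_by_abundance(calls)
print([v["pos"] for v in selected])
```

      With these made-up calls, only the variants at positions 1001 and 3003 pass the 10% threshold.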

      3) Page 15: The authors refer to the time required for completion of each step of the process. It would be helpful and informative for the readers to include a panel in figure 4, visualizing the timelines.

      We included a timeline in Figure 4, Panel A.

      4) Materials and methods, first paragraph: Please specify which human samples were collected. Do the authors refer to clinical virus isolates?

      We added the following information to the Materials and Methods section: “Human serum samples for neutralization assays were collected from SARS-CoV-2 vaccinated anonymous donors (…)”

      Clinical virus isolates (Material and Methods; Virus) were used for control experiments, neutralization assays, or as templates for RT-PCR.

      5) Supplementary figure 4A: The color scheme makes it hard to differentiate between the BA.1 and BA.5 fragments. Please choose colors that are not as similar to each other.

      Colors were adapted for better distinction.

      Reviewer #2:

      Public Review:

      The authors of the manuscript have developed and used a cloning-free method. It is not entirely novel (rather, it is based on the previously described ISA method) but it is clearly efficient and a useful complement to the already existing methods. One of the strong points of the approach used by the authors is that it is very versatile, i.e. it can be used in combination with already existing methods and tools. I find this important as many laboratories have already established their favorite methods to manipulate the SARS-CoV-2 genome and are probably unwilling to change their approach entirely. Though the authors highlight the benefits of their method, these are probably not absolute - other methods may be as efficient or as fast. Still, I find myself thinking that for certain purposes I would like to complement my current approach with elements from the authors' CLEVER method.

      The work does not contain much novel biological data - which is expected for a paper dedicated to the development of a new method (or to improving an existing one). It may be something of a shortcoming, as it is commonly expected that authors who have developed a new method apply it to the discovery of something novel. The work stops at the step of rescuing the viruses and confirming their biological properties. This part is done very well and represents a strength of the study. The properties of the rescued viruses were also studied using NGS methods, which revealed the high accuracy of the method used. This is very important, as the method relies on the use of PCR, which is known to generate random mistakes and is therefore not always the method of choice.

      What I found missing is a real head-to-head comparison of the developed system with existing alternatives, preferably some PCR-free standard methods such as the use of BAC clones. There are a lot of comparisons but they are not direct; only data from different studies have been compared. The authors could also be more open to discussing the limitations of the method. One of these seems to be the rather low rescue efficiency - 1 rescue event per 11,000 transfected cells. This is much lower compared to infectious plasmids (about 1 event per 100 cells or so) and infectious RNAs (often 1 event per 10 cells; for smaller genomes most of the transfected cells become infected). This makes the CLEVER method poorly suited to the generation of large infectious virus libraries and excludes its use for studies of mutant viruses that harbor strongly attenuating mutations. Many such mutations may reduce virus genome infectivity by 3-4 orders of magnitude; with current efficiencies the use of the CLEVER approach may result in false conclusions (mutant viruses will be classified as non-viable while in reality they are just strongly attenuated).
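      The reviewer's concern can be made concrete with a short back-of-the-envelope calculation. The sketch below is illustrative only: the number of transfected cells is a hypothetical value, and treating rescue events as Poisson distributed is an assumption introduced here, not something stated in the review or the manuscript.

```python
from math import exp


def expected_rescue_events(cells_transfected, events_per_cell, attenuation=1.0):
    """Expected number of rescue events when a mutation lowers genome
    infectivity by the given fold factor (attenuation)."""
    return cells_transfected * events_per_cell / attenuation


def prob_at_least_one(expected_events):
    """P(at least one event), assuming Poisson-distributed events."""
    return 1.0 - exp(-expected_events)


CLEVER_RATE = 1 / 11_000  # rescue efficiency quoted by the reviewer
CELLS = 1_000_000         # hypothetical number of transfected cells

# Wild type, then mutants attenuated by 3 and 4 orders of magnitude.
for attenuation in (1, 1_000, 10_000):
    lam = expected_rescue_events(CELLS, CLEVER_RATE, attenuation)
    print(attenuation, round(lam, 4), round(prob_at_least_one(lam), 4))
```

      Under these assumptions, a wild-type genome is rescued almost with certainty, whereas a mutant attenuated by three orders of magnitude yields less than one expected event per transfection, which is the scenario in which a viable but attenuated virus could be misclassified as non-viable.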

      We thank reviewer 2 for the careful review of our work and the valuable feedback. We agree that a direct comparison with other (PCR-free) methods, such as BAC cloning, could be useful for demonstrating the unique benefits of the CLEVER method. However, as our laboratory does not use any BAC or YAC cloning methods, we could not ensure an unbiased side-by-side comparison using different techniques.

      We would like to highlight the avoidance of any yeast/bacterial cloning steps that render the CLEVER protocol significantly faster and easier to handle. A visualization of the key steps that could be skipped using CLEVER in comparison to common reverse genetics methods is given in Figure 6.

      Further, we firmly believe that the benefits of the CLEVER method become especially apparent for large viral genomes such as the one of SARS-CoV-2, where assembly, genome amplification and sequence verification of plasmid DNA are highly inefficient and more time-consuming than for small viruses like DENV, CHIKV or HIV.

      We agree with the reviewer that the overall transfection and recombination efficiencies observed with CLEVER seemed rather low. Although data on transfection/rescue efficiency is known for many techniques and viruses, we did not find any published data on the reconstitution of SARS-CoV-2 or viruses with similar genome sizes. Therefore, a useful comparator for our observations in relation to other techniques is currently simply missing. We therefore emphasize that the efficiencies of CLEVER were achieved with one of the largest plus-stranded RNA virus genomes, and our data can’t be directly compared to transfection efficiencies of short infectious RNAs.

      On the contrary, it was rather interesting to observe the very high rescue efficiency of infectious virus progeny. During the two years of establishing and validating the CLEVER protocol, we reached success rates for the genome reconstitution after transfection of >95 %. This was even obtained with highly attenuated mutants including rCoV2∆ORF3678 (joint deletion of ORF3a, ORF6, ORF7a, and ORF8) (Liu et al., 2022)(see Author response image 1). We amended this data in response to the reviewers’ comment and as an example of the successful rescue of an attenuated virus from five overlapping genome fragments (fragments A, B, C, D1, and D2∆ORF3678).

      The latter data were not added to the main manuscript since in this case the deletions were introduced using a different method: from the plasmid-based DNA fragment D2∆ORF3678 and not directly from PCR-based mutagenesis.

      Further, CLEVER was used for related substantial manipulations, including the complete deletion of the Envelope gene (E) which led to the creation of a single-cycle virus that may serve as a live, replication-incompetent vaccine candidate (Lett et al., 2023).

      Author response image 1.

      rCoV2∆ORF3678. Detection of intracellular SARS-CoV-2 nucleocapsid protein (N, green) and nuclei (Hoechst, blue) in Vero E6TMPRSS2 cells infected with rCoV2∆ORF3678 by immunocytochemistry. Scalebar is 200 µm in overview and 50 µm in ROI images.

      Recommendations for the authors:

      The work is nicely presented and the method the authors have developed is clearly valuable. As indicated in the Public review section, the work would benefit from a direct comparison of CLEVER with infectious plasmid (or RNA) based methods; a direct comparison of data would be more convincing than an indirect one. The authors should also discuss possible limitations of the method - this is helpful for a reader.

      We were not able to perform a direct comparison of CLEVER with other methods (see our statement above).

      We added the following section to the discussion: “Along with the advantages of the CLEVER protocol, limitations must be considered: Interestingly, virus was never rescued after transfecting Vero E6 cells, as has been observed previously (Mélade et al., 2022). Whether this is due to low transfection efficiency or the cell’s inability to recombine remains to be elucidated. Other cell lines not tested within this study will have to be tested for efficient recombination and virus production first. Further, the high sequence integrity of rescued virus is highly dependent on the fidelity of the DNA polymerase used for amplification. The use of other enzymes might negatively influence the sequence integrity of recombinant virus, as it has been observed for the direct rescue from viral RNA using a commercially available one-step RT-PCR kit. Another limitation when performing direct mutagenesis is the synthesis of long oligos to create an overlapping region. Repetitive sequences, for example, can impair synthesis, and self-annealing and hairpin formation increase with prolonged oligos.”

      Some technical corrections of the text would be beneficial. In all parts of the text, terms applicable only to DNA or only to RNA are mixed, which creates some confusion. For example, the authors state that "the human cytomegalovirus promoter (CMV) was cloned upstream of 5' UTR and poly(A) tail, the hepatitis delta ribozyme (HDVr) and the simian virus 40 polyadenylation signal downstream of the 3' UTR". Strictly speaking this is impossible, as such a construct would contain a dsDNA sequence (CMV promoter) followed by ssRNA (5'UTR, polyA tail and HDV ribozyme) and then again dsDNA (SV40 terminator). So, it would be better to be correct and add "sequences corresponding to" or "dsDNA copies of" to the description of RNA elements.

      We thank the reviewer for the advice but would like to state that in scientific language it is common to assume that nucleic acid cloning is based on DNA.

      We have corrected the description in the Methods section: “The human cytomegalovirus promoter (CMV) was cloned upstream of the DNA sequence of the viral 5’UTR; herein, the first five nucleotides (ATATT) correspond to the 5’UTR of SARS-CoV. Sequences corresponding to the poly(A) tail (n=35), the hepatitis delta virus ribozyme (HDVr), and the simian virus 40 polyadenylation signal (SV40pA) were cloned immediately downstream of the DNA sequence of the viral 3’UTR.”

      For ease of reading and for consistent terminology, we kept the original spelling in the rest of the manuscript.

      In description of neutralization assay authors have used temperature 34 C for incubation of virus with antibodies as well as for subsequent incubation of infected cells. Why this temperature was used?

      The following sentence was added (Materials and Methods; Cells): “A lower incubation temperature was chosen based on previous studies (V’kovski et al., 2021).”

      References

      Lett MJ, Otte F, Hauser D, Schön J, Kipfer ET, Hoffmann D, Halwe NJ, Ulrich L, Zhang Y, Cmiljanovic V, Wylezich C, Urda L, Lang C, Beer M, Mittelholzer C, Klimkait T. 2023. Single-cycle SARS-CoV-2 vaccine elicits high protection and sterilizing immunity in hamsters. doi:10.1101/2023.05.17.541127

      Liu Y, Zhang X, Liu J, Xia H, Zou J, Muruato AE, Periasamy S, Kurhade C, Plante JA, Bopp NE, Kalveram B, Bukreyev A, Ren P, Wang T, Menachery VD, Plante KS, Xie X, Weaver SC, Shi P-Y. 2022. A live-attenuated SARS-CoV-2 vaccine candidate with accessory protein deletions. Nat Commun 13:4337. doi:10.1038/s41467-022-31930-z

      V’kovski P, Gultom M, Kelly JN, Steiner S, Russeil J, Mangeat B, Cora E, Pezoldt J, Holwerda M, Kratzel A, Laloli L, Wider M, Portmann J, Tran T, Ebert N, Stalder H, Hartmann R, Gardeux V, Alpern D, Deplancke B, Thiel V, Dijkman R. 2021. Disparate temperature-dependent virus–host dynamics for SARS-CoV-2 and SARS-CoV in the human respiratory epithelium. PLoS Biol 19:e3001158. doi:10.1371/journal.pbio.3001158

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study addresses how faces and bodies are integrated in two STS face areas revealed by fMRI in the primate brain. It builds upon recordings and analysis of the responses of large populations of neurons to three sets of images, that vary face and body positions. These sets allowed the authors to thoroughly investigate invariance to position on the screen (MC HC), to pose (P1 P2), to rotation (0 45 90 135 180 225 270 315), to inversion, to possible and impossible postures (all vs straight), to the presentation of head and body together or in isolation. By analyzing neuronal responses, they found that different neurons showed preferences for body orientation, head orientation, or the interaction between the two. By using a linear support vector machine classifier, they show that the neuronal population can decode head-body angle presented across orientations, in the anterior aSTS patch (but not middle mSTS patch), except for mirror orientation.

      Strengths:

      These results extend prior work on the role of Anterior STS fundus face area in face-body integration and its invariance to mirror symmetry, with a rigorous set of stimuli revealing the workings of these neuronal populations in processing individuals as a whole, in an important series of carefully designed conditions.

      Minor issues and questions that could be addressed by the authors:

      (1) Methods. While monkeys certainly infer/recognize that individual pictures refer to the same pose with varying orientations based on prior studies (Wang et al.), I am wondering whether in this study monkeys saw a full rotation of each of the monkey poses as a video before seeing the individual pictures of the different orientations, during recordings.

      The monkeys had not been exposed to videos of a rotating monkey pose before the recordings. However, they were reared and housed with other monkeys, providing them with ample experience of monkey poses from different viewpoints.

      (2) Experiment 1. The authors mention that neurons are preselected as face-selective, body-selective, or both-selective. Do the Monkey Sum Index and ANOVA main effects change per Neuron type?

      We have performed a new analysis to assess whether the Monkey Sum Index is related to the response strength for the face versus the body as measured in the Selectivity Test of Experiment 1. To do this we selected face- and body-category selective neurons, as well as neurons responding selectively to both faces and bodies. First, we selected those neurons that responded significantly to either faces, bodies, or the two control object categories, using a split-plot ANOVA for these 40 stimuli. From those neurons, we selected face-selective ones having at least a twofold larger mean net response to faces compared to bodies (faces > 2 * bodies) and the control objects for faces (faces  > 2* objects). Similarly, a body-selective neuron was defined by a twofold larger mean net response to bodies compared to faces and the control objects for bodies. A body-and-face selective neuron was defined as having a twofold larger net response to the faces compared to their control objects, and to bodies compared to their control objects, with the ratio between mean response to bodies and faces being less than twofold. Then, we compared the distribution of the Monkey Sum Index (MSI) for each region (aSTS; mSTS), pose (P1, P2), and centering (head- (HC) or monkey-centered (MC)) condition. Too few body-and-face selective neurons were present in each combination of region, pose, and centering (a maximum of 7) to allow a comparison of their MSI distribution with the other neuron types. The Figure below shows the distribution of the MSI for the different orientation-neuron combinations for the body- and face-selective neurons (same format as in Figure 3a, main text). The number of body-selective neurons, according to the employed criteria, varied from 21 to 29, whereas the number of face-selective neurons ranged from 14 to 24 (pooled across monkeys). 
      The data of the two subjects are shown in different colors and the number of cases for each subject is indicated (n1: number of cases for M1; n2: number of cases for M2). The arrows indicate the medians for the data pooled across the monkey subjects. For the MC condition, the MSI tended to be more negative (i.e. relatively less response to the monkey compared to the sum of the body and face responses) for the face compared to the body cells, but this was significant only for mSTS and P1 (p = 0.043; Wilcoxon rank sum test; tested after averaging the indices per neuron to avoid dependence of indices within a neuron). No consistent or significant tendencies were observed for the HC stimuli. This absence of a consistent relationship between MSI and face- versus body-selectivity is in line with the absence of a correlation between the MSI and face- versus body-selectivity using natural images of monkeys in a previous study (Zafirova Y, Bognár A, Vogels R. Configuration-sensitive face-body interactions in primate visual cortex. Prog Neurobiol. 2024 Jan;232:102545).
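      The twofold selection criteria described above can be written down compactly. The following is a minimal sketch, not the authors' code: the function and argument names are hypothetical, mean net responses are assumed to be positive, and the preceding responsiveness screen (split-plot ANOVA) is omitted.

```python
def classify_neuron(faces, bodies, face_objects, body_objects, factor=2.0):
    """Label a neuron from its mean net responses to faces, bodies and
    the control objects for faces and for bodies (all assumed positive)."""
    face_sel = faces > factor * bodies and faces > factor * face_objects
    body_sel = bodies > factor * faces and bodies > factor * body_objects
    both_sel = (
        faces > factor * face_objects
        and bodies > factor * body_objects
        # Face/body ratio below twofold in either direction.
        and max(faces, bodies) < factor * min(faces, bodies)
    )
    if face_sel:
        return "face-selective"
    if body_sel:
        return "body-selective"
    if both_sel:
        return "body-and-face selective"
    return "unclassified"


# Made-up mean net responses (spikes/s), for illustration only.
print(classify_neuron(faces=20, bodies=5, face_objects=4, body_objects=4))
print(classify_neuron(faces=12, bodies=10, face_objects=4, body_objects=4))
```

      The three labels are mutually exclusive by construction, because the face-selective and body-selective criteria require a more than twofold face/body ratio while the combined category requires a ratio below twofold.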

      We did not perform a similar analysis for the main effects of the two-way ANOVA because the very large majority of neurons showed a significant effect of body orientation and thus no meaningful difference between the two neuron types can be expected.

      Author response image 1.

      (3) I might have missed this information, but the correlation between P1 and P2 seems to not be tested although they carry similar behavioral relevance in terms of where attention is allocated and where the body is facing for each given head-body orientation.

      Indeed, we did not compute this correlation between the responses to the sitting (P1) and standing (P2) pose avatar images. However, as pointed out by the reviewer, one might expect such correlations because of the same head orientations and body-facing directions. Thus, we computed the correlation between the 64 head-body orientation conditions of P1 and P2 for those neurons that were tested with both poses and showed a response for both poses (Split-plot ANOVA). This was performed for the Head-Centered and Monkey-Centered tests of Experiment 1 for each monkey and region. Note that not all neurons were tested with both poses (because of failure to maintain isolation of the single unit in both tests or because the monkey stopped working) and not all neurons that were recorded in both tests showed a significant response for both poses, which is not unexpected since these neurons can be pose selective. The distribution of the Pearson correlation coefficients of the neurons with a significant response in both tests is shown in Figure S1. The median correlation coefficient was significantly larger than zero for each region, monkey, and centering condition (Wilcoxon tests of whether the median differed from zero; the p-values are given in the Figure, with p1 for M1 and p2 for M2), indicating that the effect of head and/or body orientation generalizes across pose. We have now noted this in the Results (page 12) and added the Figure (new Figure S1) to the Suppl. Material.
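      The per-neuron correlation across poses can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the toy vectors contain 4 conditions instead of 64, and the Wilcoxon test on the resulting coefficients is not implemented here.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pose_correlations(p1_responses, p2_responses):
    """p1_responses[i] and p2_responses[i] hold the responses of neuron i
    to the same ordered head-body orientation conditions in poses P1 and P2.
    Returns the per-neuron coefficients and their median."""
    rs = [pearson_r(a, b) for a, b in zip(p1_responses, p2_responses)]
    return rs, median(rs)


# Toy responses of three neurons (made-up numbers).
p1 = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
p2 = [[2, 4, 6, 8], [1, 3, 2, 4], [4, 3, 2, 1]]
coefficients, med = pose_correlations(p1, p2)
print(coefficients, med)
```

      A median coefficient reliably above zero across neurons is what would indicate that orientation tuning carries over from one pose to the other.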

      (4) Is the invariance for position HC-MC larger in aSTS neurons compared to mSTS neurons, as could be expected from their larger receptive fields?

      Yes, the position tolerance of the interaction of body and head orientation was significantly larger for aSTS compared to mSTS neurons, as we described on pages 11 and 12 of the Results. This is in line with larger receptive fields in aSTS than in mSTS. However, we did not plot receptive fields in the present study.

      (5) L492 "The body-inversion effect likely results from greater exposure to upright than inverted bodies during development". Monkeys display more hanging upside-down behavior than humans, however, does the head appear more tilted in these natural configurations?

      Indeed, infant monkeys do spend some time hanging upside down from their mother's belly. While we lack quantitative data on this behavior, casual observations suggest that even young monkeys spend more time upright. The tilt of the head while hanging upside down can vary, just as it does in standing or sitting monkeys (as when they search for food or orient to other individuals). To our knowledge, no quantitative data exist on the frequency of head tilts in upright versus upside-down monkeys. Therefore, we refrain from further speculation on this interesting point, which warrants more attention.

      (6) Methods in Experiment 1. SVM. How many neurons are sufficient to decode the orientation?

      The number of neurons that are needed to decode the head-body orientation angle depends on which neurons are included, as we show in a novel analysis of the data of Experiment 1. We employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding. We also show the ten “best” neurons for each centering condition and pose. These have a variety of tuning patterns for head and body orientation, suggesting that the decoding of head-body orientation angle depends on a population code. Notably, the best-ranked (rank N) neuron alone achieved above-chance accuracy. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (new Figure S3).
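
      The ranking and incremental-inclusion steps described above can be sketched as follows. This is an illustration only, not the analysis code used in the study: the data are simulated, the function names and the set of "informative" neurons are hypothetical, and a nearest-centroid classifier stands in for the cross-validated linear SVM (the 2000 resamplings of pseudo-population vectors are omitted for brevity).

```python
import random

def simulate(n_neurons, n_trials, informative, rng):
    """Return {label: [trial, ...]}; each trial is a list of simulated responses."""
    data = {}
    for label in (0, 1):
        data[label] = [
            [rng.gauss(1.0 if label == 1 and i in informative else 0.0, 1.0)
             for i in range(n_neurons)]
            for _ in range(n_trials)
        ]
    return data

def accuracy(train, test, neurons):
    """Nearest-centroid accuracy using only the listed neurons
    (a simple stand-in for the linear SVM of the original analysis)."""
    centroids = {
        label: [sum(t[i] for t in trials) / len(trials) for i in neurons]
        for label, trials in train.items()
    }
    correct = total = 0
    for label, trials in test.items():
        for t in trials:
            dist = {
                c: sum((t[i] - m) ** 2 for i, m in zip(neurons, mu))
                for c, mu in centroids.items()
            }
            correct += min(dist, key=dist.get) == label
            total += 1
    return correct / total

def neuron_dropping(train, test, n_neurons):
    all_neurons = list(range(n_neurons))
    # Step 1: decode N times, each time leaving out a different neuron.
    drop_acc = {
        i: accuracy(train, test, [j for j in all_neurons if j != i])
        for i in all_neurons
    }
    # Step 2: the highest accuracy without a neuron marks the 'worst' neuron (rank 1).
    order = sorted(all_neurons, key=lambda i: drop_acc[i], reverse=True)
    # Step 3: include neurons one by one, from worst to best, and track accuracy.
    curve = [accuracy(train, test, order[:k]) for k in range(1, n_neurons + 1)]
    return order, curve

rng = random.Random(0)
informative = {0, 1, 2}  # hypothetical: only these simulated neurons carry signal
train = simulate(12, 200, informative, rng)
test = simulate(12, 200, informative, rng)
order, curve = neuron_dropping(train, test, 12)
print("rank order (worst to best):", order)
print("accuracy vs. number of included neurons:", [round(a, 2) for a in curve])
```

      Plotting `curve` against the number of included neurons gives the kind of accuracy curve referred to above; the separate train and test sets loosely mirror training on one centering condition and testing on the other.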

      (7) Figure 3D 3E. Could the authors please indicate for each of these neurons whether they show a main effect of face, body, or interaction, as well as their median corrected correlation to get a flavor of these numbers for these examples?

      We have indicated these now in Figure 3.

      (8) Methods and Figure 1A. It could be informative to specify whether the recordings were carried out in the lateral part of the STS or in the fundus of the STS, both for aSTS and mSTS, for comparison with other studies that use these distinctions (AF, AL, MF, ML).

      In experiment 1, the recording locations were not as medial as the fundus. For experiments 2 and 3, the ventral part of the fundus was included, as described in the Methods. We have added this to the Methods now (page 31).

      Wang, G., Obama, S., Yamashita, W. et al. Prior experience of rotation is not required for recognizing objects seen from different angles. Nat Neurosci 8, 1768-1775 (2005). https://doi-org.insb.bib.cnrs.fr/10.1038/nn1600

      Reviewer #2 (Public review):

      Summary:

      This paper investigates the neuronal encoding of the relationship between head and body orientations in the brain. Specifically, the authors focus on the angular relationship between the head and body by employing virtual avatars. Neuronal responses were recorded electrophysiologically from two fMRI-defined areas in the superior temporal sulcus and analyzed using decoding methods. They found that: (1) anterior STS neurons encode head-body angle configurations; (2) these neurons distinguish aligned and opposite head-body configurations effectively, whereas mirror-symmetric configurations are more difficult to differentiate; and (3) an upside-down inversion diminishes the encoding of head-body angles. These findings advance our understanding of how visual perception of individuals is mediated, providing a fundamental clue as to how the primate brain processes the relationship between head and body - a process that is crucial for social communication.

      Strengths:

      The paper is clearly written, and the experimental design is thoughtfully constructed and detailed. The use of electrophysiological recordings from fMRI-defined areas elucidated the mechanism of head-body angle encoding at the level of local neuronal populations. Multiple experiments, control conditions, and detailed analyses thoroughly examined various factors that could affect the decoding results. The decoding methods effectively and consistently revealed the encoding of head-body angles in the anterior STS neurons. Consequently, this study offers valuable insights into the neuronal mechanisms underlying our capacity to integrate head and body cues for social cognition-a topic that is likely to captivate readers in this field.

      Weaknesses:

      I did not identify any major weaknesses in this paper; I only have a few minor comments and suggestions to enhance clarity and further strengthen the manuscript, as detailed in the Private Recommendations section.

      Reviewer #3 (Public review):

      Summary:

      Zafirova et al. investigated the interaction of head and body orientation in the macaque superior temporal sulcus (STS). Combining fMRI and electrophysiology, they recorded responses of visual neurons to a monkey avatar with varying head and body orientations. They found that STS neurons integrate head and body information in a nonlinear way, showing selectivity for specific combinations of head-body orientations. Head-body configuration angles can be reliably decoded, particularly for neurons in the anterior STS. Furthermore, body inversion resulted in reduced decoding of head-body configuration angles. Compared to previous work that examined face or body alone, this study demonstrates how head and body information are integrated to compute a socially meaningful signal.

      Strengths:

      This work presents an elegant design of visual stimuli, with a monkey avatar of varying head and body orientations, making the analysis and interpretation straightforward. Together with several control experiments, the authors systematically investigated different aspects of head-body integration in the macaque STS. The results and analyses of the paper are mostly convincing.

      Weaknesses:

      (1) Using ANOVA, the authors demonstrate the existence of nonlinear interactions between head and body orientations. While this is a conventional way of identifying nonlinear interactions, it does not specify the exact type of the interaction. Although the computation of the head-body configuration angle requires some nonlinearity, it's unclear whether these interactions actually contribute. Figure 3 shows some example neurons, but a more detailed analysis is needed to reveal the diversity of the interactions. One suggestion would be to examine the relationship between the presence of an interaction and the neural encoding of the configuration angle.

      This is an excellent suggestion. To do this, one needs to identify the neurons that contribute to the decoding of head-body orientation angles. For that, we employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721.) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding (see Figure S3). We examined the tuning for head and body orientation of the 10 “best” neurons (Figure S3). For half or more of those the two-way ANOVA showed a significant interaction. These are indicated by the red color in the Figure. They showed a variety of tuning patterns for head and body orientation, suggesting that the decoding of the head-body orientation angle results from a combination of neurons with different tuning profiles. Based on a suggestion from reviewer 2, we performed for each neuron of experiment 1 a one-way ANOVA with as factor head-body orientation angle. To do that, we combined all 64 trials that had the same head-body orientation angle. The percentage of neurons (required to be responsive in the tested condition) for which this one-way ANOVA was significant was low but larger than the expected 5% (Type 1 error), with a median of 16.5% (range: 3 to 23%) in aSTS and 8% for mSTS (range: 0-19%). However, a higher percentage of the 10 best neurons for each pose (indicated by the star) showed a significant one-way ANOVA for angle (for P1, MC: 50% (95% confidence interval (CI): 19%-81%); P1, HC: 70% (CI: 35%-93%); P2, MC: 70% (CI: 35%-93%); P2, HC: 50% (CI: 19%-81%)). These percentages were significantly higher than expected for a random sample from the population of neurons for each pose-centering combination (expected percentages listed in the same order as above: 16%, 13%, 16%, and 10%; all outside CI). Thus, for at least half of the “best” neurons, the response differed significantly among the head-orientation angles at the single neuron level. Nonetheless, the tuning profiles were diverse, suggesting a population code for head-body orientation angle. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (Figure S3).

      (2) Figure 4 of the paper shows a better decoding of the configuration angle in the anterior STS than in the middle STS. This is an interesting result, suggesting a transformation in the neural representation between these two areas. However, some control analyses are needed to further elucidate the nature of this transformation. For example, what about the decoding of head and body orientations - does absolute orientation information decrease along the hierarchy, accompanying the increase in configuration information?

      We have now performed two additional analyses, one in which we decoded the orientation of the head and another one in which we decoded the orientation of the body. We employed the responses to the avatar of experiment 1, using the same sample of neurons of which we decoded the head-body orientation angle. To decode the head orientation, the trials with identical head orientation, irrespective of their body orientation, were given the same label. For this, we employed only responses in the head-centered condition. To decode the body orientation, the trials with identical body orientation, irrespective of their head orientation, had the same label, and we employed only responses in the body-centered condition. The decoding was performed separately for each pose (P1 and P2) and region. We decoded either the responses of 20 neurons (10 randomly sampled from each monkey for each of the 1000 resamplings), 40 neurons (20 randomly sampled per monkey), or 60 neurons (30 neurons per monkey) since the sample of 60 neurons yielded close to ceiling performance for the body orientation decoding. For each pose, the body orientation decoding was worse for aSTS than for mSTS, although this difference reached significance only for P1 and for the 40 neurons sample of P2 (p < 0.025; two-tailed test; same procedure as employed for testing the significance of the decoding of whole-body orientation for upright versus inverted avatars (Experiment 3)). Face orientation decoding was significantly worse for aSTS compared to mSTS. These results are in line with the previously reported decreased decoding of face orientation in the anterior compared to mid-STS face patches (Meyers EM, Borzello M, Freiwald WA, Tsao D. Intelligent information loss: the coding of facial identity, head pose, and non-face information in the macaque face patch system. J Neurosci. 2015 May 6;35(18):7069-81), and decreased decoding of body orientation in anterior compared to mid-STS body patches (Kumar S, Popivanov ID, Vogels R. Transformation of Visual Representations Across Ventral Stream Body-selective Patches. Cereb Cortex. 2019 Jan 1;29(1):215-229). As mentioned by the reviewer, this contrasts with the decoding of the head-body orientation angle, which increases when moving more anteriorly. We mention this finding now in the Discussion (page 27) and present the new Figure S10 in the Suppl. Material.
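
      To make the three labeling schemes concrete, the following sketch assigns head, body, and head-body angle labels to the 64 conditions. It assumes, for illustration, that the 64 conditions are an 8 x 8 grid of head and body orientations in 45-degree steps; the function names are hypothetical.

```python
from collections import Counter

# Assumption for illustration: 8 orientations in 45-degree steps (0, 45, ..., 315).
ORIENTATIONS = [k * 45 for k in range(8)]

# All 64 head-body orientation conditions as (head, body) pairs.
conditions = [(head, body) for head in ORIENTATIONS for body in ORIENTATIONS]

def head_label(head, body):
    """Head-orientation decoding: same label irrespective of body orientation."""
    return head

def body_label(head, body):
    """Body-orientation decoding: same label irrespective of head orientation."""
    return body

def angle_label(head, body):
    """Head-body orientation angle: 0 = aligned (zero angle), 180 = opposite (straight angle)."""
    return (head - body) % 360

head_counts = Counter(head_label(h, b) for h, b in conditions)
body_counts = Counter(body_label(h, b) for h, b in conditions)
angle_counts = Counter(angle_label(h, b) for h, b in conditions)
print(len(conditions), head_counts[0], body_counts[0], angle_counts[180])
```

      Under this assumed layout, each label value groups 8 of the 64 conditions, so the three decoders differ only in how the same trials are labeled.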

      (3) While this work has characterized the neural integration of head and body information in detail, it's unclear how the neural representation relates to the animal's perception. Behavioural experiments using the same set of stimuli could help address this question, but I agree that these additional experiments may be beyond the scope of the current paper. I think the authors should at least discuss the potential outcomes of such experiments, which can be tested in future studies.

      Unfortunately, we do not have behavioral data. One prediction would be that the discrimination of head-body orientation angle, irrespective of the viewpoint of the avatar, would be more accurate for zero versus straight angles compared to the right versus left angles. We have added this to the Discussion (page 28).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) P22 L373. It should read Figure S5C instead of S4C.

      Thanks; corrected.

      (2) Figure 7B. All inverted decoding accuracies, although significantly lower than upright decoding accuracies, appear significantly above baseline. Should the title be amended accordingly?

      Thanks for pointing this out. To avoid future misunderstanding we have changed the title to:

      “Integration of head and body orientations in the macaque superior temporal sulcus is stronger for upright bodies”

      (3) Discussion L432-33. "with some neurons being tuned to a particular orientation of both the head and the body". Wouldn't that be visible as a diagonal profile on the normalized net responses in Fig 3D? Or can the Anova evidence such a tuning?

      We meant to say that some neurons were tuned to a particular combination of head and body orientation, like the third aSTS example neuron shown in Figure 3D. We have corrected the sentence.

      Reviewer #2 (Recommendations for the authors):

      Major comment:

      This paper effectively demonstrates that the angular relationship between the head and body can be decoded from population responses in the anterior STS. In other words, these neurons encode information about the head-body angle. However, how exactly do these neurons encode this information? Given that the study employed electrophysiological recordings from a local population of neurons, it might be possible to provide additional data on the response patterns of individual neurons to shed light on the underlying encoding mechanisms.

      Although the paper already presents example response patterns (Figures 3D, E) and shows that STS neurons encode interactions between head and body orientations (Figure 3B), it remains unclear whether the angle difference between the head and body has a systematic effect on neuronal responses. For instance, a description of whether some neurons preferentially encode specific head-body angle differences (e.g., a "45-degree angle neuron"), or additional population analyses such as a one-way ANOVA with angle difference as the main effect (or two-way ANOVA with angle difference as one of the main effect), would be very informative. Such data could offer valuable insights into how individual neurons contribute to the encoding of head-body angle differences-a detail that may also be reflected in the decoding results. Alternatively, it is possible that the encoding of head-body angle is inherently complex and only discernible via decoding methods applied to population activity. Either scenario would provide interesting and useful information to the field.

      We have performed two additional analyses which are relevant to this comment. First, we attempted to relate the tuning for body and head orientation with the decoding of the head-body orientation angle. To do this, one needs to identify the neurons that contribute to the decoding of head-body orientation angles. For that, we employed a neuron-dropping analysis, similar to Chiang et al. (Chiang FK, Wallis JD, Rich EL. Cognitive strategies shift information from single neurons to populations in prefrontal cortex. Neuron. 2022 Feb 16;110(4):709-721.) to assess the positive (or negative) contribution of each neuron to the decoding performance. We performed cross-validated linear SVM decoding N times, each time leaving out a different neuron (using N-1 neurons; 2000 resamplings of pseudo-population vectors). We then ranked decoding accuracies from highest to lowest, identifying the ‘worst’ (rank 1) to ‘best’ (rank N) neurons. Next, we conducted N decodings, incrementally increasing the number of included neurons from 1 to N, starting with the worst-ranked neuron (rank 1) and sequentially adding the next (rank 2, rank 3, etc.). This analysis focused on zero versus straight angle decoding in the aSTS, as it yielded the highest accuracy. We applied it when training on MC and testing on HC for each pose. Plotting accuracy as a function of the number of included neurons suggested that less than half contributed positively to decoding (see Figure S3). We examined the tuning for head and body orientation of the 10 “best” neurons (Figure S3). For half or more of those the two-way ANOVA showed a significant interaction. These are indicated by the red color in the Figure. They showed a variety of tuning patterns for head and body orientation, suggesting that the decoding of the head-body orientation angle results from a combination of neurons with different tuning profiles.

      Second, we have followed the suggestion of the reviewer to perform for each neuron of experiment 1 a one-way ANOVA with as factor head-body orientation angle. To do that, we combined all 64 trials that had the same head-body orientation angle. The percentage of neurons (required to be responsive in the tested condition) for which this one-way ANOVA was significant is shown in the Tables below for each region, separately for each pose (P1, P2), centering condition (MC = monkey-centered; HC = head-centered) and monkey subject (M1, M2). The percentages were low but larger than the expected 5% (Type 1 error), with a median of 16.5% (range: 3 to 23%) in aSTS and 8% for mSTS (range: 0-19%).

      Author response table 1.

      Interestingly, a higher percentage of the 10 best neurons for each pose (indicated by the star in the Figure above) showed a significant one-way ANOVA for angle (for P1, MC: 50% (95% confidence interval (CI): 19%-81%); P1, HC: 70% (CI: 35%-93%); P2, MC: 70% (CI: 35%-93%); P2, HC: 50% (CI: 19%-81%)). These percentages were significantly higher than expected for a random sample from the population of neurons for each pose-centering combination (expected percentages listed in the same order as above: 16%, 13%, 16%, and 10%; all outside CI). Thus, for at least half of the “best” neurons, the response differed significantly among the head-orientation angles at the single neuron level. Nonetheless, the tuning profiles were quite diverse, suggesting population coding of head-body orientation angle. We have added this interesting and novel result to the Results (page 16) and Suppl. Material (Figure S3).
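
      The method used for the confidence intervals is not stated here, but the reported values appear consistent with exact (Clopper-Pearson) binomial intervals for 5 and 7 significant neurons out of 10. Under that assumption, a minimal sketch of the computation (function names are hypothetical; the bounds are found by bisection on the binomial distribution):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target, increasing, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes out of n."""
    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper

for x in (5, 7):
    lo, hi = clopper_pearson(x, 10)
    print(f"{x}/10: {100 * lo:.0f}% to {100 * hi:.0f}%")
```

      An expected percentage (such as the 16% population rate) lying below the lower bound is what "outside CI" refers to in this reading.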

      Minor comments:

      (1) Figure 4A, Fourth Row Example (Zero Angle vs. Straight Angle, Bottom of the P2 Examples): The order of the example stimuli might be incorrect - the 0° head with 180° body stimulus (leftmost) might be swapped with the 180° head with 0° body stimulus (5th from the left). While this ordering may be acceptable, please double-check whether it reflects the authors' intended arrangement.

      We have changed the order of the two stimuli in Figure 4A, following the suggestion of the reviewer.

      (2) Page 12, Lines 192-194: The text states, "Interestingly, some neurons (e.g. Figure 3D) were tuned to a particular combination of a head and body irrespective of centering." However, Figure 3D displays data for a total of 10 neurons. Could you please specify which of these neurons are being referred to in this context?

      The wording was not optimal. We meant to say that some neurons were tuned to a particular combination of head and body orientation, like the third aSTS example neuron of Figure 3D. We have rephrased the sentence and clarified which example neuron we referred to.

      (3) Page 28, Lines 470-471: The text states, "We observed no difference in response strength between anatomically possible and impossible configurations." Please clarify which data were compared for response strength, as I could not locate the corresponding analyses.

      The anatomically possible and impossible configurations differ in the head-body orientation angle. However, as we reported before in the Results, there was no effect of head-body orientation angle on mean response strength across poses (Friedman ANOVA; all p-values for both poses and centerings > 0.1). We have clarified this now in the Discussion (page 28).

      (4) Pages 40-43, Decoding Analyses: In experiments 2 and 3, were the decoding analyses performed on simultaneously recorded neurons? If so, such analyses might leverage trial-by-trial correlations and thus avoid confounds from trial-to-trial variability. In contrast, experiment 1, which used single-shank electrodes, would lack this temporal information. Please clarify how trial numbers were assigned to neurons in each experiment and how this assignment may have influenced the decoding performance.

      For the decoding analyses of experiments 2 and 3, we combined data from different daily penetrations, with only units from the same penetration being recorded simultaneously. In the decoding analyses of each experiment, the trials were assigned randomly to the pseudo-population vectors, shuffling on each resampling the trial order per neuron. This shuffling abolishes noise correlations in the analysis of each experiment.
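
      A minimal sketch of this kind of pseudo-population construction is given below. It is an illustration under assumptions, not the code used in the study: the function name, the data layout (one list of single-trial responses per neuron for a given condition), and the toy values are hypothetical.

```python
import random

def pseudo_population(responses, n_vectors, rng):
    """Build pseudo-population vectors for one stimulus condition.

    responses[neuron] is a list of single-trial responses of that neuron.
    The trial order is shuffled independently per neuron, so trial t of one
    neuron is paired with an arbitrary trial of every other neuron.
    """
    shuffled = []
    for trials in responses:
        order = trials[:]          # copy, so the input is left untouched
        rng.shuffle(order)         # independent shuffle for each neuron
        shuffled.append(order[:n_vectors])
    # Vector t takes the t-th shuffled trial of every neuron.
    return [[neuron[t] for neuron in shuffled] for t in range(n_vectors)]

rng = random.Random(1)
# Toy data: 3 neurons x 8 trials; values encode (neuron, trial) for readability.
responses = [[10 * n + t for t in range(8)] for n in range(3)]
vectors = pseudo_population(responses, 8, rng)
for v in vectors:
    print(v)
```

      Because each neuron is shuffled separately, any trial-by-trial covariation between simultaneously recorded units is removed, while each neuron's own response distribution is preserved. Calling the function again with the same `rng` gives a new resampling.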

      (5) Page 41, Lines 792-802: The authors state that "To assess the significance of the differences in classification scores between pairs of angles ... we computed the difference in classification score between the two pairs for each resampling and the percentile of 0 difference corresponded to the p-value." In a two-sided test under the null hypothesis of no difference between the distributions, the conventional approach would be to compute the p-value as the proportion of resampled differences that are as extreme or more extreme than the observed difference. Since a zero difference might be relatively rare, relying solely on its percentile could potentially misrepresent the tail probabilities relevant to a two-sided test. Could you clarify how their method addresses this issue?

      This test is based on the computation of the distribution of the difference between classification accuracies across resamplings. This is similar to the computation of the confidence interval of a difference. Thus, we assess whether the theoretical zero value (= no difference; = null hypothesis) is outside the 2.5 and 97.5 percentile interval of the computed distribution of the empirically observed differences. We have now clarified in the Methods (page 41) that for a two-tailed test the computed p-value (the percentile of the zero value) should be smaller than 0.025.
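
      As a sketch of this logic (not the code used in the study), the percentile of zero in the distribution of resampled differences can be computed as below. The function name and the toy accuracies are hypothetical, and taking the smaller of the two tails is an assumption made here so that the same function covers differences in either direction.

```python
def zero_percentile(acc_a, acc_b):
    """Tail proportion of zero in the distribution of paired differences.

    acc_a and acc_b hold the classification accuracies of two conditions,
    one value per resampling (paired by resampling index).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    below = sum(d <= 0 for d in diffs) / len(diffs)   # mass at or below zero
    above = sum(d >= 0 for d in diffs) / len(diffs)   # mass at or above zero
    return min(below, above)

# Toy example with four resamplings (real analyses use many more).
acc_a = [0.80, 0.82, 0.79, 0.81]
acc_b = [0.60, 0.70, 0.80, 0.65]
p = zero_percentile(acc_a, acc_b)
print(p, "significant (two-tailed)" if p < 0.025 else "not significant")
```

      With this criterion, a difference is significant in a two-tailed test when zero lies outside the 2.5 to 97.5 percentile interval of the resampled differences.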

      (6) Page 43, Lines 829-834: The manuscript explains: "The mean of 10 classification accuracies (i.e., of 10 resamplings) was employed to obtain a distribution (n=100) of the differences in classification accuracy ... The reported standard deviations of the classification accuracies are computed using also the means of 10 resamplings." I am unfamiliar with this type of analysis and am unclear about the rationale for calculating distributions and standard deviations based on the means of 10 resamplings rather than using the original distribution of classification accuracies. This resampling procedure appears to yield a narrower distribution and smaller standard deviations than the original data. Could you please justify this approach?

      The logic of the analysis is to reduce the noise in the data, by averaging across 10 randomly selected resamplings, but still keeping a sufficient number of data (100 values) for a test.
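
      The averaging step can be illustrated as follows. This is a sketch under assumptions: it supposes 1000 resamplings partitioned at random into 100 non-overlapping groups of 10, and uses simulated accuracies; the function name is hypothetical.

```python
import random
from statistics import mean, stdev

def block_means(values, block_size, rng):
    """Randomly partition values into non-overlapping blocks and average each block."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    return [mean(shuffled[i:i + block_size])
            for i in range(0, len(shuffled), block_size)]

rng = random.Random(2)
# Simulated classification accuracies from 1000 resamplings (illustrative values only).
accuracies = [rng.gauss(0.75, 0.05) for _ in range(1000)]
means = block_means(accuracies, 10, rng)
print(len(means), round(stdev(accuracies), 3), round(stdev(means), 3))
```

      The 100 block means have a smaller spread than the 1000 raw accuracies, which is the noise reduction intended here and also the narrowing of the distribution that the reviewer points out.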

      Reviewer #3 (Recommendations for the authors):

      (1) Some sentences are too long and difficult to parse. For example, in line 177: "the correlations between the responses to the 64 head-body orientation conditions of the two centerings for the neuron and pose combinations showing significant head-body interactions for the two centerings were similar to those observed for the whole population."

      We have modified this sentence: For neuron and pose combinations with significant head-body interactions in both centerings, the correlations between responses to the 64 head-body orientation conditions were similar to those observed in the whole population.

      (2) The authors argue in line 485: "in our study, a search bias cannot explain the body-inversion effect since we selected responsive units using both upright and inverted images." However, the body-selective patches were localized using upright images, correct?

      The monkey-selective patches were indeed localized using upright images. However, in experiment 3 (and 2) we also recorded outside the localized patches (as we noted before in the Methods: “In experiments 2 and 3 we recorded from a wider region, which overlapped with the two monkey patches and the recording locations of experiment 1”). Furthermore, the preference for upright monkey images is not an all-or-nothing phenomenon: most units still responded to inverted monkeys. Also, we believe it is likely that the mean responses to the inverted bodies in the monkey patches, defined by upright bodies versus objects, would be larger than those to objects, and we would be surprised to learn that there is a patch selective for inverted bodies that we would have missed with our localizer.

      (3) Typo: line 447, "this independent"->"is independent"?

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Campbell et al investigated the effects of light on the human brain, in particular the subcortical part of the hypothalamus during auditory cognitive tasks. The mechanisms and neuronal circuits underlying light effects in non-image forming responses are so far mostly studied in rodents but are not easily translated in humans. Therefore, this is a fundamental study aiming to establish the impact light illuminance has on the subcortical structures using the high-resolution 7T fMRI. The authors found that parts of the hypothalamus are differently responding to illuminance. In particular, they found that the activity of the posterior hypothalamus increases while the activity of the anterior and ventral parts of the hypothalamus decreases under high illuminance. The authors also report that the performance of the 2-back executive task was significantly better in higher illuminance conditions. However, it seems that the activity of the posterior hypothalamus subpart is negatively related to the performance of the executive task, implying that it is unlikely that this part of the hypothalamus is directly involved in the positive impact of light on performance observed. Interestingly, the activity of the posterior hypothalamus was, however, associated with an increased behavioural response to emotional stimuli. This suggests that the role of this posterior part of the hypothalamus is not as simple regarding light effects on cognitive and emotional responses. This study is a fundamental step towards our better understanding of the mechanisms underlying light effects on cognition and consequently optimising lighting standards. 

      Strengths: 

      While it is still impossible to distinguish individual hypothalamic nuclei, even with the high-resolution fMRI, the authors split the hypothalamus into five areas encompassing five groups of hypothalamic nuclei. This allowed them to reveal that different parts of the hypothalamus respond differently to an increase in illuminance. They found that higher illuminance increased the activity of the posterior part of the hypothalamus encompassing the MB and parts of the LH and TMN, while decreasing the activity of the anterior parts encompassing the SCN and another part of TMN. These findings are somewhat in line with studies in animals. It was shown that parts of the hypothalamus such as SCN, LH, and PVN receive direct retinal input in particular from ipRGCs. Also, acute chemogenetic activation of ipRGCs was shown to induce activation of LH and also increased arousal in mice.

      Weaknesses: 

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin and/or other photoreceptors. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. This may be something to consider when designing the follow-up studies. 

      Reviewer #1 (Recommendations For The Authors): 

      (1) As suggested in the public review more information regarding the reasons behind the chosen light condition is needed. 

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin or cone opsins. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. 

      (2) In support of this work, it was shown in mice that acute activation of ipRGCs using chemogenetics induces c-fos in some of the hypothalamic brain areas discussed here including LH (Milosavljevic et al, 2016 Curr Biol). Another study to consider including in the discussion is by Sonoda et al 2020 Science, in which the authors showed that a subset of ipRGCs release GABA. 

      (3) Figure 1 looks squashed, especially the axes. Also, Figure 2 looks somewhat blurry. I would suggest that the authors edit the figures to correct this.

      We thank the reviewer for their positive comments and agree with the weaknesses they pointed out. 

      (1) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      The revised discussion makes clear that these choices limit the interpretation about the photoreceptors involved (PAGES 12-13): “We based our rationale and part of our interpretations on ipRGC projections, which have been demonstrated in rodents to channel the NIF biological impact of light and incorporate the inputs from rods and cones with their intrinsic photosensitivity into a light signal that can impact the brain (Güler et al., 2008; Tri & Do, 2019). Given the polychromatic nature of the light we used, classical photoreceptors and their projections to visual brain areas are, however, very likely to have directly or indirectly contributed to the modulation by light of the regional activity of the hypothalamus.”

      The discussion also points out the promises of silent substitution (PAGE 13): “Future human studies could isolate the contribution of each photoreceptor class to the impact of light on cognitive brain functions by manipulating prior light history (Chellappa et al., 2014) or through the use of silent substitutions between metameric light exposures (Viénot et al., 2012)”.

      (2) We now refer to the studies by Milosavljevic et al. and Sonoda et al. 

      PAGE 9: “Our data may therefore be compatible with an increase in orexin release by the LH with increasing illuminance. In line with this assumption, chemoactivation of ipRGCs lead to increase c-fos production, a marker of cellular activation, over several nuclei of the hypothalamus, including the lateral hypothalamus (Milosavljevic et al., 2016). If this initial effect of light we observe over the posterior part of the hypothalamus was maintained over a longer period of exposure, this would stimulate cognition and maintain or increase alertness (Campbell et al., 2023) and may also be part of the mechanisms through which daytime light increases the amplitude in circadian variations of several physiological features (Bano-Otalora et al., 2021; Dijk et al., 2012).”

      PAGE 10: “Chemoactivation of ipRGCs in rodents led to an increase activity of the SCN, over the inferior anterior hypothalamus, but had no impact on the activity of the VLPO, over the superior anterior hypothalamus (Milosavljevic et al., 2016). How our findings fit with these fine-grained observations and whether there are species-specific differences in the responses to light over the different part of the hypothalamus remains to be established.”

      PAGE 10: “In terms of chemical communication, these changes in activity could be the results of an inhibitory signal from a subclass of ipRGCs, potentially through the release of γ-aminobutyric acid (GABA), as a rodent study found that a subset of ipRGCs release GABA at brain targets including the SCN (and intergeniculate leaflet and ventral lateral geniculate nucleus), leading to a reduction in the ability of light to affect pupil size and circadian photoentrainment (Sonoda et al., 2020). Whatever the signalling of ipRGC, our finding over the anterior hypothalamus could correspond to a modification of GABA signalling of the SCN which has been reported to have excitatory properties, such that the BOLD signal changes we report may correspond to a reduction in excitation arising in part from the SCN (Albers et al., 2017).”

      (3) Figures 1 and 2 were modified. We hope their quality is now satisfactory. We are willing to provide separate figures prior to publication of the Version of Record.

      Reviewer #2 (Public Review): 

      Summary 

      The interplay between environmental factors and cognitive performance has been a focal point of neuroscientific research, with illuminance emerging as a significant variable of interest. The hypothalamus, a brain region integral to regulating circadian rhythms, sleep, and alertness, has been posited to mediate the effects of light exposure on cognitive functions. Previous studies have illuminated the role of the hypothalamus in orchestrating bodily responses to light, implicating specific neural pathways such as the orexin and histamine systems, which are crucial for maintaining wakefulness and processing environmental cues. Despite advancements in our understanding, the specific mechanisms through which varying levels of light exposure influence hypothalamic activity and, in turn, cognitive performance, remain inadequately explored. This gap in knowledge underscores the need for high-resolution investigations that can dissect the nuanced impacts of illuminance on different hypothalamic regions. Utilizing state-of-the-art 7 Tesla functional magnetic resonance imaging (fMRI), the present study aims to elucidate the differential effects of light on the hypothalamic dynamics and establish a link between regional hypothalamic activity and cognitive outcomes in healthy young adults. By shedding light on these complex interactions, this research endeavours to contribute to the foundational knowledge necessary for developing innovative therapeutic strategies aimed at enhancing cognitive function through environmental modulation. 

      Strengths: 

      (1) Considerable Sample Size and Detailed Analysis: The study leverages a robust sample size and conducts a thorough analysis of hypothalamic dynamics, which enhances the reliability and depth of the findings. 

      (2) Use of High-Resolution Imaging: Utilizing 7 Tesla fMRI to analyze brain activity during cognitive tasks offers high-resolution insights into the differential effects of illuminance on hypothalamic activity, showcasing the methodological rigor of the study. 

      (3) Novel Insights into Illuminance Effects: The manuscript reveals new understandings of how different regions of the hypothalamus respond to varying illuminance levels, contributing valuable knowledge to the field. 

      (4) Exploration of Potential Therapeutic Applications: Discussing the potential therapeutic applications of light modulation based on the findings suggests practical implications and future research directions. 

      Weaknesses: 

      (1) Foundation for Claims about Orexin and Histamine Systems: The manuscript needs to provide a clearer theoretical or empirical foundation for claims regarding the impact of light on the orexin and histamine systems in the abstract. 

      (2) Inclusion of Cortical Correlates: While focused on the hypothalamus, the manuscript may benefit from discussing the role of cortical activation in cognitive performance, suggesting an opportunity to expand the scope of the manuscript. 

      (3) Details of Light Exposure Control: More detailed information about how light exposure was controlled and standardized is needed to ensure the replicability and validity of the experimental conditions. 

      (4) Rationale Behind Different Exposure Protocols: To clarify methodological choices, the manuscript should include more in-depth reasoning behind using different protocols of light exposure for executive and emotional tasks. 

      Reviewer #2 (Recommendations For The Authors): 

      Attention to English language precision and correction of typographical errors, such as "hypothalamic nuclei" instead of "hypothalamus nuclei," is necessary for enhancing the manuscript.

      We thank the reviewer for recognising the interest and strength of our study.

      (1) As detailed in the discussion, we do believe orexin and histamine are excellent candidates for mediating the results we report. As also pointed out there, however, we are in no position to know which neurons, nuclei, neurotransmitters and neuromodulators underlie the results. The last sentence of the abstract (PAGE 2) was therefore removed, as we agree the statement was too strong. We carefully reconsidered the discussion and believe that no such overstatement is present there.

      (2) Hypothalamic nuclei are connected to multiple cortical (and subcortical) structures. The relevance of these projections will vary with the cognitive task considered. In addition, we have not yet considered the cortex in our analyses, such that truly integrating cortical structures appears premature. 

      We nevertheless added the following short statement (PAGE 11): “Subcortical structures, and particularly those receiving direct retinal projections, including those of the hypothalamus, are likely to receive light illuminance signal first before passing on the light modulation to the cortical regions involved in the ongoing cognitive process (Campbell et al., 2023).”

      (3) We now include the following as part of the method section (PAGES 16-17): “Illuminance and spectra could not be directly measured within the MRI scanner due to the ferromagnetic nature of measurement systems. The coil of the MRI and the light stand, together with the lighting system were therefore placed outside of the MR room to reproduce the experimental conditions in a completely dark room. A sensor was placed 2 cm away from the mirror of the coil that is mounted at eye level, i.e. where the eye of the first author of the paper would be positioned, to measure illuminance and spectra. The procedure was repeated 4 times for illuminance and twice for spectra and measurements were averaged. This procedure does not take into account interindividual variation in head size and orbit shape such that the reported illuminance levels may have varied slightly across subjects. The relative differences between illuminance are, however, very unlikely to vary substantially across participants such that statistics consisting of tests for the impact of relative differences in illuminance were not affected. The detailed values reported in Supplementary Table 2 were computed combining spectra and illuminance using the Excel calculator associated with a published work (Lucas et al., 2014).”

      (4) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      (5) The manuscript was thoroughly rechecked, and we hope to have spotted all typos and language errors.

      Reviewer #3 (Public Review): 

      Summary: 

      Campbell and colleagues use a combination of high-resolution fMRI, cognitive tasks, and different intensities of light illumination to test the hypothesis that the intensity of illumination differentially impacts hypothalamic substructures that, in turn, promote alterations in arousal that affect cognitive and affective performance. The authors find evidence in support of a posterior-to-anterior gradient of increased blood flow in the hypothalamus during task performance that they later relate to performance on two different tasks. The results provide an enticing link between light levels, hypothalamic activity, and cognitive/affective function; however, clarification of some methodological choices will help to improve confidence in the findings. 

      Strengths: 

      * The authors' focus on the hypothalamus and its relationship to light intensity is an important and understudied question in neuroscience. 

      Weaknesses: 

      (1) I found it challenging to relate the authors' hypotheses, which I found to be quite compelling, to the apparatus used to test the hypotheses - namely, the use of orange light vs. different light intensities; and the specific choice of the executive and emotional tasks, which differed in key features (e.g., block-related vs. event-related designs) that were orthogonal to the psychological constructs being challenged in each task. 

      (4) Given the small size of the hypothalamus and the irregular size of the hypothalamic parcels, I wondered whether a more data-driven examination of the hypothalamic time series would have provided a more parsimonious test of their hypothesis. 

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors may wish to explain the importance of the orange light condition in the early section of the results -- i.e., when they first present the task structure. As it stands, I don't have a good appreciation of why the orange light was included -- was it a control condition? And if the differences between the light conditions (e.g., the narrow- vs. wide-band of light) were indeed ignored by focussing on the illuminance levels, are there any potential issues that the authors could then mitigate against with further experiments/analyses? 

      (2) Are there other explanations for why illuminance levels might improve cognitive performance? For instance, the capacity to more easily perceive the stimuli in an experiment could plausibly make it easier to complete a given task. If this is the case, can the authors conceptualise a way to rule out this hypothesis? 

      (3) Did the authors control for the differences in the number of voxels in each hypothalamic subregion? Or perhaps consider estimating the variance across voxels within the larger parcels, to determine whether the mean time series was comparable to the time series of the smaller parcels? 

      (4) An alternative strategy that would mitigate against the differences in the size of hypothalamic parcels would be to conduct analyses on the hypothalamus without parcellation, but instead using dimensionality reduction techniques to observe the natural spread of responses across the hypothalamus. From the authors' results, my intuition is that these analyses will lead to similar conclusions, albeit without any of the potential issues with respect to differently-sized parcels. 

      We thank the reviewer for acknowledging the originality and interest of our study. We agree that some methodological choices needed more explanation. We will address the weaknesses they pointed out as follows:

      (1) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      The revised discussion makes clear that these choices limit the interpretation about the photoreceptors involved (PAGE 12-13): “We based our rationale and part of our interpretations on ipRGC projections, which have been demonstrated in rodents to channel the NIF biological impact of light and incorporate the inputs from rods and cones with their intrinsic photosensitivity into a light signal that can impact the brain (Güler et al., 2008; Tri & Do, 2019). Given the polychromatic nature of the light we used, classical photoreceptors and their projections to visual brain areas are, however, very likely to have directly or indirectly contributed to the modulation by light of the regional activity of the hypothalamus.”

      We further mention that (PAGE 13): “Furthermore, we cannot exclude that colour and/or spectral differences between the orange and 3 blue-enriched light conditions may have contributed to our findings. Research in rodent model demonstrated that variation in the spectral composition of light was perceived by the suprachiasmatic nucleus to set circadian timing (Walmsley et al., 2015). No such demonstration has, however, been reported yet for the acute impact of light on alertness, attention, cognition or affective state.”

      Regarding the choice of tasks, we added the following to the method section (PAGE 18): “Prior work of our team showed that the n-back task and emotional task included in the present protocol were successful probes to demonstrate that light illuminance modulates cognitive activity, including within subcortical structures (though resolution did not allow precise isolation of nuclei or subparts) (e.g. (Vandewalle et al., 2007, 2010)). When taking the step of ultra-high-field imaging, we therefore opted for these tasks as our goal was to show that illuminance affects brain activity across cognitive domains while not testing for task-specific aspects of these domains.”

      We further added to the discussion (PAGE 8): “The pattern of light-induced changes was consistent across an executive and an emotional task which consisted of block and an event-related fMRI design, respectively. This suggests that a robust anterior-posterior gradient of activity modulation by illuminance is present in hypothalamus across cognitive domains.”

      (2) We are unsure what the reviewer refers to when they state that the light conditions could make it easier to perceive a stimulus. Aside from the fact that illuminance can increase alertness and attention such that a stimulus may be better or more easily perceived/processed, we do not see how blocks of ambient light, i.e. a long-lasting visual stimulus, may render auditory stimulation (letters or pseudo-words in the present study) easier to perceive. To our knowledge, multimodal or cross-modal integration has been robustly demonstrated only for short visual/auditory cues that precede or accompany auditory/visual stimulation. 

      We are willing to clarify this issue in the text if we receive additional explanation from the reviewer.

      (3) We added subpart size as a covariate in the analyses (instead of subpart number) and it did not affect the output of the statistical analyses (Author response table 1). 

      For completeness, we further computed the standard deviation of the activity estimates of the voxels within each parcel for the main analysis of the n-back task and found a main effect of subpart (Author response table 2), indicating that the variability of the estimates varied across subparts. Post hoc contrasts and the display included in Author response image 1 show, however, that the differences were not related to subpart size per se. It is in fact the largest subpart (subpart 4) that shows the largest variability, while one of the smallest subparts (subpart 2) shows the lowest variability. Though it may have contributed, subpart size is therefore unlikely to explain our findings. We consider the analyses reported in Author response tables 1 and 2 and Author response image 1 to be very technical and did not include them in the supplementary material for conciseness. If the reviewer judges it essential, we can reconsider our decision.
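      As an illustration only, the kind of check described above (within-parcel standard deviation of voxel-wise activity estimates, set against parcel size) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used for the reported analyses: the function name `parcel_variability`, the parcel sizes and the simulated estimates are all hypothetical.

```python
import numpy as np

def parcel_variability(estimates, labels):
    """Return {parcel_label: (n_voxels, sd_of_estimates)} for one subject.

    estimates: 1-D array of voxel-wise activity estimates (hypothetical input).
    labels:    1-D integer array giving the parcel of each voxel.
    """
    estimates = np.asarray(estimates, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for parcel in np.unique(labels):
        values = estimates[labels == parcel]
        # ddof=1 gives the sample standard deviation across the parcel's voxels
        sd = float(np.std(values, ddof=1)) if values.size > 1 else float("nan")
        summary[int(parcel)] = (int(values.size), sd)
    return summary

# Hypothetical toy data: five parcels of unequal size (sizes are made up)
rng = np.random.default_rng(0)
sizes = {1: 40, 2: 15, 3: 30, 4: 120, 5: 25}
labels = np.concatenate([np.full(n, p) for p, n in sizes.items()])
estimates = rng.normal(loc=0.0, scale=1.0, size=labels.size)

summary = parcel_variability(estimates, labels)
for parcel, (n_vox, sd) in summary.items():
    print(f"subpart {parcel}: {n_vox} voxels, SD = {sd:.3f}")
```

      The per-subject, per-subpart standard deviations obtained this way could then serve as the dependent variable of a mixed model, which is the role they play in Author response table 2.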

      While computing these analyses, we realized that there were errors in Table 1, which reports the statistical outcomes of the main analyses of the emotional task. The main statistical outputs remain the same except for two points: there is now a nominal main effect of the task (emotional vs. neutral), and post hoc tests now show a consistent difference between the posterior subpart (subpart 3) and all the other subparts, whereas the previous version made an exception for the difference with the superior tubular hypothalamus subpart (p-corrected = 0.09). We apologise for this slight error, whose origin we were unable to isolate. It does not modify the rest of the analyses (which were also rechecked) or the interpretations. 

      Author response table 1.

      Recomputations of the main GLMMs using subpart sizes rather than subpart numbers as covariate of interest.

      Author response image 1.

      Activity estimate variability per hypothalamus subpart and subpart size.  

      Author response table 2.

      Difference in activity estimate standard deviation between hypothalamus subparts during the n-back task.

      Outputs of the generalized linear mixed model (GLMM) with subject as the random factor (intercept and slope), and task and subpart as repeated measures (ar(1) autocorrelation).

      * The corrected p-value for multiple comparisons over 2 tests is p < 0.025.

      # Refer to Fig.2A for correspondence of subpart numbers

      The text referring to Table 1 was modified accordingly (PAGE 5): “A nominal main effect of the task was detected for the emotional task [p = 0.049; Table 1] but not for the n-back task. For both tasks, there was no significant main effect for any of the other covariates and post hoc analyses showed that the index of the illuminance impact was consistently different in the posterior hypothalamus subpart compared to the other subparts [p-corrected ≤ 0.05]”.

      (4) We agree that a data-driven approach could have constituted an alternative means to test our hypothesis. We opted for the approach that we master best, while still allowing us to conclusively test for regional differences in activity across the hypothalamus. Examination of the time series of the very same data we used will mainly confirm the results of our analyses – an anterior-posterior gradient in the impact of illuminance – while it may yield slight differences in the borders of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance. While the suggested approach might have been envisaged if we had been facing negative results (i.e. no differences between subparts, potentially because subparts would not reflect functional differences in response to illuminance change), it would constitute a circular confirmation of our main findings (i.e. using the same data). While we truly appreciate the suggestion, we do not consider that it would constitute a more parsimonious test of our hypothesis, now that we have successfully applied GLM/parcellation and GLMM approaches.

      We added the following statement to the discussion to take this comment into account (PAGE 12): “Future research may consider data-driven analyses of hypothalamus voxels time series as an alternative to the parcellation approach we adopted here. This may refine the delineation of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance.”
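      For readers unfamiliar with the data-driven alternative mentioned above, one common option is a principal component analysis of the voxel-by-time matrix. The sketch below is purely illustrative and assumes simulated data: the function `voxel_pca`, the one-dimensional `position` axis and the simulated gradient are hypothetical and do not correspond to any analysis performed in the study.

```python
import numpy as np

def voxel_pca(timeseries, n_components=2):
    """PCA of voxel time series via SVD.

    timeseries: array of shape (n_timepoints, n_voxels).
    Returns (spatial_maps, explained_ratio), where spatial_maps has shape
    (n_components, n_voxels).
    """
    X = np.asarray(timeseries, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)  # remove each voxel's mean
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # share of variance per component
    return vt[:n_components], explained[:n_components]

# Hypothetical simulation: 50 voxels ordered along one axis, whose response
# to a shared time course changes sign from one end to the other.
rng = np.random.default_rng(1)
n_time, n_vox = 200, 50
position = np.linspace(-1.0, 1.0, n_vox)   # assumed spatial axis
shared = rng.normal(size=n_time)           # shared modulation over time
data = np.outer(shared, position) + 0.1 * rng.normal(size=(n_time, n_vox))

maps, explained = voxel_pca(data, n_components=2)
# The sign of a principal component is arbitrary, hence the absolute value.
gradient_match = abs(np.corrcoef(maps[0], position)[0, 1])
print(f"first component explains {explained[0]:.2f} of the variance")
print(f"|correlation| between its spatial map and position: {gradient_match:.2f}")
```

      In such a simulation the first spatial map recovers the imposed gradient without any predefined parcels, which is the sense in which a decomposition of this kind could refine the delineation of subparts.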

      Response references

      Albers, H. E., Walton, J. C., Gamble, K. L., McNeill, J. K., & Hummer, D. L. (2017). The dynamics of GABA signaling: Revelations from the circadian pacemaker in the suprachiasmatic nucleus. Frontiers in Neuroendocrinology, 44, 35–82. https://doi.org/10.1016/J.YFRNE.2016.11.003

      Bano-Otalora, B., Martial, F., Harding, C., Bechtold, D. A., Allen, A. E., Brown, T. M., Belle, M. D. C., & Lucas, R. J. (2021). Bright daytime light enhances circadian amplitude in a diurnal mammal. Proceedings of the National Academy of Sciences of the United States of America, 118(22), e2100094118. https://doi.org/10.1073/pnas.2100094118

      Campbell, I., Sharifpour, R., & Vandewalle, G. (2023). Light as a Modulator of Non-Image-Forming Brain Functions: Positive and Negative Impacts of Increasing Light Availability. Clocks & Sleep, 5(1), 116. https://doi.org/10.3390/CLOCKSSLEEP5010012

      Chellappa, S. L., Ly, J. Q. M., Meyer, C., Balteau, E., Degueldre, C., Luxen, A., Phillips, C., Cooper, H. M., & Vandewalle, G. (2014). Photic memory for executive brain responses. Proceedings of the National Academy of Sciences of the United States of America, 111(16), 6087–6091. https://doi.org/10.1073/pnas.1320005111

      Dijk, D. J., Duffy, J. F., Silva, E. J., Shanahan, T. L., Boivin, D. B., & Czeisler, C. A. (2012). Amplitude reduction and phase shifts of melatonin, cortisol and other circadian rhythms after a gradual advance of sleep and light exposure in humans. PloS One, 7(2). https://doi.org/10.1371/JOURNAL.PONE.0030037

      Güler, A. D., Ecker, J. L., Lall, G. S., Haq, S., Altimus, C. M., Liao, H. W., Barnard, A. R., Cahill, H., Badea, T. C., Zhao, H., Hankins, M. W., Berson, D. M., Lucas, R. J., Yau, K. W., & Hattar, S. (2008). Melanopsin cells are the principal conduits for rod-cone input to non-image-forming vision. Nature, 453(7191), 102–105. https://doi.org/10.1038/nature06829

      Lucas, R. J., Peirson, S. N., Berson, D. M., Brown, T. M., Cooper, H. M., Czeisler, C. A., Figueiro, M. G., Gamlin, P. D., Lockley, S. W., O’Hagan, J. B., Price, L. L. A., Provencio, I., Skene, D. J., & Brainard, G. C. (2014). Measuring and using light in the melanopsin age. Trends in Neurosciences, 37(1), 1–9. https://doi.org/10.1016/j.tins.2013.10.004

      Milosavljevic, N., Cehajic-Kapetanovic, J., Procyk, C. A., & Lucas, R. J. (2016). Chemogenetic Activation of Melanopsin Retinal Ganglion Cells Induces Signatures of Arousal and/or Anxiety in Mice. Current Biology, 26(17), 2358–2363. https://doi.org/10.1016/j.cub.2016.06.057

      Sonoda, T., Li, J. Y., Hayes, N. W., Chan, J. C., Okabe, Y., Belin, S., Nawabi, H., & Schmidt, T. M. (2020). A noncanonical inhibitory circuit dampens behavioral sensitivity to light. Science (New York, N.Y.), 368(6490), 527–531. https://doi.org/10.1126/SCIENCE.AAY3152

      Tri, M., & Do, H. (2019). Melanopsin and the Intrinsically Photosensitive Retinal Ganglion Cells: Biophysics to Behavior. Neuron, 104, 205–226. https://doi.org/10.1016/j.neuron.2019.07.016

      Vandewalle, G., Hébert, M., Beaulieu, C., Richard, L., Daneault, V., Garon, M. Lou, Leblanc, J., Grandjean, D., Maquet, P., Schwartz, S., Dumont, M., Doyon, J., & Carrier, J. (2011). Abnormal hypothalamic response to light in seasonal affective disorder. Biological Psychiatry, 70(10), 954–961. https://doi.org/10.1016/j.biopsych.2011.06.022

      Vandewalle, G., Schmidt, C., Albouy, G., Sterpenich, V., Darsaud, A., Rauchs, G., Berken, P. Y., Balteau, E., Degueldre, C., Luxen, A., Maquet, P., & Dijk, D. J. (2007). Brain responses to violet, blue, and green monochromatic light exposures in humans: Prominent role of blue light and the brainstem. PLoS ONE, 2(11), e1247. https://doi.org/10.1371/journal.pone.0001247

      Vandewalle, G., Schwartz, S., Grandjean, D., Wuillaume, C., Balteau, E., Degueldre, C., Schabus, M., Phillips, C., Luxen, A., Dijk, D. J., & Maquet, P. (2010). Spectral quality of light modulates emotional brain responses in humans. Proceedings of the National Academy of Sciences of the United States of America, 107(45), 19549–19554. https://doi.org/10.1073/pnas.1010180107

      Viénot, F., Brettel, H., Dang, T.-V., & Le Rohellec, J. (2012). Domain of metamers exciting intrinsically photosensitive retinal ganglion cells (ipRGCs) and rods. Journal of the Optical Society of America A, 29(2), A366. https://doi.org/10.1364/josaa.29.00a366

      Walmsley, L., Hanna, L., Mouland, J., Martial, F., West, A., Smedley, A. R., Bechtold, D. A., Webb, A. R., Lucas, R. J., & Brown, T. M. (2015). Colour As a Signal for Entraining the Mammalian Circadian Clock. PLOS Biology, 13(4), e1002127. https://doi.org/10.1371/journal.pbio.1002127

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This paper examines the role of MLCK (myosin light chain kinase) and MLCP (myosin light chain phosphatase) in axon regeneration. Using loss-of-function approaches based on small molecule inhibitors and siRNA knockdown, the authors explore axon regeneration in cell culture and in animal models. Their evidence shows that MLCK activity facilitates axon extension/regeneration, while MLCP prevents it.

      Major concern:

      A global inconsistency in the conclusions of the authors is evident when trying to understand the role of NMII in axon growth and to understand the present results in light of previous reports by the authors and many others on the role of NMII in axon extension. The discussion of the matter fails to acknowledge a vast literature on how NMII activity is regulated. The authors study enzymes responsible for the phosphorylation and dephosphorylation of NMII, referring to something that is strongly proven elsewhere, that phosphorylation activates NMII and dephosphorylation deactivates it. The authors mention their own previous evidence using inhibitors of NMII ATPase activity (blebbistatin, Bleb for short) and inhibitors of a kinase that phosphorylates NMII (ROCK), highlighting that Bleb increases axon growth. Since Bleb inhibits the ATPase activity of NMII, it follows that NMII is in itself an inhibitor of axon growth, and hence when NMII is inhibited, the inhibition on axon growth is relieved, and axonal growth takes place (REF1). It is known that NMII exists in an inactive folded state, and ser19 phosphorylation (by MLCK or ROCK) extends the protein, allowing NMII filament formation, ATPase activity, and force generation on actin filaments (REF2). From this, it is derived that if MLCK is inhibited, then there is no NMII phosphorylation, and hence no NMII activity, and, according to their previous work, this should promote axon growth. On the contrary, the authors show the opposite effect: in the lack of phospho-MLC, authors show axon growth inhibition.

      We thank the Reviewer for taking time to review our manuscript, and we really appreciated the comments from the reviewer. We have tried our best to revise the manuscript to address all the comments raised by the Reviewer.

      Reporting evidence challenging previous conclusions is common business in scientific endeavors, but the problem with the current manuscript is that it fails to point to and appropriately discuss this contradiction. Instead, the authors refer to the fact that MLCK and Bleb inhibit NMII in different steps of the activation process. While this is true, this explanation does not solve the contradiction. There are many options to accommodate the information, but it is not the purpose of this revision to provide them. Since the manuscript is focused solely on phosphorylation states of MLC and axon extension, the claims are simply at odds with the current literature, and this important finding, if true, is not properly discussed.

      Thank you for the Reviewer's very good comments. As suggested by the Reviewer, we discuss this in more detail in our revised manuscript (line 357-368; line 373-374).

      What follows is a discussion of the merits and limitations of different claims of the manuscript in light of the evidence presented.

      (1) Using western blot and immunohistochemical analyses, authors first show that MLCK expression is increased in DRG sensory neurons following peripheral axotomy, concomitant to an increase in MLC phosphorylation, suggesting a causal effect (Figure 1). The authors claim that it is common that axon growth-promoting genes are upregulated. It would have been interesting at this point to study in this scenario the regulation of MLCP, which is a main subject in this work, and expect its downregulation.

      We thank the Reviewer for taking time to review our manuscript, and we really appreciated the positive comments from the Reviewer.

      (2) Using DRG cultures and sciatic nerve crush in the context of MLCK inhibition and down-regulation, authors conclude that MLCK activity is required for mammalian peripheral axon regeneration both in vitro and in vivo (Figure 2).

      The in vitro evidence is of standard methods and convincing. However, here, as well as in all other experiments using siRNAs, it is not clear what the control is about (the identity of the plasmids and sequences, if any).

      We used pCMV–EGFP–N3 as the control; the pCMV–EGFP–N3 plasmid was from Clontech, Inc. (line 114-115).

      Related to this, it is not helpful to show the same exact picture as a control example in Figures 2 and 3 (panels J and E, respectively). Either because they should not have received the same control treatment, or simply because it raises concern that there are no other control examples worth showing. In these images, it is also not clear where and how the crush site is determined in the GFP channel. This is of major importance since the axonal length is measured from the presumed crush site. Apart from providing further details in the text, the authors should include convincing images.

      Thank you so much for your comments. We changed the control example in Figure 3J. For sciatic nerve regeneration experiments, the sciatic nerve was exposed at the sciatic notch by a small incision 2 days after the in vivo electroporation. The nerve was then crushed, and the crush site was marked with an 11-0 nylon epineural suture. After surgery, the wound was closed, and the mice were allowed to recover. Three days after the sciatic nerve crush, the whole sciatic nerves from the perfused animals were dissected out and postfixed overnight in 4% PFA at 4°C. Before whole-mount flattening, it was confirmed that the position of the epineural suture matched the injury site, and experiments were included in the analysis only when the crush site was clearly identifiable. Using whole-mounted tissue, all identifiable EGFP-labeled axons in the sciatic nerve were manually traced from the crush site to the distal growth cone to measure the length of axon regeneration (line 159-164).

      (3) The authors then examined the role of the phosphatase MLCP in axon growth during regeneration. The authors first use a known MLCP blocker, phorbol 12,13-dibutyrate (PDBu), to show that it is able to increase the levels of p-MLC, with a concomitant increase in the extent of axon regrowth of DRG neurons, in both permissive and non-permissive conditions. The authors repeat the experiments using the knockdown of MYPT1, a key component of the MLC-phosphatase, and again can observe a growth-promoting effect (Figure 3).

      The authors further show evidence for the growth-enhancing effect in vivo, in nerve crush experiments. The evidence in vivo deserves more evidence and experimental details (see comment 2). Some key weaknesses of the data were mentioned previously (unclear RNAi controls and duplication of shown images), but in this case, it is also not clear if there is a change only in the extent of growth, or also in the number of axons that are able to regenerate.

      Thank you so much for your comments. We used the same control as in the in vitro experiments (the pCMV–EGFP–N3 plasmid from Clontech, Inc.), and we also changed the control image in Figure 3J. For in vivo axon regeneration experiments, we measured the lengths of all identifiable EGFP-labeled axons in the sciatic nerve from the crush site to the distal axonal ends. The number of EGFP-labeled regenerating axons was determined by the electroporation rate of EGFP, which is similar, but not identical, in different mice. Thus, our data can only show the differences in axon lengths among different experimental conditions. This approach has been used in many of our previously published papers (e.g. Saijilafu et al. Nature Communications, 2011, Saijilafu et al. Nature Communications, 2013). (line 152-153).

      (4) In the next set of experiments (presented in Figure 4) authors extend the previous observations in primary cultures from the CNS. For that, they use cortical and hippocampal cultures, and pharmacological and genetic loss-of-function using the above-mentioned strategies. The expected results were obtained in both CNS neurons: inhibition or knockdown of the kinase decreases axon growth, whereas inhibition or knockdown of the phosphatase increases growth. A main weakness in this set is that it is not indicated when (at what day in vitro, DIV) the treatments are performed. This is important to correctly interpret the results, since in the first days in vitro these neurons follow well-characterized stages of development, with characteristic cellular events with relevance to what is being evaluated. Importantly, this would be of value to understand whether the treatments affect axonal specification and/or axonal extension. Although these events are correlated, they imply a different set of molecular events.

      The treatments were started at the beginning of the cell culture period, and this procedure may affect axon specification, as the Reviewer points out. However, we mainly focused on axon length in our experiments. Thus, for quantification of axon length, neurons with processes longer than twice the diameter of cell bodies were photographed, and the longest axon of each neuron was measured. We revised the manuscript as suggested by the Reviewer (line 143-145).
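
      The quantification rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the `Neuron` record, the field names, and the micrometre units are assumptions introduced here, and the only logic taken from the response is the inclusion criterion (a process longer than twice the cell body diameter) and the use of the longest process per neuron.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Neuron:
    # Hypothetical record; field names and units are illustrative assumptions.
    soma_diameter_um: float
    process_lengths_um: List[float] = field(default_factory=list)


def longest_axon_length(neuron: Neuron) -> Optional[float]:
    """Return the longest process length if the neuron meets the inclusion
    criterion (longest process > 2 x soma diameter), otherwise None."""
    if not neuron.process_lengths_um:
        return None
    longest = max(neuron.process_lengths_um)
    if longest > 2 * neuron.soma_diameter_um:
        return longest
    return None


def mean_axon_length(neurons: List[Neuron]) -> Optional[float]:
    """Average the longest-process length over all included neurons."""
    lengths = []
    for neuron in neurons:
        length = longest_axon_length(neuron)
        if length is not None:
            lengths.append(length)
    if not lengths:
        return None
    return sum(lengths) / len(lengths)
```

      Under this sketch, neurons that fail the criterion are excluded from the mean rather than counted as zero, so the measure reflects extension among neurons that grew a process, not the fraction of neurons that did.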

      The title of this section is misleading: line 241 "MLCK/MLCP activity regulated axon growth in the embryonic CNS"... the title (and the conclusion) implies that the experiments were performed in situ, looking at axons in the developing brain. The most accurate title and conclusion should mention that the evidence was collected in CNS primary cultures derived from embryos.

      We have revised the manuscript as suggested by the reviewer (line 251).

      (5) Performing nerve crush injury in CNS nerves (optic nerve and spinal cord), and the local application of PDBu, the authors show contrasting results (Figure 5). In the optic nerve, they can see axons extending beyond the lesion site due to PDBu. On the contrary, the authors fail to observe this in the corticospinal tract of the spinal cord. The authors fail to discuss this matter in detail. Also, they accommodate the interpretation of the evidence in light of a process known as axon retraction, and its prevention by MLCP inhibition. Since the whole paper is on axon extension, and it is known that mechanistically axon retraction is not merely the opposite of axon extension, the claim needs far more evidence.

      Thank you so much for your comments. Compared to optic nerve axons, corticospinal tract axons exhibit a reduced intrinsic axon growth capability. Consequently, we observed that PDBu stimulates optic nerve axon regeneration. Unfortunately, however, we did not detect any enhanced growth of corticospinal tract axons beyond the injury site in SCI following the inhibition of myosin light chain phosphatase (MLCP) with PDBu.

      In panel 5F and the supplementary data, the authors mention the occurrence of retraction bulbs, but the images are too small to support the claim, and it is not clear how these numbers were normalized to the number of axons labeled in each condition.

      Thank you so much for your comments. In this study, we used a method similar to that of Ertürk et al. (2007) to quantify retraction bulbs. Both the maximum width of the enlarged distal tip of the axon and the width of its immediately adjacent axon shaft were measured, and the ratio of these two widths was then calculated. An axonal tip was considered a retraction bulb if its tip/shaft ratio exceeded 4. The average number of retraction bulbs was calculated from 3 sections per mouse for each group (n=5). (line 187-191).

      [Ref] Ertürk A, Hellal F, Enes J, and Bradke F (2007). Disorganized microtubules underlie the formation of retraction bulbs and the failure of axonal regeneration. J. Neurosci 27, 9169–9180. [PubMed:17715353].
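
      The classification rule in the response above (tip/shaft width ratio exceeding 4, averaged over sections) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the representation of each section as a list of (tip width, shaft width) pairs, and the micrometre units are hypothetical and are not taken from the authors' analysis.

```python
from typing import List, Tuple

# Each axonal tip is a (tip_width_um, shaft_width_um) pair; a section is a
# list of such pairs. This layout is an assumption made for illustration.
Tip = Tuple[float, float]


def is_retraction_bulb(tip_width_um: float, shaft_width_um: float,
                       threshold: float = 4.0) -> bool:
    """Classify a tip as a retraction bulb when tip/shaft ratio exceeds 4."""
    if shaft_width_um <= 0:
        raise ValueError("shaft width must be positive")
    return tip_width_um / shaft_width_um > threshold


def mean_bulbs_per_section(sections: List[List[Tip]]) -> float:
    """Average the retraction-bulb count over the sections of one animal."""
    if not sections:
        raise ValueError("at least one section is required")
    counts = [
        sum(1 for tip, shaft in section if is_retraction_bulb(tip, shaft))
        for section in sections
    ]
    return sum(counts) / len(counts)
```

      Note that a raw count per section, as computed here, does not by itself address the reviewer's question about normalization to the number of labeled axons; dividing each count by the number of traced tips in that section would be one way to do so.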

      (6) The author combines MLCK and MLCP inhibitors with Bleb, trying to verify if both pairs of inhibitors act on the same target/pathway (Figure 6). The rationale is wrong for at least two reasons.

      a. Because both lines of evidence point to contrasting actions of NMII on axon growth, one approach could never "rescue" the other.

      If MLCK regulates axon growth through the activation of myosin, the inhibitory effect of ML-7 (an MLCK inhibitor) on axon growth might be influenced by Bleb, an NMII inhibitor. However, our findings reveal that the combination of Bleb and ML-7 does not alter the rate of axon outgrowth compared to ML-7 alone. This suggests that the roles of ML-7 and Bleb in axon growth are independent, meaning that MLCK may regulate axon growth independently of NMII activity.

      b. Because the approaches target different steps on NMII activation, one could never "prevent" or rescue the other. For example, for Bleb to provide a phenotype, it should find any p-MLC, because it is only that form of MLC that is capable of inhibiting its ATPase site. In light of this, it is not surprising that Bleb is unable to exert any action in a situation where there is no p-MLC (ML-7, which by inhibiting the kinase drives the levels of p-MLC to zero, Figure 4A). Hence, the results are not possible to validate in the current general interpretation of the authors. (See 'major concern').

      The reported mechanism of blebbistatin is not through competition with the ATP binding site of myosin. Instead, it selectively binds to the ATPase intermediate state associated with ADP and inorganic phosphate, which decelerates the phosphate release. Importantly, blebbistatin does not impede myosin's interaction with actin or the ATP-triggered disassociation of actomyosin. It rather inhibits the myosin head when it forms a product complex with a reduced affinity for actin. This indicates that blebbistatin functions by stabilizing a particular myosin intermediate state that is independent of the phosphorylation status of myosin light chain (MLC).

      [Ref] Kovács M, Tóth J et al. Mechanism of blebbistatin inhibition of myosin II. J Biol Chem. 2004 Aug 20;279(34):35557-63. doi: 10.1074/jbc.M405319200.

      (7) In Figure 7, the authors argue that the scheme of replating and using ML7 before or after replating is evidence for a local cytoskeletal action of the drug. However, an alternative simpler explanation is that the drug acts acutely on its target, and that, as such, does not "survive" the replating procedure. Hence, the conclusion raised by the evidence shown is not supported.

      In our study, we meticulously assessed the neuronal survival rates across various experimental groups. The findings indicate no significant variation in survival rates among the groups. This suggests that the drug treatment exerts no discernible influence on cell viability but primarily modulates axonal elongation.

      Author response image 1.

      (8) In Figure 8, the authors show that the inhibitory treatments on MLCK and MLCP (ML7 and PDBu) alter the morphology of growth cones. However, it is not clear how this is correlated with axon growth. The authors also mention in various parts of the text that a local change in the growth cone is evidence for a local action/activity of the drug or enzyme. However, the local change<->local action is not a logical truth. It can well be that MLCK and MLCP activity trigger molecular events that ultimately have an effect elsewhere, and by looking at "elsewhere" one observes of course a local effect, but this is not because the direct actions of MLCK or MLCP are localized. To prove true localized effects there are numerous efforts that can be made, starting from live imaging, fluorescent sensors, and compartmentalized cultures, just to mention a few.

      Regarding the relationship between growth cone size and axon growth rate, previously published work found that fast-growing axons tend to have small growth cones (Mason and Wang, 1997). A more recent review, drawing on work in Aplysia, further supports this by noting that growth cones enlarge significantly when axonal elongation halts (Miller and Suter, 2018). Consistent with these findings, our data indicate that inhibiting MLCP with PDBu treatment leads to a reduction in growth cone size, which is associated with enhanced axon regeneration.

      [Ref] Mason CA, Wang LC. Growth cone form is behavior-specific and, consequently, position-specific along the retinal axon pathway. J Neurosci. 1997; 13:1086–1100. [PubMed: 8994063]

      [Ref] Miller KE, Suter DM. An Integrated Cytoskeletal Model of Neurite Outgrowth. Front Cell Neurosci. 2018 Nov 26;12:447. doi: 10.3389/fncel.2018.00447. eCollection 2018.

      References:

      (1) Eun-Mi Hur, In Hong Yang, Deok-Ho Kim, Justin Byun, Saijilafu, Wen-Lin Xu, Philip R Nicovich, Raymond Cheong, Andre Levchenko, Nitish Thakor, Feng-Quan Zhou. 2011. Engineering neuronal growth cones to promote axon regeneration over inhibitory molecules. Proc Natl Acad Sci U S A. 2011 Mar 22;108(12):5057-62. doi: 10.1073/pnas.1011258108.

      (2) Garrido-Casado M, Asensio-Juárez G, Talayero VC, Vicente-Manzanares M. 2024. Engines of change: Nonmuscle myosin II in mechanobiology. Curr Opin Cell Biol. 2024 Apr;87:102344. doi: 10.1016/j.ceb.2024.102344.

      (3) Karen A Newell-Litwa, Rick Horwitz, Marcelo L Lamers. 2015. Non-muscle myosin II in disease: mechanisms and therapeutic opportunities. Dis Model Mech. 2015 Dec;8(12):1495-515. doi: 10.1242/dmm.022103.

      Reviewer #2 (Public review):

      Summary:

      Saijilafu et al. demonstrate that MLCK/MLCP proteins promote axonal regeneration in both the central nervous system (CNS) and peripheral nervous system (PNS) using primary cultures of adult DRG neurons, hippocampal and cortical neurons, as well as in vivo experiments involving sciatic nerve injury, spinal cord injury, and optic nerve crush. The authors show that axon regrowth is possible across different contexts through genetic and pharmacological manipulation of these proteins. Additionally, they propose that MLCK/MLCP may regulate F-actin reorganization in the growth cone, which is significant as it suggests a novel strategy for promoting axonal regeneration.

      Strengths:

      This manuscript presents a comprehensive array of experimental models, addressing the biological question in a broad manner. Particularly noteworthy is the use of multiple in vivo models, which significantly strengthens the overall validity of the study.

      We thank the Reviewer for taking time to review our manuscript, and we really appreciated the positive comments from the Reviewer.

      Weaknesses:

      The following aspects apply:

      (1) The manuscript initially references prior research by the authors suggesting that NMII inhibition enhances axonal growth and that MLCK activates NMII. However, the study introduces a contradiction by demonstrating that MLCK inhibition (via ML-7 or siMLCK) inhibits axonal growth. This inconsistency is not adequately addressed or discussed in the manuscript.

      Thank you for the Reviewer's very good comments. As suggested by the Reviewer, we discuss this in more detail in our revised manuscript (line 357-368; line 373-374).

      (2) While the study proposes that MLCK/MLCP regulates F-actin redistribution in the growth cone, the mechanism is not explored in depth. The only figure showing how pharmacological manipulation affects the growth cone suggests that not only F-actin but also the microtubule cytoskeleton might be affected, indicating that the mechanism may not be specific. A deeper exploration of this relationship in DRG neurons, in addition to cortical neurons, as shown in the study, would be beneficial.

      Thank you for your insightful suggestion. However, our study primarily focuses on actin and myosin dynamics in the context of axonal elongation, as indicated by our direct observations in growing dorsal root ganglia (DRGs). Athamneh et al. (2017) elegantly demonstrated that the bulk movement of microtubules (MTs), rather than their assembly, predominantly drives MT advance during axonal elongation. Consequently, our manuscript concentrates on the actomyosin system, which is central to our findings. While the role of MTs in axonal growth is indeed significant and fascinating, the data we present is predominantly concerned with the actomyosin mechanism.

      [Ref] Athamneh, A. I. M. et al. Neurite elongation is highly correlated with bulk forward translocation of microtubules. Scientific Reports 7, (2017).

      (3) In the sciatic nerve injury experiments, it would be crucial to include additional controls that clearly demonstrate that siMYPT1 treatment increases MLCP in the L4-L5 ganglia. Additionally, although the manuscript mentions quantifying axons expressing EGFP, the Materials and Methods section only discusses siMYPT1 electroporation, which could lead to confusion.

      Thank you for your suggestion. However, due to the unavailability of a suitable commercial MLCP antibody, we were unable to directly detect MLCP expression. Instead, we assessed the phosphorylation level of myosin light chain (MLC) as a proxy to indicate that siMYPT1 transfection effectively downregulates MLCP activity in L4/5 dorsal root ganglia (DRG). This approach was taken to ensure the integrity of our findings despite the limitations in antibody availability.

      Regarding the electroporation method section, we have now included detailed information about the control plasmid used in our experiments to ensure a clear understanding of our experimental setup and to validate our results: A 1 μl solution containing indicated siRNAs together with the plasmid encoding EGFP (pCMV–EGFP–N3) was then microinjected into the L4–L5 DRG….. (line 152-153).

      (4) In some panels, it is difficult to differentiate the somas from the background (Figures 3, 4, 7). In conditions where images with shorter axonal lengths are represented, it is unclear whether this is due to fewer cells or reduced axonal growth (Figures 2, 4, 6).

      In the original submission, there was some loss of image quality while converting the TIFF files to PDF. We have improved the quality of the images in our revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are a number of typos and language errors that should be thoroughly revised. For example, line 219: "It is well known that the opposite role of MLCK and MLCP to regulate the MLC phosphorylation status". The term "opposite role" is vague. Using "opposite roles" and specifying that they are in regulating MLC phosphorylation status clarifies the relationship between MLCK and MLCP. Also, the original phrase "to regulate" was not correctly integrated into the sentence. Rephrasing it to "in regulating" makes the role of MLCK and MLCP clearer.

      We have revised the manuscript as suggested by the reviewer (line 229).

      In the same line, there is a high number of panels that are not referred to in the text or references for panels that have another letter. Just to mention a few:

      - line 199: "(Figure 1F, G)", → BUT figure 1 contains no G panel.

      We have revised the manuscript as suggested by the reviewer (line 209).

      - line 203: "The results showed that ML-7 administration led to a significant reduction in MLC phosphorylation levels (Figure 2A, B) and impaired axonal growth in sensory neurons (Figure 2C, D). → BUT panel C is related to A and B, and only D and E show impaired axonal growth.

      We have revised the manuscript as suggested by the reviewer (line 214; line 215; line 217; line 219).

      Reviewer #2 (Recommendations for the authors):

      (1) Improving the quality of the images would significantly strengthen the results presented.

      In the original submission, there was some loss of image quality while converting the TIFF files to PDF. We have improved the quality of the images in our revised manuscript.

      (2) The representative images of controls do not always show the same number of cells or axonal growth (e.g., Figure 4).

      We have changed some images as suggested by the reviewer.

      (3) The text has citation errors when referring to the figure labels.

      Upon thorough review, we have carefully examined our manuscript and have made the necessary corrections to address the identified errors. We appreciate the opportunity to enhance the quality of our work and believe that these revisions have significantly improved the clarity of our manuscript.

      (4) What happens to MLCK levels when MLCP activity is inhibited in the optic nerve?

      Upon analyzing our experimental data, we observed no significant alterations in the protein levels of MLCK when the activity of MLCP was inhibited. This finding suggests that the regulatory mechanisms governing MLCK expression may not be directly influenced by short-term MLCP inhibition. It is plausible that the duration of the inhibition period was insufficient to elicit a detectable change in MLCK expression levels.

      (5) The text in line 266: "In contrast, local PBS administration at the injury site or intravitreal PDBu injection induced little axon regeneration beyond the injury site (Figure 5 A-C)." However, this is not reflected in the figure.

      In our revised manuscript, we have provided a more precise description of our findings: In contrast, local PBS administration at the injury site or intravitreal PDBu injection did not significantly enhance axon regeneration beyond the injury site (Figure 5 A-C). This observation suggests that only treatment applied at the injury site (the inhibition of MLCP activity within the growth cone) effectively promotes axonal growth. (line 276-279).

      (6) Line 287: The phrase "Consistent with our previous study" requires a citation to support it.

      We added the reference: Consistent with our previous study (1), the inhibition of myosin II activity with 25 μM blebbistatin markedly promoted axonal growth (Figure 6A, B). (line 298)

      (7) Line 333: The paper cited by Yu P et al. (2012) does not mention MLCK or p-MLC, so it appears to be misquoted.

      Thank you for your comments. We rechecked the cited paper and confirmed that the authors provide western blot data in panel C of supplementary figure 1, showing that Bleb did not alter the phosphorylation status of MLC.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for their careful read of our paper, and appreciate the thoughtful comments.

      Both reviewers agreed that our work had several major strengths: the large dataset collected in collaboration across ten labs, the streamlined processing pipelines, the release of code repositories, the multi-task neural network, and that we definitively determined that electrode placement is an important source of variability between datasets.

      However, a number of key potential improvements were noted: the reviewers felt that a more standard model-based characterization of single neuron responses would benefit our reproducibility analysis, that more detail was needed about the number of cells, sessions, and animals, and that more information was needed to allow users to deploy the RIGOR standards and to understand their relationship to other metrics in the field.

      We agree with these suggestions and have implemented many major updates in our revised manuscript. Some highlights include:

      (1) A new regression analysis that specifies the response profile of each neuron, allowing a comparison of how similar these are across labs and areas (See Figure 7 in the new section, “Single neuron coefficients from a regression-based analysis are reproducible across labs”);

      (2) A new decoding analysis (See Figure 9 in the section, “Decodability of task variables is consistent across labs, but varies by brain region”);

      (3) A new RIGOR notebook to ease usability;

      (4) A wealth of additional information about the cells, animals and sessions in each figure;

      (5) Many new additional figure panels in the main text and supplementary material to clarify the specific points raised by the reviewers.

      Again, we are grateful to the reviewers and editors for their helpful comments, which have significantly improved the work. We are hopeful that the many revisions we have implemented will be sufficient to change the “incomplete” designation that was originally assigned to the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors explore a large-scale electrophysiological dataset collected in 10 labs while mice performed the same behavioral task, and aim to establish guidelines to aid reproducibility of results collected across labs. They introduce a series of metrics for quality control of electrophysiological data and show that histological verification of recording sites is important for interpreting findings across labs and should be reported in addition to planned coordinates. Furthermore, the authors suggest that although basic electrophysiology features were comparable across labs, task modulation of single neurons can be variable, particularly for some brain regions. The authors then use a multi-task neural network model to examine how neural dynamics relate to multiple interacting task- and experimenter-related variables, and find that lab-specific differences contribute little to the variance observed. Therefore, analysis approaches that account for correlated behavioral variables are important for establishing reproducible results when working with electrophysiological data from animals performing decision-making tasks. This paper is very well-motivated and needed. However, what is missing is a direct comparison of task modulation of neurons across labs using standard analysis practice in the fields, such as generalized linear model (GLM). This can potentially clarify how much behavioral variance contributes to the neural variance across labs; and more accurately estimate the scale of the issues of reproducibility in behavioral systems neuroscience, where conclusions often depend on these standard analysis methods.

      We fully agree that a comparison of task-modulation across labs is essential. To address this, we have performed two new analyses and added new corresponding figures to the main text (Figures 7 and 9). As the reviewer hoped, these analyses did indeed clarify how much behavioral variance contributes to the variance across labs. Critically, they suggested that our results were more reproducible than the more traditional analyses would indicate.

      Additional details are provided below (See detailed response to R1P1b).

      Strengths:

      (1) This is a well-motivated paper that addresses the critical question of reproducibility in behavioural systems neuroscience. The authors should be commended for their efforts.

      (2) A key strength of this study comes from the large dataset collected in collaboration across ten labs. This allows the authors to assess lab-to-lab reproducibility of electrophysiological data in mice performing the same decision-making task.

      (3) The authors' attempt to streamline preprocessing pipelines and quality metrics is highly relevant in a field that is collecting increasingly large-scale datasets where automation of these steps is increasingly needed.

      (4) Another major strength is the release of code repositories to streamline preprocessing pipelines across labs collecting electrophysiological data.

      (5) Finally, the application of MTNN for characterizing functional modulation of neurons, although not yet widely used in systems neuroscience, seems to have several advantages over traditional methods.

      Thanks very much for noting these strengths of our work.

      Weaknesses:

      (1) In several places the assumptions about standard practices in the field, including preprocessing and analyses of electrophysiology data, seem to be inaccurately presented:

      a) The estimation of how much the histologically verified recording location differs from the intended recording location is valuable information. Importantly, this paper provides citable evidence for why that is important. However, histological verification of recording sites is standard practice in the field, even if not all studies report them. Although we appreciate the authors' effort to further motivate this practice, the current description in the paper may give readers outside the field a false impression of the level of rigor in the field.

      We agree that labs typically do perform histological verification. Still, our methods offer a substantial improvement over standard practice, and this was critical in allowing us to identify errors in targeting. For instance, we used new software, LASAGNA, which is an innovation over the traditional, more informal approach to localizing recording sites. Second, the requirement that two independent reviewers concur on each proposed location for a recording site is also an improvement over standard practice. Importantly, these reviewers use electrophysiological features to more precisely localize electrodes, when needed, which is an improvement over many labs. Finally, most labs use standard 2D atlases to identify recording location (a traditional approach); our use of a 3D atlas and a modern image registration pipeline has improved the accuracy of identifying the true placement of probes in 3D space.

      Importantly, we don’t necessarily advocate that all labs adopt our pipeline; indeed, this would be infeasible for many labs. Instead, our hope is that the variability in probe trajectory that we uncovered will be taken into account in future studies. Here are 3 example ways in which that could happen. First, groups hoping to target a small area for an experiment might elect to use a larger cohort than previously planned, knowing that some insertions will miss their target. Second, our observation that some targeting error arose because experimenters had to move probes due to blood vessels will impact future surgeries: when an experimenter realizes that a blood vessel is in the way, they might still re-position the probe, but they can also adjust its trajectory (e.g., changing the angle) knowing that even little nudges to avoid blood vessels can have a large impact on the resulting insertion trajectory. Third, our observation of a 7 degree deviation between stereotaxic coordinates and Allen Institute coordinates can be used for future trajectory planning steps to improve accuracy of placement. Uncovering this deviation required many insertions and our standardized pipeline, but now that it is known, it can be easily corrected without needing such a pipeline.

      We thank the reviewer for bringing up this issue and have added new text (and modified existing text) in the Discussion to highlight the innovations we introduced that allowed us to carefully quantify probe trajectory across labs (lines 500 - 515):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset. … Detecting this offset relied on a large cohort size and an automated histological pipeline, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Minimizing variance in probe targeting is another important element in increasing reproducibility, as slight deviations in probe entry position and angle can lead to samples from different populations of neurons. Collecting structural MRI data in advance of implantation could reduce targeting error, although this is infeasible for most labs. A more feasible solution is to rely on stereotaxic coordinates but account for the inevitable off-target measurements by increasing cohort sizes and adjusting probe angles when blood vessels obscure the desired location.”

      b) When identifying which and how neurons encode particular aspects of stimuli or behaviour in behaving animals (when variables are correlated by the nature of the animals' behaviour), it has become the standard in behavioral systems neuroscience to use GLMs - indeed many labs participating in the IBL also have a long history of doing this (e.g., Steinmetz et al., 2019; Musall et al., 2023; Orsolic et al., 2021; Park et al., 2014). The reproducibility of results when using GLMs is never explicitly shown, but the supplementary figures to Figure 7 indicate that results may be reproducible across labs when using GLMs (as it has similar prediction performance to the MTNN). This should be introduced as the first analysis method used in a new dedicated figure (i.e., following Figure 3 and showing results of analyses similar to what was shown for the MTNN in Figure 7). This will help put into perspective the degree of reproducibility issues the field is facing when analyzing with appropriate and common methods. The authors can then go on to show how simpler approaches (currently in Figures 4 and 5) - not accounting for a lot of uncontrolled variabilities when working with behaving animals - may cause reproducibility issues.

      We fully agree with the reviewer's suggestion. We have addressed their concern by implementing a Reduced-Rank Regression (RRR) model, which builds upon and extends the principles of Generalized Linear Models (GLMs). The RRR model retains the core regression framework of GLMs while introducing shared, trainable temporal bases across neurons, enhancing the model’s capacity to capture the structure in neural activity (Posani, Wang, et al., bioRxiv, 2024). Importantly, Posani, Wang et al compared the predictive performance of GLMs vs the RRR model, and found that the RRR model provided (slightly) improved performance, so we chose the RRR approach here.
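
      To make the idea of a rank constraint concrete, here is a minimal sketch of textbook reduced-rank regression. It is not the shared-temporal-basis model of Posani, Wang, et al.; the function name, the synthetic data and the chosen rank are all assumptions made for illustration. The ordinary least-squares coefficients are projected onto the leading directions of the fitted responses, so that all neurons share a small number of latent predictors.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Textbook reduced-rank regression (illustrative, not the paper's model).

    Fits ordinary least squares, then projects the coefficients onto the
    top `rank` right singular vectors of the fitted responses.
    """
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)    # (n_predictors, n_neurons)
    Y_hat = X @ B_ols
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T                                 # (n_neurons, rank)
    return B_ols @ V_r @ V_r.T

# Synthetic example: 40 hypothetical neurons driven by 8 predictors
# through a rank-2 coefficient matrix, plus noise.
rng = np.random.default_rng(0)
n_samples, n_predictors, n_neurons, true_rank = 500, 8, 40, 2
X = rng.normal(size=(n_samples, n_predictors))
B_true = (rng.normal(size=(n_predictors, true_rank))
          @ rng.normal(size=(true_rank, n_neurons)))
Y = X @ B_true + 0.5 * rng.normal(size=(n_samples, n_neurons))

B_rrr = reduced_rank_regression(X, Y, rank=true_rank)
```

      In this sketch each column of `B_rrr` plays the role of one neuron's regression coefficients, which is the kind of per-neuron quantity that could then be compared across labs.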

      We highlight this analysis in a new section (lines 350-377) titled, “Single neuron coefficients from a regression-based analysis are reproducible across labs”. This section includes an entirely new Figure (Fig. 7), where this new analysis felt most appropriate, since it is closer in spirit to the MTNN analysis that follows (rather than as a new Figure 3, as the reviewer suggested). As the reviewer hoped, this analysis provides some reassurance that including many variables when characterizing neural activity furnishes results with improved reproducibility. We now state this in the Results and the Discussion (line 456-457), highlighting that these analyses complement the more traditional selectivity analyses, and that using both methods together can be informative.

      When the authors introduce a neural network approach (i.e. MTNN) as an alternative to the analyses in Figures 4 and 5, they suggest: 'generalized linear models (GLMs) are likely too inflexible to capture the nonlinear contributions that many of these variables, including lab identity and spatial positions of neurons, might make to neural activity'. This is despite the comparison between MTNN and GLM prediction performance (Supplement 1 to Figure 7) showing that the MTNN is only slightly better at predicting neural activity compared to standard GLMs. The introduction of new models to capture neural variability is always welcome, but the conclusion that standard analyses in the field are not reproducible can be unfair unless directly compared to GLMs.

      In essence, it is really useful to demonstrate how different analysis methods and preprocessing approaches affect reproducibility. But the authors should highlight what is actually standard in the field, and then provide suggestions to improve from there.

      Thanks again for these comments. We have also edited the MTNN section slightly to accommodate the addition of the previous new RRR section (line 401-402).

      (2) The authors attempt to establish a series of new quality control metrics for the inclusion of recordings and single units. This is much needed, with the goal to standardize unit inclusion across labs that bypasses the manual process while keeping the nuances from manual curation. However, the authors should benchmark these metrics to other automated metrics and to manual curation, which is still a gold standard in the field. The authors did this for whole-session assessment but not for individual clusters. If the authors can find metrics that capture agreed-upon manual cluster labels, without the need for manual intervention, that would be extremely helpful for the field.

      We thank the reviewer for their insightful suggestions regarding benchmarking our quality control metrics against manual curation and other automated methods at the level of individual clusters. We are indeed, as the reviewer notes, publishing results from spike sorting outputs that have been automatically but not manually verified on a neuron-by-neuron basis. To get to the point where we trust these results to be of publishable quality, we manually reviewed hundreds of recordings and thousands of neurons, refining both the preprocessing pipeline and the single-unit quality metrics along the way. All clusters, both those passing QCs and those not passing QCs, are available to review with detailed plots and quantifications at https://viz.internationalbrainlab.org/app (turn on “show advanced metrics” in the upper right, and navigate to the plots furthest down the page, which are at the individual unit level). We would emphasize that these metrics are definitely imperfect (and fully-automated spike sorting remains a work in progress), but so is manual clustering. Our fully automated approach has the advantage of being fully reproducible, which is absolutely critical for the analyses in the present paper. Indeed, if we had actually done manual clustering or curation, one would wonder whether our results were actually reproducible independently. Nevertheless, it is not part of the present manuscript’s objectives to validate or defend these specific choices for automated metrics, which have been described in detail elsewhere (see our Spike Sorting whitepaper, https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080). It would be a valuable exercise to thoroughly compare these metrics against a careful, large, manually-curated set, but doing this properly would be a paper in itself and is beyond the scope of the current paper.
      We also acknowledge that our analyses studying reproducibility across labs could, in principle, result in more or less reproducibility under a different choice of metrics, which we now describe in the Discussion (line 469-470):

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      (3) With the goal of improving reproducibility and providing new guidelines for standard practice for data analysis, the authors should report the n of cells, sessions, and animals used in plots and analyses throughout the paper, both to aid understanding of the variability in the plots and to set a good example.

      We wholeheartedly agree and have added the number of cells, mice and sessions for each figure. This information is included as new tabs in our quality control spreadsheet (https://docs.google.com/spreadsheets/d/1_bJLDG0HNLFx3SOb4GxLxL52H4R2uPRcpUlIw6n4n-E/). This is referred to in line 158-159 (as well as its original location on line 554 in the section, “Quality control and data inclusion”).

      Other general comments:

      (1) In the discussion (line 383) the authors conclude: 'This is reassuring, but points to the need for large sample sizes of neurons to overcome the inherent variability of single neuron recording'. - Based on what is presented in this paper we would rather say that their results suggest that appropriate analytical choices are needed to ensure reproducibility, rather than large datasets - and they need to show whether using standard GLMs actually allows for reproducible results.

      Thanks. The new GLM-style RRR analysis in Figure 7, following the reviewer’s suggestion, does indeed indicate improved reproducibility across labs. As described above, we see this new analysis as complementary to more traditional analyses of neural selectivity and argue that the two can be used together. The new text (line 461) states:

      “This is reassuring, and points to the need for appropriate analytical choices to ensure reproducibility.”

      (2) A general assumption in the across-lab reproducibility questions in the paper relies on intralab variability vs across-lab variability. An alternative measure that may better reflect experimental noise is across-researcher variability, as well as the amount of experimenter experience (if the latter is a factor, it could suggest researchers may need more training before collecting data for publication). The authors state in the discussion that this is not possible. But maybe certain measures can be used to assess this (e.g. years of conducting surgeries/ephys recordings etc)?

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (line 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) Figure 3b and c: Are these plots before or after the probe depth has been adjusted based on physiological features such as the LFP power? In other words, is the IBL electrophysiological alignment toolbox used here and is the reliability of location before using physiological criteria or after? Beyond clarification, showing both before and after would help the readers to understand how much the additional alignment based on electrophysiological features adjusts probe location. It would also be informative if they sorted these penetrations by which penetrations were closest to the planned trajectory after histological verification.

      The plots in Figure 3b and 3c reflect data after the probe depth has been adjusted based on electrophysiological features. This adjustment incorporates criteria such as LFP power and spiking activity to refine the trajectory and ensure precise alignment with anatomical landmarks. The trajectories have also been reviewed and confirmed by two independent reviewers. We have clarified this in line 180 and in the caption of Figure 3.

      To address this concern, we have added a new panel c in Figure 3 supplementary 1 (also shown below) that shows the LFP features along the probes prior to using the IBL alignment toolbox. We hope the reviewer agrees that a comparison of panels (a) and (c) below makes clear the improvement afforded by our alignment tools.

      In Figure 3 and Figure 3 supplementary 1, as suggested, we have also now sorted the probes by those that were closest to the planned trajectory. This way of visualizing the data makes it clear that as the distance from the planned trajectory increases, the power spectral density in the hippocampal regions becomes less pronounced and the number of probes that have a large portion of the channels localized to VISa/am, LP and PO decreases. We have added text to the caption to describe this. We thank the reviewer for this suggestion and agree that it will help readers to understand how much the additional alignment (based on electrophysiological features) adjusts probe location.

      (4) In Figures 4 and 6: If the authors use a 0.05 threshold (alpha) and a cell simply has to be significant on 1/6 tests to be considered task modulated, that means that they have a false positive rate of ~30% (0.05*6=0.3). We ran a simple simulation looking for significant units (from random null distribution) from these criteria which shows that out of 100,000 units, 26,500 units would come out significant (false positive rate: 26.5%). That is very high (and unlikely to be accepted in most papers), and therefore not surprising that the fraction of task-modulated units across labs is highly variable. This high false positive rate may also have implications for the investigation of the spatial position of task-modulated units (as effects of the spatial position may drown in falsely labelled 'task-modulated' cells).
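
      The two figures quoted in this comment come from two different calculations: 0.05 × 6 = 0.3 is the Bonferroni upper bound, whereas 26.5% is what six independent tests would give, 1 − 0.95^6 ≈ 0.265. The following minimal sketch reproduces both the analytic value and a simulation of the "significant on at least one of six tests" criterion. It assumes the six tests are independent and that null p-values are uniform; neither assumption is stated in the source, and the variable names are illustrative.

```python
import numpy as np

alpha, n_tests, n_units = 0.05, 6, 100_000

# Analytic rate for independent tests: P(at least one p-value < alpha).
analytic = 1 - (1 - alpha) ** n_tests            # about 0.265

# Simulation: under the null, p-values are uniform on [0, 1].
rng = np.random.default_rng(0)
p_values = rng.uniform(size=(n_units, n_tests))
simulated = np.mean((p_values < alpha).any(axis=1))

print(f"analytic: {analytic:.4f}, simulated: {simulated:.4f}")
```

      When each test is instead analysed on its own, the per-test false positive rate stays at alpha.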

      Thank you for this concern. The different tests were kept separate, so we did not consider a neuron modulated if it was significant in only one out of six tests, but instead we asked whether a neuron was modulated according to test one, whether it was modulated according to test two, etc., and performed further analyses separately for each test. Thus, we are only vulnerable to the ‘typical’ false positive rate of 0.05 for any given test. We made this clearer in the text (lines 232-236) and hope that the 5% false positive rate seems more acceptable.

      (5) The authors state from Figure 5b that the majority of cells could be well described by 2 PCs. The distribution of R2 across neurons is almost uniform, so depending on what R2 value one considers a 'good' description, that is the fraction of 'good' cells. Furthermore, movement onset has now been well-established to be affecting cells widely and in large fractions, so while this analysis may work for something with global influence - like movement - more sparsely encoded variables (as many are in the brain) may not be well approximated with this suggestion. The authors could expand this analysis into other epochs like activity around stimulus presentation, to better understand how this type of analysis reproduces across labs for features that have a less global influence.

      We thank the reviewer for the suggestion and fully agree that the window used in our original analysis would tend to favor movement-driven neurons. To address this, we repeated the analysis, this time using a window centered around stimulus onset (from 0.5 s prior to stimulus onset until 0.1 s after stimulus onset). As the reviewer suspected, far fewer neurons were active in this window and consequently far fewer were modelled well by the first two PCs, as shown in Author response image 1b (below). Similar to our original analysis using the post-movement window, we found mixed results for the stimulus-centered window across labs. Interestingly, regional differences were weaker in this new analysis compared to the original analysis of the post-movement window. We have added a sentence to the results describing this. Because the results are similar to the post-movement window main figure, we would prefer to restrict the new analysis only to this point-by-point response, in the hopes of streamlining the paper.
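
      For readers unfamiliar with this type of analysis, the sketch below shows one way to embed trial-averaged activity in two principal components and to score how well each neuron is reconstructed from them. It is an illustration under stated assumptions: the firing rates are synthetic, and the exact preprocessing and R² definition used in the paper are not given in this response, so the ones here are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_bins = 200, 60

# Hypothetical trial-averaged firing rates (neurons x time bins) in a
# [-0.5, 0.1] s window: two shared temporal components plus noise.
t = np.linspace(-0.5, 0.1, n_bins)
components = np.stack([np.exp(-((t + 0.1) / 0.1) ** 2), t])
weights = rng.normal(size=(n_neurons, 2))
rates = weights @ components + 0.05 * rng.normal(size=(n_neurons, n_bins))

# PCA over neurons via SVD of the mean-centred matrix.
mean_trace = rates.mean(axis=0)
centred = rates - mean_trace
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
embedding = U[:, :2] * S[:2]                  # 2-D coordinates per neuron
reconstruction = embedding @ Vt[:2] + mean_trace

# Per-neuron R^2 of the two-PC reconstruction (assumed definition).
ss_res = ((rates - reconstruction) ** 2).sum(axis=1)
ss_tot = ((rates - rates.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
r2 = 1 - ss_res / ss_tot
```

      The rows of `embedding` are the low-dimensional coordinates whose distributions can be compared across labs or regions, and `r2` gives the fraction of each neuron's temporal variance captured by the first two PCs.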

      Author response image 1.

      PCA analysis applied to a stimulus-aligned window ([-0.5, 0.1] sec relative to stim onset). Figure conventions as in main text Fig 5. Results are comparable to the post-movement window analysis; however, regional differences are weaker here, possibly because fewer cells were active in the pre-movement window. We added panel j here and in the main figure, showing cell-number-controlled results. That is, for each test, every compared class (say, labs in a region) was subsampled to the neuron count of the smallest class; this sampling was repeated 1000 times and the p-values were combined via Fisher’s method, overall resulting in far fewer significant differences across laboratories and, independently, regions.

      (6) Additionally, in Figure 5i: could the finding that one can only distinguish labs when taking cells from all regions, simply be a result of a different number of cells recorded in each region for each lab? It makes more sense to focus on the lab/area pairing as the authors also do, but not to make their main conclusion from it. If the authors wish to do the comparison across regions, they will need to correct for the number of cells recorded in each region for each lab. In general, it was a struggle to fully understand the purpose of Figure 5. While population analysis and dimensionality reduction are commonplace, this seems to be a very unusual use of it.

      We agree that controlling for varying cell numbers is a valuable addition to this analysis. We added panel j in Fig. 5 showing cell-number-controlled test results of panel i. That is, for a given statistical comparison, we subsample each of the compared classes down to the cell count of the smallest class, run the test, and repeat this sampling 1000 times before combining the p-values using Fisher’s method. This cell-number-controlled version of the tests resulted in clearly fewer significant differences across distributions, as seen similarly for the pre-movement window shown in panel j of Author response image 1. We hope this clarifies our aim to illustrate that low-dimensional embedding of cells’ trial-averaged activity can show how regional differences compare with laboratory differences.
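
      The subsampling procedure described above can be sketched as follows. This is an illustration rather than the code used for panel j: the function name is hypothetical, the data are synthetic one-dimensional values (for example PC1 scores), and only two classes are compared. Note also that Fisher's method formally assumes independent p-values, whereas p-values from repeated subsamples of the same cells are correlated, so the sketch shows the mechanics only.

```python
import numpy as np
from scipy import stats

def size_matched_ks(a, b, n_repeats=1000, seed=0):
    """KS test on size-matched subsamples, combined with Fisher's method.

    Hypothetical helper: both classes are subsampled (without replacement)
    to the size of the smaller one before every test.
    """
    rng = np.random.default_rng(seed)
    n_min = min(len(a), len(b))
    p_values = np.empty(n_repeats)
    for i in range(n_repeats):
        sub_a = rng.choice(a, size=n_min, replace=False)
        sub_b = rng.choice(b, size=n_min, replace=False)
        p_values[i] = stats.ks_2samp(sub_a, sub_b).pvalue
    _, combined_p = stats.combine_pvalues(p_values, method="fisher")
    return combined_p

# Synthetic example: a large class and a small, shifted class.
rng = np.random.default_rng(1)
class_large = rng.normal(loc=0.0, size=300)
class_small = rng.normal(loc=1.0, size=40)
combined_p = size_matched_ks(class_large, class_small, n_repeats=200)
```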

      As a complementary statistical analysis to the shown KS tests, we fitted a linear mixed-effects model (mixedlm from statsmodels.formula.api) to the first and second PC for both activity windows (“Move”: [-0.5,1] first movement aligned; “Stim”: [-0.5,0.1] stimulus onset aligned), independently. Author response image 2 (in this rebuttal only) is broadly in line with the KS results, showing more regional than lab influences on the distributions of first PCs for the post-movement window.

      Author response image 2:

      Linear mixed effects model results for two PCs and two activity windows. For the post-movement window (“Move”), regional influences are significant (red color in plots) for all but one region while only one lab has a significant model coefficient for PC1. For PC2 more labs and three regions have significant coefficients. For the pre-movement window (“Stim”) one region for PC1 or PC2 has significant coefficients. The variance due to session id was smaller than all other effects (“eids Var”). “Intercept” shows the expected value of the response variable (PC1, PC2) before accounting for any fixed or random effects. All p-values were grouped as one hypothesis family and corrected for multiple comparisons via Benjamini-Hochberg.
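
      A minimal sketch of this kind of model is given below. The formula (fixed effects for lab and region, random intercept per session id) is an assumption inferred from the caption, not the authors' exact specification; the table, column names and category labels are synthetic placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Hypothetical table: one row per neuron with its PC1 score, lab,
# brain region and session id (eid). All values are synthetic.
rows = []
for lab in ["lab_a", "lab_b", "lab_c"]:
    for s in range(6):
        eid = f"{lab}_s{s}"
        session_offset = rng.normal(scale=0.5)
        for region, shift in [("region_a", 0.0), ("region_b", 1.0), ("region_c", -1.0)]:
            for _ in range(10):
                rows.append({"lab": lab, "region": region, "eid": eid,
                             "pc1": shift + session_offset + rng.normal()})
df = pd.DataFrame(rows)

# Fixed effects for lab and region, random intercept per session.
model = smf.mixedlm("pc1 ~ C(lab) + C(region)", data=df, groups=df["eid"])
result = model.fit()

# Benjamini-Hochberg correction over the fixed-effect p-values.
fixed_p = result.pvalues[result.fe_params.index]
reject, p_adj, _, _ = multipletests(fixed_p, alpha=0.05, method="fdr_bh")
summary = pd.DataFrame({"coef": result.fe_params, "p_adj": p_adj,
                        "significant": reject})
```

      In the synthetic data only the region shifts are real effects, so the region coefficients should survive correction; whether a lab coefficient does is left to chance.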

      (7) In the discussion the authors state: " Indeed this approach is a more effective and streamlined way of doing it, but it is questionable whether it 'exceeds' what is done in many labs.

      Classically, scientists trace each probe manually with light microscopy and designate each area based on anatomical landmarks identified with Nissl or DAPI stains together with gross landmarks. When not automated with 2-PI serial tomography and anatomically aligned to a standard atlas, this is a less effective process, but it is not clear that it is less precise, especially in studies before Neuropixels where active electrodes were located in a much smaller area. While more effective, transforming into a common atlas does make additional assumptions about warping the brain into the standard atlas - especially in cases where the brain has been damaged/lesioned. Readers can appreciate the effectiveness and streamlining provided by these new tools without the need to invalidate previous approaches.

      We thank the reviewer for highlighting the effectiveness of manual tracing methods used traditionally. Our intention in the statement was not to invalidate the precision or value of these classical methods but rather to emphasize the scalability and streamlining offered by our pipeline. We have revised the language to more accurately reflect this (line 500-504):

      “Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, an approach that greatly exceeds the histological analyses done by most individual labs. Our approach, which enables scalability and standardization across labs while minimizing subjective variability, revealed that much of the variance in targeting was due to the probe entry positions at the brain surface, which were randomly displaced across the dataset.”

      (8) What about across-lab population-level representation of task variables, such as in the coding direction for stimulus or choice? Is the general decodability of task variables from the population comparable across labs?

      Excellent question, thanks! We have added the new section “Decodability of task variables is consistent across labs, but varies by brain region” (line 423-448) and Figure 9 in the revised manuscript to address this question. In short, yes, the general decodability of task variables from the population is comparable across labs, providing additional reassurance of reproducibility.

      Reviewer #2 (Public review):

      Summary:

      The authors sought to evaluate whether observations made in separate individual laboratories are reproducible when they use standardized procedures and quality control measures. This is a key question for the field. If ten systems neuroscience labs try very hard to do the exact same experiment and analyses, do they get the same core results? If the answer is no, this is very bad news for everyone else! Fortunately, they were able to reproduce most of their experimental findings across all labs. Despite attempting to target the same brain areas in each recording, variability in electrode targeting was a source of some differences between datasets.

      Major Comments:

      The paper had two principal goals:

      (1) to assess reproducibility between labs on a carefully coordinated experiment

      (2) distill the knowledge learned into a set of standards that can be applied across the field.

      The manuscript made progress towards both of these goals but leaves room for improvement.

      (1) The first goal of the study was to perform exactly the same experiment and analyses across 10 different labs and see if you got the same results. The rationale for doing this was to test how reproducible large-scale rodent systems neuroscience experiments really are. In this, the study did a great job showing that when a consortium of labs went to great lengths to do everything the same, even decoding algorithms could not clearly discern laboratory identity from the raw data. However, the amount of coordination between the labs was so great that these findings are hard to generalize to the situation where similar (or conflicting!) results are generated by two labs working independently.

      Importantly, the study found that electrode placement (and thus likely also errors inherent to the electrode placement reconstruction pipeline) was a key source of variability between datasets. To remedy this, they implemented a very sophisticated electrode reconstruction pipeline (involving two-photon tomography and multiple blinded data validators) in just one lab, and all brains were sliced and reconstructed in this one location. This is a fantastic approach for ensuring similar results within the IBL collaboration, but makes it unclear how much variance would have been observed if each lab had attempted to reconstruct their probe trajectories themselves using a mix of histology techniques from conventional brain slicing, to light sheet microscopy, to MRI imaging.

      This approach also raises a few questions. The use of standard procedures, pipelines, etc. is a great goal, but most labs are trying to do something unique with their setup. Bigger picture, shouldn't highly "significant" biological findings akin to the discovery of place cells or grid cells, be so clear and robust that they can be identified with different recording modalities and analysis pipelines?

      We agree, and hope that this work may help readers understand what effect sizes may be considered “clear and robust” from datasets like these. We certainly support the reviewer’s point that multiple approaches and modalities can help to confirm any biological findings, but we would contend that a clear understanding of the capabilities and limitations of each approach is valuable, and we hope that our paper helps to achieve this.

      Related to this, how many labs outside of the IBL collaboration have implemented the IBL pipeline for their own purposes? In what aspects do these other labs find it challenging to reproduce the approaches presented in the paper? If labs were supposed to perform this same experiment, but without coordinating directly, how much more variance between labs would have been seen? Obviously investigating these topics is beyond the scope of this paper. The current manuscript is well-written and clear as is, and I think it is a valuable contribution to the field. However, some additional discussion of these issues would be helpful.

      We thank the reviewer for raising this important issue. We know of at least 13 labs that have implemented the behavioral task software and hardware that we published in eLife in 2021, and we expect that over the next several years labs will also implement these analysis pipelines (note that it is considerably cheaper and faster to implement software pipelines than hardware). In particular, a major goal of the staff in the coming years is to continue and improve the support for pipeline deployment and use. However, our goal in this work, which we have aimed to state more clearly in the revised manuscript, was not so much to advocate that others adopt our pipeline, but instead to use our standardized approach as a means of assessing reproducibility under the best of circumstances (see lines 48-52): “A high level of reproducibility of results across laboratories when procedures are carefully matched is a prerequisite to reproducibility in the more common scenario in which two investigators approach the same high-level question with slightly different experimental protocols.”

      Further, a number of our findings are relevant to other labs regardless of whether they implement our exact pipeline, a modified version of our pipeline, or something else entirely. For example, we found probe targeting to be a large source of variability. Our ability to detect targeting error benefited from an automated histological pipeline combined with alignment and tracing that required agreement between multiple users, but now that we have identified the offset, it can be easily accounted for by any lab. Specifically, probe angles must be carefully computed from the CCF, as the CCF and stereotaxic coordinate systems do not define the same coronal plane angle. Relatedly, we found that slight deviations in probe entry position can lead to samples from different populations of neurons. Although this took large cohort sizes to discover, knowledge of this discovery means that future experiments can plan for larger cohort sizes to allow for off-target trajectories, and can re-compute probe angle when the presence of blood vessels necessitates moving probes slightly. These points are now highlighted in the Discussion (lines 500-515).

      Second, the proportion of responsive neurons (a quantity often used to determine that a particular area subserves a particular function), sometimes failed to reproduce across labs. For example, for movement-driven activity in PO, UCLA reported an average change of 0 spikes/s, while CCU reported a large and consistent change (Figure 4d, right most panel, compare orange vs. yellow traces). This argues that neuron-to-neuron variability means that comparisons across labs require large cohort sizes. A small number of outlier neurons in a session can heavily bias responses. We anticipate that this problem will be remedied as tools for large scale neural recordings become more widely used. Indeed, the use of 4-shank instead of single-shank Neuropixels (as we used here) would have greatly enhanced the number of PO neurons we measured in each session. We have added new text to Results explaining this (lines 264-268):

      “We anticipate that the feasibility of even larger scale recordings will make lab-to-lab comparisons easier in future experiments; multi-shank probes could be especially beneficial for cortical recordings, which tend to be the most vulnerable to low cell counts since the cortex is thin and is the most superficial structure in the brain and thus the most vulnerable to damage. Analyses that characterize responses to multiple parameters are another possible solution (See Figure 7).”

      (2) The second goal of the study was to present a set of data curation standards (RIGOR) that could be applied widely across the field. This is a great idea, but its implementation needs to be improved if adoption outside of the IBL is to be expected. Here are three issues:

      (a) The GitHub repo for this project (https://github.com/int-brain-lab/paper-reproducible-ephys/) is nicely documented if the reader's goal is to reproduce the figures in the manuscript. Consequently, the code for producing the RIGOR statistics seems mostly designed for re-computing statistics on the existing IBL-formatted datasets. There doesn't appear to be any clear documentation about how to run it on arbitrary outputs from a spike sorter (i.e. the inputs to Phy).

      We agree that clear documentation is key for others to adopt our standards. To address this, we have added a section at the end of the README of the repository that links to a Jupyter notebook (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb) that runs the RIGOR metrics on a user’s own spike sorted dataset. The notebook also contains a tutorial that walks through how to visually assess the quality of the raw and spike sorted data, and computes the noise level metrics on the raw data as well as the single cell metrics on the spike sorted data.

      (b) Other sets of spike sorting metrics that are more easily computed for labs that are not using the IBL pipeline already exist (e.g. "quality_metrics" from the Allen Institute ecephys pipeline [https://github.com/AllenInstitute/ecephys_spike_sorting/blob/main/ecephys_spike_sorting/modules/quality_metrics/README.md] and the similar module in the Spike Interface package [https://spikeinterface.readthedocs.io/en/latest/modules/qualitymetrics.html]). The manuscript does not compare these approaches to those proposed here, but some of the same statistics already exist (amplitude cutoff, median spike amplitude, refractory period violation).

      There is a long history of researchers providing analysis algorithms and code for spike sorting quality metrics, and we agree that the Allen Institute’s ecephys code and the Spike Interface package are the current options most widely used (but see also, for example, Fabre et al. https://github.com/Julie-Fabre/bombcell). Our primary goal in the present work is not to advocate for a particular implementation of any quality metrics (or any spike sorting algorithm, for that matter), but instead to assess reproducibility of results, given one specific choice of spike sorting algorithm and quality metrics. That is why, in our comparison of yield across datasets (Fig 1F), we downloaded the raw data from those comparison datasets and re-ran them under our single fixed pipeline, to establish a fair standard of comparison. A full comparison of the analyses presented here under different choices of quality metrics and spike sorting algorithms would undoubtedly be interesting and useful for the field - however, we consider it to be beyond the scope of the present work. It is therefore an important assumption of our work that the result would not differ materially under a different choice of sorting algorithm and quality metrics. We have added text to the Discussion to clarify this limitation:

      “Another significant limitation of the analysis presented here is that we have not been able to assess the extent to which other choices of quality metrics and inclusion criteria might have led to greater or lesser reproducibility.”

      That said, we still intend for external users to be able to easily run our pipelines and quality metrics.

      (c) Some of the RIGOR criteria are qualitative and must be visually assessed manually. Conceptually, these features make sense to include as metrics to examine, but would ideally be applied in a standardized way across the field. The manuscript doesn't appear to contain a detailed protocol for how to assess these features. A procedure for how to apply these criteria for curating non-IBL data (or for implementing an automated classifier) would be helpful.

      We agree. To address this, we have provided a notebook that runs the RIGOR metrics on a user’s own dataset, and contains a tutorial on how to interpret the resulting plots and metrics (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/master/RIGOR_script.ipynb).

      Within this notebook there is a section focused on visually assessing the quality of both the raw data and the spike sorted data. The code in this section can be used to generate plots, such as raw data snippets or the raster map of the spiking activity, which are typically used to visually assess the quality of the data. In Figure 1 Supplement 2 we have provided examples of such plots that show different types of artifactual activity that should be inspected.

      Other Comments:

      (1) How did the authors select the metrics they would use to evaluate reproducibility? Was this selection made before doing the study?

      Our metrics were selected on the basis of our experience and expertise with extracellular electrophysiology. For example: some of us previously published on epileptiform activity and its characteristics in some mice (Steinmetz et al. 2017), so we included detection of that type of artifact here; and, some of us previously published detailed investigations of instability in extracellular electrophysiological recordings and methods for correcting them (Steinmetz et al. 2021, Windolf et al. 2024), so we included assessment of that property here. These metrics therefore represent our best expert knowledge about the kinds of quality issues that can affect this type of dataset, but it is certainly possible that future investigators will discover and characterize other quality issues.

      The selection of metrics was primarily performed before the study (we used these assessments internally before embarking on the extensive quantifications reported here), and in cases where we refined them further during the course of preparing this work, it was done without reference to statistical results on reproducibility but instead on the basis of manual inspection of data quality and metric performance.

      (2) Was reproducibility within-lab dependent on experimenter identity?

      We thank the reviewer for this question. We have addressed it in our response to R1 General comment 2, as follows:

      We agree that understanding experimenter-to-experimenter variability would be very interesting and indeed we had hoped to do this analysis for some time. The problem is that typically, each lab employed one trainee to conduct all the data collection. This prevents us from comparing outcomes from two different experimenters in the same lab. There are exceptions to this, such as the Churchland lab in which 3 personnel (two postdocs and a technician) collected the data. However, even this fortuitous situation did not lend itself well to assessing experimenter-to-experimenter variation: the Churchland lab moved from Cold Spring Harbor to UCLA during the data collection period, which might have caused variability that is totally independent of experimenter (e.g., different animal facilities). Further, once at UCLA, the postdoc and technician worked closely together, alternating roles in animal training, surgery and electrophysiology. We believe that the text in our current Discussion (lines 465-468) accurately characterizes the situation:

      “Our experimental design precludes an analysis of whether the reproducibility we observed was driven by person-to-person standardization or lab-to-lab standardization. Most likely, both factors contributed: all lab personnel received standardized instructions for how to implant head bars and train animals, which likely reduced personnel-driven differences.”

      Quantifying the level of experience of each experimenter is an appealing idea and we share the reviewer’s curiosity about its impact on data quality. Unfortunately, quantifying experience is tricky. For instance, years of conducting surgeries is not an unambiguously determinable number. Would we count an experimenter who did surgery every day for a year as having the same experience as an experimenter who did surgery once/month for a year? Would we count a surgeon with expertise in other areas (e.g., windows for imaging) in the same way as surgeons with expertise in ephys-specific surgeries? Because of the ambiguities, we leave this analysis to be the subject of future work; this is now stated in the Discussion (line 476).

      (3) They note that UCLA and UW datasets tended to miss deeper brain region targets (lines 185-188) - they do not speculate why these labs show systematic differences. Were they not following standardized procedures?

      Thank you for raising this point. All researchers across labs were indeed following standardised procedures. We note that our statistical analysis of probe targeting coordinates and angles did not reveal a significant effect of lab identity on targeting error, even though we noted the large number of mis-targeted recordings in UCLA and UW to help draw attention to the appropriate feature in the figure. Given that these differences were not statistically significant, we can see how it was misleading to call out these two labs specifically. While the overall probe placement surface error and angle error both show no such systematic difference, the magnitude of surface error showed a non-significant tendency to be higher for samples in UCLA & UW, which, compounded with the direction of probe angle error, caused these probe insertions to land in a final location outside LP & PO.

      This shows how subtle differences in probe placement & angle accuracy can lead to compounded inaccuracies at the probe tip, especially when targeting deep brain regions, even when following standard procedures. We believe this is driven partly by the accuracy limit or resolution of the stereotaxic system, along with slight deviations in probe angle, occurring during the setup of the stereotaxic coordinate system during these recordings.

      We have updated the relevant text in lines 187-190 as follows, to clarify:

      “Several trajectories missed their targets in deeper brain regions (LP, PO), as indicated by gray blocks, despite the lack of significant lab-dependent effects in targeting as reported above. These off-target trajectories tended to have both a large displacement from the target insertion coordinates and a probe angle that unfavorably drew the insertions away from thalamic nuclei (Figure 2f).”
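
      The compounding of surface error and angle error described in this response can be illustrated with simple geometry. The sketch below is not part of the manuscript's analysis: it assumes a straight probe, a 2-D worst case in which both errors point in the same direction, and entirely hypothetical input values. The function name is ours.

```python
import math


def tip_displacement_um(surface_error_um: float,
                        angle_error_deg: float,
                        depth_um: float) -> float:
    """Worst-case lateral displacement of the probe tip (in micrometres).

    Assumes a straight probe and that the surface offset and the angular
    deviation push the tip in the same direction (simple 2-D geometry).
    """
    angular_part = depth_um * math.tan(math.radians(angle_error_deg))
    return surface_error_um + angular_part


# Hypothetical values, chosen only to show how the two terms add up.
for depth in (1000.0, 2500.0, 4000.0):
    print(depth, round(tip_displacement_um(300.0, 5.0, depth), 1))
```

      Under these assumptions the angular term grows linearly with insertion depth, which is why a deviation that is negligible in superficial cortex can move the tip out of a small, deep nucleus.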

      (4) The authors suggest that geometrical variance (difference between planned and final identified probe position acquired from reconstructed histology) in probe placement at the brain surface is driven by inaccuracies in defining the stereotaxic coordinate system, including discrepancies between skull landmarks and the underlying brain structures. In this case, the use of skull landmarks (e.g. bregma) to determine locations of brain structures might be unreliable and provide an error of ~360 microns. While it is known that there is indeed variance in the position between skull landmarks and brain areas in different animals, the quantification of this error is a useful value for the field.

      We thank the reviewer for their thoughtful comment and are glad that they found the quantification of variance useful for the field.

      (5) Why are the thalamic recording results particularly hard to reproduce? Does the anatomy of the thalamus simply make it more sensitive to small errors in probe positioning relative to the other recorded areas?

      We thank the reviewer for raising this interesting question. We believe that they are referring to Figure 4: indeed when we analyzed the distribution of firing rate modulations, we saw some failures of reproducibility in area PO (bottom panel, Figure 4h). However, the thalamic nuclei were not, in other analyses, more vulnerable to failures in reproducibility. For example, in the top panel of Figure 4h, VisAM shows failures of reproducibility for modulation by the visual stimulus. In Fig. 5i, area CA1 showed a failure of reproducibility. We fear that the figure legend title in the previous version (which referred to the thalamus specifically) was misleading, and we have revised this. The new title is, “Neural activity is modulated during decision-making in five neural structures and is variable between laboratories.” This new text more accurately reflects that there were a number of small, idiosyncratic failures of reproducibility, but that these were not restricted to a specific structure. The new analysis requested by R1 (now in Figure 7) provides further reassurance of overall reproducibility, including in the thalamus (see Fig. 7a, right panels; lab identity could not be decoded from single neuron metrics, even in the thalamus).

      Reviewer #1 (Recommendations for the authors):

      (1) Figure font sizes and formatting are variable across panels and figures. Please streamline the presentation of results.

      Thank you for your feedback. We have remade all figures with the same standardized font sizes and formatting.

      (2) Please correct the noncontinuous color scales in Figures 3b and 3d.

      Thank you for pointing this out; we have fixed the color bar.

      (3) In Figures 5d and g, the error bars are described as: 'Error bands are standard deviation across cells normalised by the square root of the number of sessions in the region'. How does one interpret this error? It seems to be related to the standard error of the mean (std/sqrt(n)) but instead of using the n from which the standard deviation is calculated (in this case across cells), the authors use the number of sessions as n. If they took the standard deviation across sessions this would be the sem across sessions, and interpretable (as sem*1.96 is the 95% parametric confidence interval of the mean). Please justify why these error bands are used here and how they can be interpreted - it also seems like it is the only time these types of error bands are used.

      We agree. For clarity, we now use the standard error across cells; the error bars do not change dramatically either way.
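
      For reference, the standard error and the parametric 95% interval mentioned in the reviewer's comment (std/sqrt(n) and 1.96 × SEM) can be computed as in the minimal sketch below. The per-cell values are hypothetical and the function names are ours; this is not the code used to produce Figure 5.

```python
import math
import statistics


def standard_error(values):
    """Sample standard deviation divided by sqrt(n), where n = len(values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


def parametric_ci95(values):
    """Mean +/- 1.96 * SEM, the usual normal-approximation 95% interval."""
    centre = statistics.fmean(values)
    half_width = 1.96 * standard_error(values)
    return centre - half_width, centre + half_width


# Hypothetical per-cell firing-rate changes (spikes/s), for illustration only.
per_cell_change = [0.5, 1.2, -0.3, 0.8, 1.0, 0.1]
print(standard_error(per_cell_change))
print(parametric_ci95(per_cell_change))
```

      The key point is that n must be the number of observations over which the standard deviation was taken (here, cells), so that the resulting band is interpretable as an SEM.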

      (4) It is difficult to understand what is plotted in Figures 5e,h, please unpack this further and clarify.

      Thank you for pointing this out. We have added additional explanation in the figure caption (See caption for Figure 5c) to explain the KS test.

      (5) In lines 198-201 the authors state that they were worried that Bonferroni correction with 5 criteria would be too lenient, and therefore used 0.01 as alpha. I am unsure whether the authors mean that they are correcting for multiple comparisons across features or areas. Either way, 0.01 alpha is exactly what a Bonferroni corrected alpha would be when correcting for either 5 features or 5 areas: 0.05/5=0.01. Or do they mean they apply the Bonferroni correction to the new 0.01 alpha: i.e., 0.01/5=0.002? Please clarify.

      Thank you, that was indeed written confusingly. We considered all tests and regions as a whole, so 7 tests * 5 regions = 35 tests, which would result in a very strong Bonferroni correction. Indeed, if one considers the different tests individually, the correction we apply from 0.05 to 0.01 can be considered as correcting for the number of regions, which we now highlight better. We apply no further corrections of any kind to our alpha=0.01. We clarified this in the manuscript in all relevant places (lines 205-208, 246, 297-298, and 726-727).
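
      The arithmetic behind the two readings of the correction is short enough to spell out. The sketch below only restates the numbers given in this exchange (a family-wise alpha of 0.05, 7 tests, 5 regions); the function name is ours.

```python
def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return family_alpha / n_comparisons


n_tests, n_regions = 7, 5

# Correcting each test separately for the 5 regions: 0.05 / 5.
print(bonferroni_alpha(0.05, n_regions))

# Correcting across every test-region pair: 0.05 / 35.
print(bonferroni_alpha(0.05, n_tests * n_regions))
```

      Correcting for regions alone gives 0.05 / 5 = 0.01, the alpha used in the manuscript, whereas correcting across all 35 test-region pairs would give roughly 0.0014.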

      (6) Did the authors take into account how many times a probe was used/how clean the probe was before each recording. Was this streamlined between labs? This can have an effect on yield and quality of recording.

      We appreciate the reviewer highlighting the potential impact of probe use and cleanliness on recording quality and yield. While we did not track the number of times each probe was used, we ensured that all probes were cleaned thoroughly after each use using a standardized cleaning protocol (Section 16: Cleaning the electrode after data acquisition in Appendix 2: IBL protocol for electrophysiology recording using Neuropixels probe). We acknowledge that tracking the specific usage history of each probe could provide additional insights, but unfortunately we did not track this information for this project. In prior work the re-usability of probes has been quantified, showing insignificant degradation with use (e.g. Extended Data Fig 7d from Jun et al. 2017).

      (7) Figure 3, Supplement1: DY_013 missed DG entirely? Was this included in the analysis?

      Thank you for this question. We believe the reviewer is referring to the lack of a prominent high-amplitude LFP band in this mouse, and lack of high-quality sorted units in that region. Despite this, our histology did localize the recording trajectory to DG. This recording did pass our quality control criteria overall, as indicated by the green label, and was used in relevant analyses.

      The lack of normal LFP features and neuron yield might reflect the range of biological variability (several other sessions also have relatively weak DG LFP and yield, though DY_013 is the weakest), or could reflect some damage to the tissue, for example as caused by local bleeding. Because we could not conclusively identify the source of this observation, we did not exclude it.

      (8) Given that the authors argue for using the MTNN over GLMs, it would be useful to know exactly how much better the MTNN is at predicting activity in the held-out dataset (shown in Figure 7, Supplement 1). It looks like a very small increase in prediction performance between MTNN and GLMs, is it significantly different?

      The average variance explained on the held-out dataset, as shown in Figure 8–Figure Supplement 1 Panel B, is 0.065 for the GLMs and 0.071 for the MTNN. As the reviewer correctly noted, this difference is not significant. However, one of the key advantages of the MTNN over GLMs lies in its flexibility to easily incorporate covariates, such as electrophysiological characteristics or session/lab IDs, directly into the analysis. This feature is particularly valuable for assessing effect sizes and understanding the contributions of various factors.

      (9) In line 723: why is the threshold for mean firing rate for a unit to be included in the MTNN results so high (>5Hz), and how does it perform on units with lower firing rates?      

      We thank the reviewer for pointing this out. The threshold for including units with a mean firing rate above 5 Hz was set because most units with firing rates below this threshold were silent in many trials, and reducing the number of units helped keep the MTNN training time reasonable. Based on this comment, we ran the MTNN experiments including all units with firing rates above 1 Hz, and the results remained consistent with our previous conclusions (Figure 8). Crucially, the leave-one-out analysis consistently showed that lab and session IDs had effect sizes close to zero, indicating that both within-lab and between-lab random effects are small and comparable.
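
      The inclusion criterion discussed here amounts to a threshold on each unit's mean firing rate. The following sketch shows how lowering the threshold from 5 Hz to 1 Hz enlarges the set of included units; the unit names, spike counts, and recording duration are hypothetical, and the function is an illustration rather than the MTNN preprocessing code.

```python
def units_above_rate(spike_counts, duration_s, min_rate_hz):
    """Return the (sorted) unit ids whose mean firing rate exceeds min_rate_hz."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return sorted(
        unit for unit, n_spikes in spike_counts.items()
        if n_spikes / duration_s > min_rate_hz
    )


# Hypothetical spike counts over a hypothetical 1000 s session.
counts = {"unit_a": 12000, "unit_b": 3000, "unit_c": 400}
duration = 1000.0

print(units_above_rate(counts, duration, 5.0))  # stricter criterion
print(units_above_rate(counts, duration, 1.0))  # relaxed criterion
```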

      Reviewer #2 (Recommendations for the authors):

      (1) Most of the more major issues were already listed in the above comments. The strongest recommendation for additional work would be to improve the description and implementation of the RIGOR statistics such that non-IBL labs that might use Neuropixels probes but not use the entire IBL pipeline might be able to apply the RIGOR framework to their own data.

      We thank the reviewer for highlighting the importance of making the RIGOR statistics more accessible to a broader audience. We agree that improving the description and implementation of the RIGOR framework is essential if non-IBL labs using Neuropixels probes are to adopt it. To address this, we created a Jupyter notebook with step-by-step guidance that does not depend on the IBL pipeline. This tool (https://github.com/int-brain-lab/paper-reproducible-ephys/blob/develop/RIGOR_script.ipynb) is publicly available through the repository, accompanied by example datasets and usage tutorials.

      (2) Table 1: How are qualitative features like "drift" defined? Some quantitative statistics like "presence ratio" (the fraction of the dataset where spikes are present) already exist in packages like ecephys_spike_sorting. Who measured these qualitative features? What are the best practices for doing these qualitative analyses?

      At the probe level, we compute the estimate of the relative motion of the electrodes to the brain tissue at multiple depths along the electrode. We overlay the drift estimation over a raster plot to detect sharp displacements as a function of time. Quantitatively, the drift is the cumulative absolute electrode motion estimated during spike sorting (µm). We clarified the corresponding text in Table 1.

      The qualitative assessments were carried out by IBL staff and experimentalists. We have now provided code to run the RIGOR metrics along with an embedded tutorial, to complement the supplemental figures we have shown about qualitative metric interpretation.
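
      To make the two quantitative notions in this exchange concrete, the sketch below computes (i) a cumulative absolute motion from a drift trace and (ii) a presence ratio as described by the reviewer (the fraction of the recording in which spikes are present). Both functions are illustrations under our own assumptions, not the RIGOR implementation: we assume that "cumulative absolute motion" means the sum of absolute changes between consecutive motion estimates, and the bin count and all input values are hypothetical.

```python
def cumulative_drift_um(positions_um):
    """Sum of absolute changes between consecutive motion estimates (µm).

    Assumption: 'cumulative absolute motion' is read as total path length
    of the estimated electrode displacement over time.
    """
    return sum(abs(b - a) for a, b in zip(positions_um, positions_um[1:]))


def presence_ratio(spike_times_s, duration_s, n_bins=100):
    """Fraction of equal-width time bins containing at least one spike."""
    if duration_s <= 0 or n_bins <= 0:
        raise ValueError("duration_s and n_bins must be positive")
    bin_width = duration_s / n_bins
    occupied = {
        min(int(t / bin_width), n_bins - 1)  # clamp t == duration_s into last bin
        for t in spike_times_s
        if 0 <= t <= duration_s
    }
    return len(occupied) / n_bins


# Hypothetical drift trace (µm) and spike times (s).
print(cumulative_drift_um([0.0, 5.0, 3.0, 3.0, 10.0]))
print(presence_ratio([0.5, 1.5, 1.7, 9.9], duration_s=10.0, n_bins=10))
```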

      (3) Table 1: What are the units for the LFP derivative?

      We thank the reviewer for noting that the unit was missing. The unit (decibel per unit of space) is now in the table.

      (4) Table 1: For "amplitude cutoff", the table says that "each neuron must pass a metric". What is the metric?

      We have revised the table to include this information. This metric was designed to detect potential issues in amplitude distributions caused by thresholding during deconvolution, which could result in missed spikes. There are quantitative thresholds on the distribution of the low tail of the amplitude histogram relative to the high tail, and on the relative magnitude of the bins in the low tail. We now reference the methods text from the table, which includes a more extended description and gives the specific threshold numbers. Also, the metric and thresholds are more easily understood with graphical assistance; see the IBL Spike Sorting Whitepaper for this (Fig. 17 in that document and nearby text; https://doi.org/10.6084/m9.figshare.19705522.v4). This reference is now also cited in the text.

      (5) Figure 2: In panel A, the brain images look corrupted.

      Thanks; in the revised version we have changed the filetype to improve the quality of the panel image.

      (6) Figure 7: In panel D, make R2 into R^2 (with a superscript)

      Panel D y-axis label has been revised to include superscript (note that this figure is now Figure 8).

      Works Cited

      Julie M.J. Fabre, Enny H. van Beest, Andrew J. Peters, Matteo Carandini, and Kenneth D. Harris. Bombcell: automated curation and cell classification of spike-sorted electrophysiology data, July 2023. URL https://doi.org/10.5281/zenodo.8172822.

      James J. Jun, Nicholas A. Steinmetz, Joshua H. Siegle, Daniel J. Denman, Marius Bauza, Brian Barbarits, Albert K. Lee, Costas A. Anastassiou, Alexandru Andrei, Çağatay Aydın, Mladen Barbic, Timothy J. Blanche, Vincent Bonin, João Couto, Barundeb Dutta, Sergey L. Gratiy, Diego A. Gutnisky, Michael Häusser, Bill Karsh, Peter Ledochowitsch, Carolina Mora Lopez, Catalin Mitelut, Silke Musa, Michael Okun, Marius Pachitariu, Jan Putzeys, P. Dylan Rich, Cyrille Rossant, Wei-lung Sun, Karel Svoboda, Matteo Carandini, Kenneth D. Harris, Christof Koch, John O’Keefe, and Timothy D. Harris. Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232–236, Nov 2017. ISSN 1476-4687. doi: 10.1038/nature24636. URL https://doi.org/10.1038/nature24636.

      Simon Musall, Xiaonan R. Sun, Hemanth Mohan, Xu An, Steven Gluf, Shu-Jing Li, Rhonda Drewes, Emma Cravo, Irene Lenzi, Chaoqun Yin, Björn M. Kampa, and Anne K. Churchland. Pyramidal cell types drive functionally distinct cortical activity patterns during decision-making. Nature Neuroscience, 26(3):495–505, Mar 2023. ISSN 1546-1726. doi: 10.1038/s41593-022-01245-9. URL https://doi.org/10.1038/s41593-022-01245-9.

      Ivana Orsolic, Maxime Rio, Thomas D Mrsic-Flogel, and Petr Znamenskiy. Mesoscale cortical dynamics reflect the interaction of sensory evidence and temporal expectation during perceptual decision-making. Neuron, 109(11):1861–1875.e10, April 2021.

      Hyeong-Dong Park, Stéphanie Correia, Antoine Ducorps, and Catherine Tallon-Baudry. Spontaneous fluctuations in neural responses to heartbeats predict visual detection. Nature Neuroscience, 17(4):612–618, Apr 2014. ISSN 1546-1726. doi: 10.1038/nn.3671. URL https://doi.org/10.1038/nn.3671.

      Lorenzo Posani, Shuqi Wang, Samuel Muscinelli, Liam Paninski, and Stefano Fusi. Rarely categorical, always high-dimensional: how the neural code changes along the cortical hierarchy. bioRxiv, 2024. doi: 10.1101/2024.11.15.623878. URL https://www.biorxiv.org/content/early/2024/12/09/2024.11.15.623878.

      Nicholas A. Steinmetz, Christina Buetfering, Jerome Lecoq, Christian R. Lee, Andrew J. Peters, Elina A. K. Jacobs, Philip Coen, Douglas R. Ollerenshaw, Matthew T. Valley, Saskia E. J. de Vries, Marina Garrett, Jun Zhuang, Peter A. Groblewski, Sahar Manavi, Jesse Miles, Casey White, Eric Lee, Fiona Griffin, Joshua D. Larkin, Kate Roll, Sissy Cross, Thuyanh V. Nguyen, Rachael Larsen, Julie Pendergraft, Tanya Daigle, Bosiljka Tasic, Carol L. Thompson, Jack Waters, Shawn Olsen, David J. Margolis, Hongkui Zeng, Michael Hausser, Matteo Carandini, and Kenneth D. Harris. Aberrant cortical activity in multiple gcamp6-expressing transgenic mouse lines. eNeuro, 4(5), 2017. doi: 10.1523/ENEURO.0207-17.2017. URL https://www.eneuro.org/content/4/5/ENEURO.0207-17.2017.

      Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributed coding of choice, action and engagement across the mouse brain. Nature, 576(7786):266–273, Dec 2019. ISSN 1476-4687. doi: 10.1038/s41586-019-1787-x. URL https://doi.org/10.1038/s41586-019-1787-x.

      Nicholas A. Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, Susu Chen, Jennifer Colonell, Richard J. Gardner, Bill Karsh, Fabian Kloosterman, Dimitar Kostadinov, Carolina Mora-Lopez, John O’Callaghan, Junchol Park, Jan Putzeys, Britton Sauerbrei, Rik J. J. van Daal, Abraham Z. Vollan, Shiwei Wang, Marleen Welkenhuysen, Zhiwen Ye, Joshua T. Dudman, Barundeb Dutta, Adam W. Hantman, Kenneth D. Harris, Albert K. Lee, Edvard I. Moser, John O’Keefe, Alfonso Renart, Karel Svoboda, Michael Häusser, Sebastian Haesler, Matteo Carandini, and Timothy D. Harris. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science, 372(6539):eabf4588, 2021. doi: 10.1126/science.abf4588. URL https://www.science.org/doi/abs/10.1126/science.abf4588.

      Charlie Windolf, Han Yu, Angelique C. Paulk, Domokos Meszéna, William Muñoz, Julien Boussard, Richard Hardstone, Irene Caprara, Mohsen Jamali, Yoav Kfir, Duo Xu, Jason E. Chung, Kristin K. Sellers, Zhiwen Ye, Jordan Shaker, Anna Lebedeva, Manu Raghavan, Eric Trautmann, Max Melin, João Couto, Samuel Garcia, Brian Coughlin, Csaba Horváth, Richárd Fiáth, István Ulbert, J. Anthony Movshon, Michael N. Shadlen, Mark M. Churchland, Anne K. Churchland, Nicholas A. Steinmetz, Edward F. Chang, Jeffrey S. Schweitzer, Ziv M. Williams, Sydney S. Cash, Liam Paninski, and Erdem Varol. Dredge: robust motion correction for high-density extracellular recordings across species. bioRxiv, 2023. doi: 10.1101/2023.10.24.563768. URL https://www.biorxiv.org/content/early/2023/10/29/2023.10.24.563768.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study presents a novel pipeline for the large-scale genomic prediction of members of the non-ribosomal peptide group of pyoverdines based on a dataset from nearly 2000 Pseudomonas genomes. The advance presented in this study is largely based on solid evidence, although some main claims are only incompletely supported. This study on bacterial siderophores has broad theoretical and practical implications beyond a singular subfield.

      Thank you for the supportive and encouraging words. We appreciate the editor’s and reviewers’ careful and professional assessment of this manuscript. The reviewers’ scrutiny has helped us to improve the presentation and discussion of our work. We have now carefully revised the manuscript following their instructive suggestions and comments. Please find below our detailed responses (marked in blue) to each of the comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript introduces a bioinformatic pipeline designed to enhance the structure prediction of pyoverdines, revealing an extensive and previously overlooked diversity in siderophores and receptors. Utilizing a combination of feature sequence and phylogenetic approaches, the method aims to address the challenging task of predicting structures based on dispersed gene clusters, particularly relevant for pyoverdines.

      Predicting structures based on gene clusters is still challenging, especially pyoverdines as the gene clusters are often spread to different locations in the genome. An improved method would indeed be highly useful, and the diversity of pyoverdine gene clusters and receptors identified is impressive.

      However, so far the method basically aligns the structural genes and domains involved in pyoverdine biosynthesis and then predicts A domain specificity to predict the encoded compounds. Both methods are not particularly new as they are included in other tools such as PRISM (10.1093/nar/gkx320) or Sandpuma (https://doi.org/10.1093/bioinformatics/btx400) among others. The study claims superiority in A domain prediction compared to existing tools, yet the support is currently limited, relying on a comparison solely with AntiSMASH. A more extensive and systematic comparison with other tools is needed.  

      Thanks for pointing this out. In the revised manuscript, we have included a comprehensive comparative analysis, in which we compared our pipeline to six commonly used methods: NP.searcher, PRISM4, AdenPredictor, SeMPI2, SANDPUMA, and antiSMASH5 (see Supplementary_table 6 for details, and lines 281-286). These approaches either consist of a single specific algorithm or integrate several methods. Our approach performs best (see table below), demonstrating a clear improvement over previous tools. The improvements are due to several methodological differences inherent to our approach. Additionally, while exploring existing prediction tools, we found that some had not been maintained for years. For instance, we were unable to access NRPSsp (www.nrpssp.com) and NRPSpredictor2 (http://nrps.informatik.uni-tuebingen.de/). Below, we briefly explain these differences, particularly in relation to PRISM and SANDPUMA, as highlighted by the reviewer.

      Author response table 1.

      PRISM annotates biosynthetic gene clusters (BGC) and reconstructs the linear structures of NRPS synthetases, with this function depending on proper annotations of open reading frames. This pipeline can have difficulties in assembling the linear structure into a final product. In our approach, we found that the annotations of NRPS genes are frequently truncated because of sequencing errors and annotation issues. Our method fixes this problem by rescanning all possible reading frames of the BGC to rebuild complete pyoverdine synthetase genes.
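
      As a generic illustration of what rescanning all reading frames involves, the sketch below translates a DNA sequence in all six frames using the standard genetic code and reports the longest stretch without a stop codon. It is not the authors' pipeline: the function names are ours, the input sequence is a toy example, and a real implementation would additionally handle ambiguous bases, start codons, and frameshifts between adjacent fragments.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> dict:
    """Map frame names ('+1'..'+3', '-1'..'-3') to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_open_stretch(dna: str):
    """Return (frame, peptide) for the longest stop-free stretch in any frame."""
    best = ("", "")
    for name, protein in six_frame_translations(dna).items():
        for segment in protein.split("*"):
            if len(segment) > len(best[1]):
                best = (name, segment)
    return best


toy_sequence = "ATGAAATAG"  # toy input, not a real NRPS fragment
print(six_frame_translations(toy_sequence))
print(longest_open_stretch(toy_sequence))
```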

      SANDPUMA and our approach are based on similar ideas (using the prediCAT algorithm) to predict A domain substrates, namely by using the closest annotated reference A domain. However, our method uses a self-adaptive feature extraction step to reduce the confounding influence of phylogeny. This small adjustment significantly improves the performance of our approach and even works well for small training sets (101 experimentally validated A domains with our approach as opposed to 494 A domains used by SANDPUMA from MIBiG).
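
      The general idea of predicting a substrate from the closest annotated reference can be sketched as a nearest-neighbour lookup. The example below is a deliberately simplified stand-in, not prediCAT and not the authors' self-adaptive feature extraction: it assumes pre-aligned sequences of equal length, uses Hamming distance, and takes an optional list of positions as a hypothetical proxy for a feature selection step. All sequences and substrate labels are toy values.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_substrate(query, references, positions=None):
    """Return (label, distance) of the closest reference sequence.

    references: dict mapping aligned reference sequence -> substrate label.
    positions:  optional alignment columns to compare (a hypothetical
                stand-in for a feature extraction step).
    """
    def features(seq):
        return seq if positions is None else "".join(seq[i] for i in positions)

    q = features(query)
    return min(
        ((label, hamming(q, features(ref))) for ref, label in references.items()),
        key=lambda pair: pair[1],
    )


# Toy aligned sequences and labels, for illustration only.
toy_refs = {"AAAAAA": "substrate_1", "CCCCCC": "substrate_2"}
print(predict_substrate("AAACAA", toy_refs))                 # all columns
print(predict_substrate("AAACAA", toy_refs, positions=[3]))  # one selected column
```

      In this toy case the prediction changes once only the selected column is compared, which illustrates why the choice of features, and not just the choice of reference set, can matter for accuracy.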

      Additionally, in contradiction to the authors' claims, the method's applicability seems constrained to well-known and widely distributed gene clusters. The absence of predictions for new amino acids raises concerns about its generalizability to NRPS beyond the studied cases.

      We thank the reviewers for this comment. We acknowledge that our method cannot directly predict new amino acids. Nevertheless, for several reasons we believe that our approach is not constrained and can be widely applied in the future.

      First, our method can identify A domains that select new unknown amino acid substrates. In fact, three of the four unresolved cases in our experimental verification analysis (Fig. 3d) represent new amino acids. Obviously, experimental verification is required to characterize the unknown substrate. Once verified, the new A domains and their substrates can expand the reference dataset, allowing targeted improvement of our phylogeny-focused prediction technique. We now discuss this aspect in lines 634-645.

      Second, although the overall substrate diversity in NRPS is high across the microbial kingdom, our analysis suggests that the number of amino acids used for a specific group of secondary metabolites quickly reaches a saturation point. The discovery rate of new amino acids was 1.7% for our experimental Pseudomonas data set (Fig. 3d) and 0.0% for the Burkholderiales data set. This suggests that as the database expands, the discovery rate of novel amino acid substrates is expected to drop rapidly.

      Third, we acknowledge that the inability to predict the substrates of unknown domains is a common limitation among all knowledge-guided learning algorithms, including ours. However, we have made significant improvements in prediction accuracy. As the database grows, we expect the rate of unknown substrates to decrease, and the prediction accuracy to increase.

      The manuscript lacks clarity on how the alignment of structural genes operates when dealing with multiple NRPS gene clusters on different genome contigs. How would the alignment of each BGC work?

      We thank the reviewers for this comment. The pyoverdine molecules consist of a conserved fluorescent chromophore (Flu) and a peptide chain (Pep), both synthesized by NRPS enzymes. In most instances (over 90%), Flu and Pep are produced by two separate biosynthetic gene clusters (BGCs). In these cases, we merge the two BGCs by positioning Flu at the head and Pep at the tail. For the remaining less than 10%, there are two scenarios: 1. Flu and Pep are located on the same BGC, which eliminates any issues with BGC alignment. 2. In very rare cases, Flu and Pep are synthesized by three BGCs. Here, Flu is still synthesized by one BGC at the head, while Pep is produced by two BGCs. We put the BGC containing the Thioesterase (TE) domain as the tail and the BGC not containing the TE domain in the middle.

      (see lines 165-169).
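
      The ordering rule described above can be sketched as a simple ranking. The dictionary keys and cluster names below are hypothetical and only illustrate the head/middle/tail logic; they do not reflect the pipeline's actual data structures.

```python
# Illustrative ordering of pyoverdine BGCs (hypothetical field names).

def order_pyoverdine_bgcs(bgcs):
    """Order BGCs as chromophore (Flu) head, TE-free Pep middle, TE-containing Pep tail.

    Each BGC is a dict with the assumed keys 'name', 'has_chromophore', 'has_te'.
    A single BGC that carries both Flu and Pep is returned unchanged.
    """
    def rank(bgc):
        if bgc["has_chromophore"]:
            return 0  # Flu cluster goes to the head
        if bgc["has_te"]:
            return 2  # Pep cluster with the thioesterase domain goes to the tail
        return 1      # Pep cluster without a TE domain sits in the middle

    return [bgc["name"] for bgc in sorted(bgcs, key=rank)]


if __name__ == "__main__":
    clusters = [
        {"name": "pep_tail", "has_chromophore": False, "has_te": True},
        {"name": "flu", "has_chromophore": True, "has_te": False},
        {"name": "pep_mid", "has_chromophore": False, "has_te": False},
    ]
    print(order_pyoverdine_bgcs(clusters))
```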

      Another critical concern is that a main challenge in NRPS structure prediction is not the backbone prediction but rather the prediction of tailoring reactions, which is not addressed in the manuscript at all, and this limitation extensively restricts the applicability of the method.

      While we thank the reviewer for this comment, we only partly agree with it. Peptide backbone predictions are still a significant challenge. This challenge is clearly visible in our new analysis comparing the prediction accuracies of different pipelines, namely antiSMASH5, PRISM4, AdenPredictor, SeMPI2, NP.searcher, and SANDPUMA. Unresolved and wrong substrate predictions are still common, highlighting the importance of our contribution in developing a new approach with improved accuracy.

      However, we agree with the reviewer that our current algorithm does not predict tailoring reactions (now discussed on lines 680-685). Although tailoring reactions are important for predicting the final NRPS product structure, none of the other existing pipelines address this issue either, and it remains a challenge for future work. For our study, it is important to note that the specificity of pyoverdines is primarily determined by the backbone composition, whereas tailoring reactions seem to play a minor role.

      The manuscript presents a potentially highly useful bioinformatic pipeline for pyoverdine structure prediction, showcasing a commendable exploration of siderophore diversity. However, some of the claims made remain unsubstantiated. Overall, while the study holds promise, further validation and refinement are required to fulfill its potential impact on the field of bioinformatic structure prediction.

      Thank you for the supportive and encouraging words. We deeply appreciate your constructive comments and suggestions. 

      Reviewer #2 (Public Review):

      Pyoverdines, siderophores produced by many Pseudomonads, are one of the most diverse groups of specialized metabolites and are frequently used as model systems. Thousands of Pseudomonas genomes are available, but large-scale analyses of pyoverdines are hampered by the biosynthetic gene clusters (BGCs) being spread across multiple genomic loci and existing tools' inability to accurately predict amino acid substrates of the biosynthetic adenylation (A) domains. The authors present a bioinformatics pipeline that identifies pyoverdine BGCs and predicts the A domain substrates with high accuracy. They tackled a second challenging problem by developing an algorithm to differentiate between outer membrane receptor selectivity for pyoverdines versus other siderophores and substrates. The authors applied their dataset to thousands of Pseudomonas strains, producing the first comprehensive overview of pyoverdines and their receptors and predicting many new structural variants.

      The A domain substrate prediction is impressive, including the correction of entries in the MIBiG database. Their high accuracy came from a relatively small training dataset of A domains from 13 pyoverdine BGCs. The authors acknowledge that this small dataset does not include all substrates, and correctly point out that new sequence/structure pairs can be added to the training set to refine the prediction algorithm. 

      The authors could have been more comprehensive in finding their training set data. For instance, the authors claim that histidine "had not been previously documented in pyoverdines", but the sequenced strain P. entomophila L48, incorporates His (10.1007/s10534-009-9247-y). 

      Thank you for highlighting this issue. We agree that stating histidine has not been reported before in pyoverdine was incorrect. We have reviewed the full text and made the necessary corrections.

      The primary reason for excluding the sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) from the training set is that the pyoverdine structures of these strains were not determined solely through experimental methods. In these works, the pyoverdine structures were predicted based on the synthetic gene sequence using bioinformatic analysis, followed by structural analysis experiments based on this predicted structure. We found that such pre-prediction has probably introduced biases into downstream analyses. Specifically, in the case of Pseudomonas entomophila L48, we discovered inaccuracies in the annotation of certain domains (see figures below). For example, the third A domain of the peptide chain in P. entomophila L48 pyoverdine was initially annotated with Dab specificity. However, upon closer examination, it appears to differ significantly from other Dab references (top) or Dab from our experimentally validated (right) domains (left panel in the figure below). By analyzing the interface (I) domain (10.1073/pnas.1903161116) in its predicted site, we suggest that it should actually recognize OHHis. The OHAsp domain of P. entomophila L48 reported in the paper is actually close in sequence similarity to the OHAsp domain (left panel in the figure below), while the Ala domain reported is more similar to the Ser domain (right panel in the figure below). For these reasons, we did not include this strain, whose pyoverdine structure was derived from a supervised analysis, in the training set.

      Author response image 1.

      The workflow cannot differentiate between different variants of Asp and OHOrn, and it's not clear if this is a limitation of the workflow, the training data, or both. 

      Thanks for pointing this out. It is generally challenging to differentiate between variants of the same amino acid (for all the algorithms existing to date). In this sense, it is a limitation of ours but also of all other workflows. Nonetheless, we wish to stress that we observed feature sequence divergence (using the A motif4-5 region), which helped us to separate some (but not all) of the Asp and Orn variants. For example, separations between Asp variants are distinct (left panel in the figure below). To be on the conservative side, we only differentiated between OHAsp and Asp for our predictions, but differentiation between DOHAsp and OHAsp would also be possible. In the case of Orn variants, there was a clear separation between Orn and the OHOrn variants (right panel). In contrast, it was difficult to differentiate between the subgroups of OHOrn variants. We believe that no A domain prediction tool will be able to solve this issue. Instead, it would be important to include information on substrate-modifying enzymes in future approaches.

      Author response image 2.

      The prediction workflow holds up well in Burkholderiales A domains, however, they fail to mention in the main text that they achieved these numbers by adding more A domains to their training set.

      We thank the reviewers for this comment. We apologize for not having mentioned the training data set in the main text, although we described it in detail in the methods section (lines 714-732). We now provide more details on the analysis procedure in the main text (lines 307-313). It is important to note that we did not add more A domains to the training data set but built a new, independent data set for Burkholderiales. The aim was to mirror the analysis we performed for pyoverdines with a completely new data set, featuring 124 A domains for training and 178 A domains as the test set.

      To validate their predictions, they elucidated structures of several new pyoverdines, and their predictions performed well. However, the authors did not include their MS/MS data, making it impossible to validate their structures. In general, the biggest limitation of the submitted manuscript is the near-empty methods section, which does not include any experimental details for the 20 strains or details of the annotation pipeline (such as "Phydist" and "Syndist"). The source code also does not contain the requisite information to replicate the results or re-use the pipeline, such as the antiSMASH version and required flags. That said, skimming through the source code and data (kindly provided upon request) suggests that the workflow itself is sound and a clear improvement over existing tools for pyoverdine BGC annotation.

      Thank you for highlighting these issues. We agree that the methods section is short. This is because the entire paper is a step-by-step methodological introduction to our pipeline. We have now carefully revised the main text to add the information requested by the reviewer. Moreover, we have included a supplementary file with the MS/MS data of the experimentally analyzed pyoverdine structures. Finally, we include a link to a one-click online notebook that can be used to replicate the annotation and substrate prediction results, together with a more detailed explanation of the code (see https://drive.google.com/drive/folders/1JsfyPUGDTFo8BDDZk8JLSvKry8emzMhr?usp=drive_link).

      Predicting outer membrane receptor specificity is likewise a challenging problem and the authors have made a promising achievement by finding specific gene regions that differentiate the pyoverdine receptor FpvA from FpvB and other receptor families. Their predictions were not tested experimentally, but the finding that only predicted FpvA receptors were proximate to the biosynthesis genes lends credence to the predictive power of the workflow. The authors find predicted pyoverdine receptors across an impressive 468 genera, an exciting finding for expanding the role of pyoverdines as public goods beyond Pseudomonas. However, whether or not these receptors can recognize pyoverdines (and if so, which structures!) remains to be investigated.

      Thank you for the supportive and encouraging words. The bioinformatic analysis and experimental testing of pyoverdine-receptor matching is complicated and it is not part of this paper. We treated it in a separate manuscript in which we developed an experimentally verified co-evolution algorithm that matches pyoverdines to receptors. With this algorithm, we can identify self-receptors (i.e. receptors used to take up the self-produced pyoverdine), and therefore establish pyoverdine sharing and interaction networks across strains in communities.

      Please see DOI:10.1101/2023.11.05.565711 for details.

      In all, the authors have assembled a rich dataset that will enable large-scale comparative genomic analyses. This dataset could be used by a variety of researchers, including those studying natural product evolution, public good eco/evo dynamics, and NRPS engineering.

      Thank you for the supportive and encouraging words. We are grateful for the reviewers’ instructive suggestions and comments.

      Reviewer #3 (Public Review):

      Summary:

      Secondary metabolites are produced by numerous microorganisms and have important ecological functions. A major problem is that neither the function of a secondary metabolite enzyme nor the resulting metabolite can be precisely predicted from gene sequence data.

      In the current paper, the authors addressed this highly relevant question.

      The authors developed a bioinformatic pipeline to reconstruct the complete secondary metabolism pathway of pyoverdines, a class of iron-scavenging siderophores produced by Pseudomonas spp. These secondary metabolites are biosynthesized by a series of nonribosomal peptide synthetases and require a specific receptor (FpvA) for uptake. The authors combined knowledge-guided learning with phylogeny-based methods to predict with high accuracy encoding NRPSs, substrate specificity of A domains, pyoverdine derivatives, and receptors. After validation, the authors tested their pipeline with sequence data from 1664 phylogenetically distinct Pseudomonas strains and were able to determine 18,292 enzymatic A domains involved in pyoverdine synthesis, reliably predicted 97.8% of their substrates, identified 188 different pyoverdine molecule structures and 4547 FpvA receptor variants belonging to 94 distinct groups. All the results and predictions were clearly superior to predictions that are based on antiSMASH. Novel pyoverdine structures were elucidated experimentally by UHPLC-HR-MS/MS.

      To assess the extendibility of the pipeline, the authors chose Burkholderiales as a test case which led to the results that the pipeline consistently maintains high prediction accuracy within Burkholderiales of 83% which was higher than for antiSMASH (67%).

      Together, the authors concluded that supervised learning based on a few known compounds produced by species from the same genus probably outperforms generalized prediction algorithms trained on many products from a diverse set of microbes for NRPS substrate predictions. As a result, they also show that both pyoverdine and receptor diversity have been vastly underestimated.

      Strengths:

      The authors developed a very useful bioinformatic pipeline with high accuracy for secondary metabolites, at least for pyoverdines. The pipelines have several advantages compared to existing pipelines like the extensively used antiSMASH program, e.g. it can be applied to draft genomes, shows reduced erroneous gene predictions, etc. The accuracy was impressively demonstrated by the discovery of novel pyoverdines whose structures were experimentally substantiated by UHPLC-HR-MS/MS.

      The manuscript is very well written, and the data and the description of the generation of pipelines are easy to follow.

      Weaknesses:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains.

      Thanks for your positive and encouraging comment. Regarding your only major comment, we think that the design concept of our pipeline has the potential to be applied to more complex non-ribosomal peptides. Currently, our method is tailored to accurately predict the structural composition of the Pseudomonas siderophore pyoverdine (see also response 3). A key point emphasized in our article is the importance of considering phylogeny in developing substrate prediction algorithms for A domains. Currently, the main challenge in advancing these algorithms is the limited availability of data on A domains and their corresponding substrates. However, with the future accumulation of more reference data, we are confident that the design principles of our method will enable precise predictions of the structural compositions of all products synthesized by non-ribosomal peptide synthetases (see our discussions in lines 634-645).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I believe that the manuscript would benefit from focusing solely on the task of improving pyoverdine predictions. This aspect alone is significant, and robustly supporting this claim would strengthen the manuscript. The diversity analysis provided is valuable and would undoubtedly benefit the scientific community. However, additional systematic comparisons with other methods are necessary. Furthermore, clarification of certain terms, such as 'feature-based' (e.g., whether it refers to NRPS domains or CDS), would enhance clarity.

      Thank you for the supportive and encouraging words. We followed the reviewer’s suggestion and now provide the requested method comparison, see also response 2 for details. Furthermore, we have carefully checked the main text to clarify terms whenever needed. Specifically, we now define the terms “feature sequence” and “feature sequence distance” in lines 227-229.  

      Additionally, several minor points could be improved upon:

      In line 85, clarification is needed on how pyoverdine genes were identified.

      Thank you for your thorough review. In the introduction section, we provided a brief overview of our work, while the detailed methodology is outlined in the results section on lines 160-174.

      In line 382, it would be helpful to know the source of the sequences.

      We agree and have now carefully revised the manuscript following your suggestions (lines 403-405).

      Line 392 could be explained more clearly. Does it mean that the authors used an hmm search to search pHMMs against each reference sequence?

      Thanks for your comment. Yes, we used an hmm search to search pHMMs against each reference sequence. We have now revised the manuscript to improve explanations (lines 413-418).

      Reviewer #2 (Recommendations For The Authors):

      The authors state they "elucidated the chemical structure of the 20 pyoverdines using culture-based methods combined with UHPLC-HR-MS/MS", so I was alarmed to see that KR and LB already published several of those structures in the cited paper. I hope that this "double dipping" will be fixed in a revision process.

      Thank you for pointing this out. We agree that we have not explained clearly enough what steps were conducted in this study and which data were used from a previous paper (https://doi.org/10.1007/s00216-022-03907-w). The genomes of the 20 strains used for the verification analysis (Fig. 3d) were sequenced as part of this study (access code now provided). 14 out of the 20 pyoverdine structures were elucidated with UHPLC-HR-MS/MS in this study. For 6 out of the 20 pyoverdines, we had structural information already at hand from the previous paper. We have now clarified these details in our manuscript (lines 276-280). 

      Thank you for providing the source code and data, and I hope that the final non-redundant dataset will be uploaded to Zenodo or another repository. Please deposit the 20 newly sequenced genomes to GenBank or another public repository. Please also show the UHPLC-HR-MS/MS data, preferably in the form of raw data uploaded to GNPS.

      We have followed the reviewer’s advice and deposited our data:

      - The sequences of the 20 newly sequenced strains are available on ENA accession PRJEB76792.

      - The MS/MS plots of the 14 newly analyzed pyoverdines are shown in the Supplementary Materials.

      - We provide a one-click online notebook to allow readers to replicate the pyoverdine cluster annotation and substrate prediction of the 20 experimentally analyzed strains.

      I suggest adding "at least" or a similar qualifier when the 73 variants are mentioned unless the literature search was truly exhaustive. What were the criteria for inclusion of the 13 strains in Table S2? For instance, sequenced strains P. syringae 1448a (10.1186/1471-2180-11-218) and P. entomophila L48 (10.1007/s10534-009-9247-y) were not included.

      Thank you for your comment. We have now carefully revised the manuscript following your suggestions (lines 291-295). Regarding the criteria for including the 13 strains in Table S2, we aimed to select strains with high credibility for inclusion in the training set. The primary reason for excluding the two strains from the training set is that their siderophore structures were analyzed through supervised experiments. We wanted to avoid any form of bias that bioinformatic pre-predictions could introduce into downstream analyses (see Response 13 for details).

      OHAsp in pyoverdines has been reported to arise from hydroxylation of Asp after it's already been activated by the A domain (10.1073/pnas.1903161116). Was there a clear difference between A domains that lead to Asp and OHAsp? Conversely, acetylation and formylation of OHOrn occur before adenylation. Can your workflow be used to differentiate cOHOrn, fOHOrn, and AcOHOrn, which are currently difficult to predict through genome mining?

      Thank you for these considerations. We treated these aspects in our response 8.  

      Throughout, define non-proteinogenic AA substrate abbreviations (ex: Rsc, Dab).

      Revised as per suggestion (lines 329-333).

      Additional line comments:

      189: Mention PhyloPhlAn in the main text.

      Revised as per suggestion (line 189).

      191: Define these filtering/selection criteria.

      Thanks for your comment, we have added the criteria in the main text (line 196 and line 198). 

      309, 620: An A domain presumably loading histidine is present in sequenced strain P. entomophila L48 (10.1007/s10534-009-9247-y). Please also clarify that Val has previously been seen in a pyoverdine (it is in Table S1) albeit not sequenced.

      We have clarified these aspects as per suggestion (lines 314-315 and line 630).

      310: The pipeline can "highlight" new substrates, but not identify them.

      Revised as per suggestion (line 295).

      354: Please clarify "13 amino acid substrates form the core of all the 188 pyoverdine structures", considering that 279 A domain substrates couldn't be predicted.

      Thanks for your comments. We have now clarified “our analysis found that 13 amino acids form the main structural substrates of all the 188 pyoverdine structures.” (lines 360-363)

      630: "discovered" implies that there is experimental evidence. I suggest something like "here we predicted 151 putatively new variants".

      Revised as per suggestion (line 648).

      Reviewer #3 (Recommendations For The Authors):

      Weakness:

      The only major comment I have is the uncertainty of whether the pipeline can be applied to more complex non-ribosomal peptides. In the current study, the authors only applied their pipeline to a very narrow field, i.e., pyoverdines of Pseudomonas and Burkholderia strains

      Thanks for your comment. Please see our Responses 3+13 above, where we treat this concern in detail. Moreover, we discussed the possibility of extension to other groups of secondary metabolites in our discussion. We believe that we deliver a balanced view on the applicability of our approach and the next steps to be taken.  

      Please comment on this aspect.

      Minor:

      (1)  When you speak about "synthesis" it is rather biosynthesis. Synthesis is chemical synthesis.

      Please replace all instances of the word synthesis with biosynthesis.

      Revised as per suggestion.

      (2)  Line 188: synthetase is rather synthetases

      Revised as per suggestion (line 191).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e and h) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      The manuscript has improved with the new corrections. I appreciate the authors' attention to the minor comments, which have been fully solved. The authors have not, however, provided additional experimental evidence that uORF-mediated translation of Raf-1 mRNA depends on an intact eIF3 complex, nor have they addressed the consequences of such regulation for cell physiology. While I understand that this is a subject of follow-up research, the authors could have at least included their explanations/ speculations regarding major comments 2-4, which in my opinion could have been useful for the reader.

      Our explanations/speculations regarding major comments 2 and 3 were included in the Discussion. We apologize for this misunderstanding, as we thought that we were supposed to explain our ideas only in the responses. We did not discuss comment 4, however, as we are really not sure what the true effect is and did not want to engage in wild speculation in our manuscript. We thank this reviewer for his insightful comments and understanding.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The authors report the potential translational regulation of Raf kinase by re-initiation. It would be interesting to show that Raf is indeed regulated by uORF-mediated translation, and that this is dependent on an intact eIF3 complex. Analyzing the potential consequences of Raf1 regulation for cancer cell proliferation or apoptosis would be a plus.

      We agree that this is an interesting and likely possibility. In fact, another clue that translation of Raf1 is regulated by uORFs comes from Bohlen et al. 2023 (PMID: 36869665), who showed that RAF1 translation is dependent on PRRC2 proteins (which promote leaky scanning through these uORFs). We noted in the discussion that our results from eIF3d/e/hKD and the PRRC2A/B/CKD partly overlap. It is a subject of our follow-up research to investigate whether eIF3 and PRRC2 cooperate to regulate translation of this important mRNA.

      (2) The authors show that eIF3 d/e -but not 3h- has an effect on cell proliferation. First, this indicates that proliferation does not fully correlate with eIF3 integrity. Depletion of eIF3d does not affect the integrity of eIF3, yet the effects on proliferation are similar to those of eIF3e. What is the possibility that changes in proliferation reflect functions of eIF3d outside the eIF3 complex? What could be the real consequences of disturbing eIF3 integrity for the mammalian cell? Please, discuss.

      Yes, proliferation does not fully correlate with eIF3 integrity. Downregulation of eIF3 subunits that leads to disintegration of the eIF3 YLC core (a, b, c, g, i) has a more detrimental effect on growth and translation than downregulation of the peripheral subunits (e, k, l, f, h, m). Our previous studies (Wagner et al. 2016, PMID: 27924037 and Herrmannová et al. 2020, PMID: 31863585) indicate that the YLC core of eIF3 can partially support translation even without its peripheral subunits. In this respect eIF3d (as a peripheral subunit) is an amazing exception, suggesting it may have some specialized function(s). Whether this function resides outside of the eIF3 complex we do not know, but we do not think so, mainly because in the absence of eIF3e, its interaction partner, eIF3d gets rapidly degraded. Therefore, it is not very likely that eIF3d exists alone outside of the eIF3 complex with moonlighting functions elsewhere. We think that eIF3d, as a head-interacting subunit close to an important head ribosomal protein RACK1 (a landing pad for regulatory proteins), is a target of signaling pathways, which may make it important for translation of specific mRNAs. In support of these thoughts, eIF3d (in the context of entire eIF3) together with DAP5 was shown to promote translation by an alternate cap-dependent (eIF4F-independent) mechanism (Lee et al. 2016, PMID: 27462815; de la Parra et al. 2018, PMID: 30076308). In addition, the eIF3d function (also in the context of entire eIF3) was shown to be regulated by stress-triggered phosphorylation (Lamper et al. 2020, PMID: 33184215).

      (3) Figure 6D: Surprisingly, reduced levels of ERK1/2 upon eIF3d/e-KD are compensated by increased phosphorylation of ERK1/2 and net activation of c-Jun. Please comment on the functional consequences of buffering mechanisms that the cell deploys in order to counteract compromised eIF3 function. Why would the cell activate precisely the MAPK pathway to compensate for a compromised eIF3 function?

      This we do not know. We can only speculate that when translation is compromised, cells try to counteract it in two ways: 1) they produce more ribosomes to increase translational rates and 2) activate MAPK signaling to send pro-growth signals, which can in the end further boost ribosome biogenesis.

      (4) Regarding DAP-sensitive transcripts, can the authors discuss in more detail the role of eIF3d in alternative cap-dependent translation versus re-initiation? Are these transcripts being translated by a canonical cap- and uORF-dependent mechanism or by an alternative cap-dependent mechanism?

      This is indeed not an easy question. On one hand, it was shown that DAP5 facilitates translation re-initiation after uORF translation in a canonical cap-dependent manner. This mechanism is essential for translation of the main coding sequence (CDS) in mRNAs with structured 5' leaders and multiple uORFs (Weber et al., 2022, PMID: 36473845; David et al., 2022, PMID: 35961752). On the other hand, DAP5 was proposed to promote alternative, eIF4F-independent but cap-dependent translation, as it can substitute for the function of the eIF4F complex in cooperation with eIF3d (de la Parra et al., 2018, PMID: 30076308; Volta et al., 2021, PMID: 34848685). Overall, these observations paint a picture that is too complex for us to propose a clear scenario of what is going on between these two proteins on individual mRNAs. We speculate that both mechanisms are taking place and that the specific mechanism of translation initiation differs for differently arranged mRNAs.

      Minor comments:

      (5) Figure S2C: why is there a strong reduction of the stop codon peak for 3d and 3h KDs?

      We have checked the riboWaltz profiles of all replicates (in the Supplementary data we show only a representative replicate I) and the stop codon peak differs considerably among the replicates. We think that this way of plotting was optimized for calculation and visualization of P-sites and triplet periodicity and thus is not suitable for this type of comparison among samples. Therefore, we have performed our own analysis where the 5’ ends of reads are used instead of P-sites and triplicates are averaged and normalized to CDS (please see below), so that all samples can be compared directly in one plot (same as Fig. S13A but for the stop codon). We can see that the stop codon peak really differs and is the smallest for eIF3hKD. However, these changes are in the range of 20% and we are not sure about their biological significance. We therefore refrain from drawing any conclusions. In general, a reduced stop codon peak may signal faster termination or increased stop codon readthrough, but the latter should be accompanied by an increased ribosome density in the 3’UTR, which is not the case. A defect in termination efficiency would instead be manifested by an increased stop codon peak.

      Author response image 1.

       

      (6) Figures 5 and S8: Adding a vertical line at 'zero' in all cumulative plots will help the reader understand the author's interpretation of the data. 

      We have added a dashed grey vertical line at zero as requested. However, for interpretation of these plots, the reader should focus on the colored curve and whether it is shifted with respect to the grey curve (background) or not. A shift to the right indicates increased expression, while a shift to the left indicates decreased expression. The reported p-value then indicates the statistical significance of the shift.

      (7) The entire Figure 2 are controls that can go to Supplementary Material. The clustering of Figure S3B could be shown in the main Figure, as it is a very easy read-out of the consistent effects of the KDs of the different eIF3 subunits under analysis.

      We have moved the entire Figure 2 to Supplementary Material as suggested (the original panels can be found as Supplementary Figures 1B, 1C and 3A). Figure S3B is now the main Figure 2E. 

      (8) There are 3 replicates for Ribo-Seq and four for RNA-Seq. Were these not carried out in parallel, as it is usually done in Ribo-seq experiments? Why is there an extra replicate for RNA-Seq?

      Yes, the three replicates were carried out in parallel. We decided to add the fourth replicate in RNA-Seq to increase the data robustness, as the RNA-Seq is used for normalization of FP to calculate the TE, which was our main analyzed metric in this article. We had the option to add the fourth replicate as we originally prepared five biological replicates for all samples, but after performing the control experiments, we selected only the 3 best replicates for the Ribo-Seq library preparation and sequencing.
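      As a rough illustration of why RNA-Seq depth matters for TE, the sketch below computes a log2 translational efficiency as the ratio of mean footprint (FP) counts to mean RNA-Seq counts, and then the change between a knockdown and a control. It is only a minimal sketch under stated assumptions: the function names, the pseudocount, and the use of simple replicate means are hypothetical choices for illustration and are not the statistical pipeline used in the article. Replicate numbers need not match (for example 3 FP and 4 RNA-Seq replicates), because each library type is averaged separately.

```python
import math
from statistics import mean


def log2_te(fp_counts, rna_counts, pseudocount=1.0):
    """log2 of (mean FP count / mean RNA count) for one gene in one condition.

    fp_counts and rna_counts are normalized counts per replicate; the two
    lists may have different lengths. The pseudocount (an assumption of this
    sketch) avoids division by zero for lowly expressed genes.
    """
    return math.log2((mean(fp_counts) + pseudocount) /
                     (mean(rna_counts) + pseudocount))


def delta_log2_te(fp_kd, rna_kd, fp_ctrl, rna_ctrl):
    """Change in log2 TE between a knockdown and a control sample."""
    return log2_te(fp_kd, rna_kd) - log2_te(fp_ctrl, rna_ctrl)


# Hypothetical normalized counts for a single gene.
example = delta_log2_te(
    fp_kd=[199, 199, 199],        # 3 Ribo-Seq replicates
    rna_kd=[99, 99, 99, 99],      # 4 RNA-Seq replicates
    fp_ctrl=[99, 99, 99],
    rna_ctrl=[99, 99, 99, 99],
)
print(example)
```

      Because the RNA-Seq mean sits in the denominator, noise in the RNA-Seq estimate propagates directly into TE, which is the intuition behind adding a replicate on that side.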

      (9) Please, add another sheet in Table S2 with the names of all genes that change only at the translation (RPF) levels.

      As requested, we have added three extra sheets (one for each downregulation) for differential FP with Padjusted <0.05 in the Spreadsheet S2. We also provide the complete unfiltered differential expression data (sheet named “all data”), so that readers can filter out any relevant data based on their interest.

      (10) Page 5, bottom: ' ...we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules...'. This is not true for eIF3d, as shown in Fig1B and mentioned in Results.

      This reviewer is correct. By this generalized statement, we were trying to summarize our previous results from Wagner et al., 2014, PMID: 24912683; Wagner et al., 2016, PMID: 27924037 and Herrmannova et al., 2020, PMID: 31863585. The eIF3d downregulation is the only exception that does not affect expression of any other eIF3 subunit. Therefore, we have rewritten this paragraph accordingly: “We recently reported a comprehensive in vivo analysis of the modular dynamics of the human eIF3 complex (Wagner et al, 2020; Wagner et al, 2014; Wagner et al., 2016). Using a systematic individual downregulation strategy, we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules leading to the formation of partial eIF3 subcomplexes with limited functionality (Herrmannova et al, 2020). eIF3d is the only exception in this respect, as its downregulation does not influence expression of any other eIF3 subunit.”

      (11) Page 10, bottom: ' The PCA plot and hierarchical clustering... These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d.' This is already obvious in the polysome profiles of Figure S2C.

      We agree that this result is surely not surprising given the polysome profile and growth phenotype analyses of eIF3hKD. But still, we think that the PCA plot and hierarchical clustering results represent valuable controls. Nonetheless, we rephrased this section to note that this result agrees with the polysome profiles analysis: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      (12) Page 12: ' As for the eIF3dKD "unique upregulated" DTEGs, we identified one interesting and unique KEGG pathway, the ABC transporters (Supplementary Figure 5A, in green).' This sentence is confusing, as there are more pathways that are significant in this group, so it is unclear why the authors consider it 'unique'.

      The eIF3dKD “unique upregulated” group comprises genes with increased TE only in eIF3dKD but not in eIF3eKD or eIF3hKD (500 genes, Fig 2G). All these 500 genes were examined for enrichment in the KEGG pathways, and the top 10 significant pathways were reported (Fig S6A). However, 8 out of these 10 pathways were also significantly enriched in other gene groups examined (e.g. eIF3d/eIF3e common). Therefore, the two remaining pathways (“ABC transporters” and “Other types of O-glycan biosynthesis”) are truly unique for eIF3dKD. We wanted to highlight the ABC transporters group in particular because we find it rather interesting (for the reasons mentioned in the article). We have corrected the sentence in question to avoid confusion: “Among the eIF3dKD “unique upregulated” DTEGs, we identified one interesting KEGG pathway, the ABC transporters, which did not show up in other gene groups (Supplementary Figure 6A, in green). A total of 12 different ABC transporters had elevated TE (9 of them are unique to eIF3dKD, while 3 were also found in eIF3eKD), 6 of which (ABCC1-5, ABCC10) belong to the C subfamily, known to confer multidrug resistance with alternative designation as multidrug resistance protein (MRP1-5, MRP7) (Sodani et al, 2012).

      Interestingly, all six of these ABCC transporters were upregulated solely at the translational level (Supplementary Spreadsheet S2).”    

      (13) Note typo ('Various') in Figure 4A.

      Corrected

      (14) The introduction could be shortened.

      This is a very subjective requirement. In fact, when this manuscript was reviewed in NAR, we were asked by two reviewers to expand it substantially. Because a number of various research topics come together in this work, e.g. translational regulation, the eIF3 structure and function, MAPK/ERK signaling, we are convinced that all of them demand a comprehensive introduction for non-experts in each of these topics. Therefore, with all due respect to this reviewer, we did not ultimately shorten it.

      Reviewer #2 (Recommendations For The Authors):

      - In Figure 2, it would be useful to know why eIF3d is destabilized by eIF3e knockdown - is it protein degradation and why do the eIF3d/e knockdowns not more completely phenocopy each other when there is the same reduction to eIF3d as in the eIF3d knockdown sample?

      Yes, we do think that protein degradation lies behind the eIF3d destabilization in the eIF3eKD, but we have not yet directly demonstrated this. However, we have shown that eIF3d mRNA levels are not altered in eIF3eKD and that Ribo-Seq data indicate no change in TE or FP for eIF3d-encoding mRNA in eIF3eKD. Nonetheless, it is important to note (and we discuss it in the article) that eIF3d levels in eIF3dKD are lower than eIF3d levels in eIF3eKD (please see Supplementary Figure 1C). In fact, we believe that this is one of the main reasons for the eIF3d/e knockdowns differences.

      - The western blots in Figures 4 and 6 show modest changes to target protein levels and would be strengthened by quantification.

      We have added the quantifications as requested by this reviewer and Reviewer 3.

      - For Figure 4, this figure would be strengthened by experiments showing if the increase in ribosomal protein levels is correlated with actual changes to ribosome biogenesis.

      As suggested, we performed polysome profiling in the presence of EDTA to monitor changes in the 60S/40S ratio, indicating a potential imbalance in the biogenesis of individual ribosome subunits. We found that it was not affected (Figure 3G). In addition, we performed the same experiment, normalizing all samples to the same number of cells (cells were carefully counted before lysis). In this way, we confirmed that eIF3dKD and eIF3eKD cells indeed contain a significantly increased number of ribosomes, in agreement with the western blot analysis (Figure 3H).

      - In Figure 6, there needs to be a nuclear loading control.

      This experiment was repeated with Lamin B1 used as a nuclear loading control – it is now shown as Fig. 5F.

      - For Figure 8, these findings would be strengthened using luciferase reporter assays where the various RNA determinants are experimentally tested. Similarly, 5′ TOP RNA reporters would have been appreciated in Figure 4.

      This is indeed a logical continuation of our work, which represents the current work in progress of one of the PhD students. We apologize, but we consider this time- and resource-demanding analysis out of scope of this article.

      Reviewer #3 (Recommendations For The Authors):

      (1) Within the many effects observed, it is mentioned that eIF3d is known to be overexpressed while eIF3e is underexpressed in many cancers, but knockdown of either subunit decreases MDM2 levels, which would be expected to increase P53 activity and decrease tumor cell transformation. In contrast, they also report that 3e/3d knockdown dramatically increases levels of cJUN, presumably due to increased MAPK activity, and is expected to increase protumor gene expression. Additional discussion is needed to clarify the significance of the findings, which are a bit confusing.

      This is indeed true. However, considering the complexity of eIF3, the largest initiation factor among all, as well as the broad portfolio of its functions, it is perhaps not so surprising that the observed effects are complex and may seem even contradictory in respect to cancer. To acknowledge that, we expanded the corresponding part of discussion as follows: “Here, we demonstrate that alterations in the eIF3 subunit stoichiometry and/or eIF3 subcomplexes have distinct effects on the translatome; for example, they affect factors that play a prominent (either positive or negative) role in cancer biology (e.g., MDM2 and cJUN), but the resulting impact is unclear so far. Considering the complex interactions between these factors as well as the complexity of the eIF3 complex per se, future studies are required to delineate the specific oncogenic and tumor suppressive pathways that play a predominant role in mediating the effects of perturbations in the eIF3 complex in the context of neoplasia.”

      (2) There are places in the text where the authors refer to changes in transcriptional control when RNA levels differ, but transcription versus RNA turnover wasn't tested, e.g. page 16 and Figure S10, qPCR does not confirm "transcriptional upregulation in all three knockdowns" and page 19 "despite apparent compensatory mechanisms that increase their transcription."

      This is indeed true, the sentences in question were corrected. The term “increased mRNA levels” was used instead of transcriptional upregulation (increased mRNA stabilization is also possible).

      (3) Similarly, the authors suggest that steady-state LARP1 protein levels are unaffected based on ribosome footprint counts (page 21). It is incorrect to assume this, because ribosome footprints can be elevated due to stalling on RNA that isn't being translated and doesn't yield more protein, and because levels of translated RNA/synthesized proteins do not always reflect steady-state protein levels, especially in mutants that could affect lysosome levels and protein turnover. Also page 12, 1st paragraph suggests protein production is down when ribosome footprints are changed.

      Yes, we are well aware of this known limitation of Ribo-Seq analysis. Therefore, the steady-state protein levels of our key hits were verified by western blotting. In addition, we have removed the sentence about LARP1 because it was based on Ribo-Seq data only without experimental evaluation of the steady-state LARP1 protein levels.

      (4) The translation buffering effect is not clear in some Figures, e.g. S6, S8, 8A, and B. The authors show a scheme for translationally buffered RNAs being clustered in the upper right and lower left quadrants in S4H (translation up with transcript level down and v.v.), but in the FP versus RNA plots, the non-TOP RNAs and 4E-P-regulated RNAs don't show this behavior, and appear to show a similar distribution to the global changes. Some of the right panels in these figures show modest shifts, but it's not clear how these were determined to be significant. More information is needed to clarify, or a different presentation, such as displaying the RNA subsets in the left panels with heat map coloring to reveal whether RNAs show the buffered translation pattern defined in purple in Figure S4H, or by reporting a statistical parameter or number of RNAs that show behavior out of total for significance. Currently the conclusion that these RNAs are translationally buffered seems subjective since there are clearly many RNAs that don't show changes, or show translation-only or RNA-only changes.

      We would like to clarify that S4H does not indicate a necessity for changes in FPs in the buffered subsets. Although opposing changes in total mRNA and FPs are classified as buffering, often we also consider the scenario where there are changes to the total mRNA levels not accompanied by changes in ribosome association.

      In figure S6, the scatterplots indicate a high density of genes shifted towards negative fold changes on the x-axis (total mRNA). This is also reflected in the empirical cumulative distribution functions (ecdfs) for the log2 fold changes in total mRNA in the far right panels of A and B, and the lack of changes in log2 fold change for FPs (middle panels). Similarly, in figure S8, the scatterplots indicate a density of genes shifted towards positive fold changes on the x-axis for total mRNA. The ecdfs also demonstrate that there is a significant directional shift in log2 fold changes in the total mRNA that is not present to a similar degree in the FPs, consistent with translational offsetting. It is rightly pointed out that not all genes in these sets follow the same pattern of regulation. We have revised the title of Supplementary Figure S6 (now S7) to reflect this. However, we would like to emphasize that these figures are not intended to communicate that all genes within these sets of interest are regulated in the same manner, but rather that when considered as a whole, the predominant effect seen is that of translational offsetting (directional shifts in the log2 fold change distribution of total mRNA that are not accompanied by similar shifts in FP mRNA log2 fold changes).

      The significance of these differences was determined by comparing the ecdfs of the log2 fold changes for the genes belonging to a particular set (e.g. non-TOP mTOR-sensitive, p-eIF4E-sensitive) against all other expressed genes (background) using a Wilcoxon rank sum test. This allows identification of significant shifts in the distributions that have a clear directionality (if there is an overall increase, or decrease in fold changes of FPs or total mRNA compared to background). If log2 fold changes are different from background, but without a clear directionality (equally likely to be increased or decreased), the test will not yield a significant result. This approach allows assessment of the overall behavior of gene signatures within a given dataset in a manner that is completely threshold-independent, such that it does not rely on classification of genes into different regulatory categories (translation only, buffering, etc.) based on significance or fold-change cut-offs (as in S4H). Therefore, we believe that this unbiased approach is well-suited for identifying cases when there are many genes that follow similar patterns of regulation within a given dataset.
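      The behavior described above, where a directional shift is detected but a symmetric spread is not, can be illustrated with a small stand-alone sketch of a two-sided Wilcoxon rank sum test. This is only an illustration under stated assumptions: it uses the large-sample normal approximation with a tie correction, the function name and the example fold changes are hypothetical, and it is not the code used for the analyses in the article.

```python
import math


def rank_sum_test(gene_set, background):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (z, p). A negative z means the gene set tends to have lower
    values than the background; a positive z means higher values.
    """
    n1, n2 = len(gene_set), len(background)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in gene_set] + [(v, 1) for v in background])

    rank_sum_set = 0.0   # sum of ranks of the gene-set values
    tie_term = 0.0       # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        in_set = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_set += avg_rank * in_set
        i = j

    u = rank_sum_set - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                            # all values identical
        return 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided p-value
    return z, p


# Hypothetical log2 fold changes.
background = [x / 10 for x in range(-50, 51)]
shifted_down = [-4.5, -4, -3.8, -3.5, -3, -2.9, -2.5, -2.2, -2, -1.5]
no_direction = [-4, -3, -2, -1, -0.5, 0.5, 1, 2, 3, 4]

print(rank_sum_test(shifted_down, background))   # directional shift
print(rank_sum_test(no_direction, background))   # spread both ways
```

      In this toy example the set that is shifted toward negative fold changes gives a small p-value, whereas the set that deviates equally in both directions does not, which matches the threshold-independent, directional reading of the ecdf plots.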

      (5) Page 10-"These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d" ...These results suggest that eIF3h has less impact on the translatome, not that it does so differently. If it were changing translation by a different mechanism, I would not expect it to cluster with control.

      This sentence was rewritten as follows: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      Other minor issues:

      (1) There are some typos: Figure 2 leves, Figure 4 variou,

      Corrected.

      (2) Figure 3, font for genes on volcano plot too small

      Yes, maybe, however the resolution of this image is high enough to enlarge a certain part of it at will. In our opinion, a larger font would take up too much space, which would reduce the informativeness of this graph.

      (3) Figure S5, highlighting isn't defined.

      The figure legend for S5A (now S6A) states: “Less significant terms ranking 11 and below are in grey. Terms specifically discussed in the main text are highlighted in green.” Perhaps it was overlooked by this reviewer.

      (4) At several points the authors refer to "the MAPK signaling pathway", suggesting there is a single MAPK that is affected, e.g in the title, page 3, and other places when it seems they mean "MAPK signaling pathways" since several MAPK pathways appear to be affected.

      We apologize for any terminological inaccuracies. There are indeed several MAPK pathways operating in cells. In our study, we focused mainly on the MAPK/ERK pathway. The confusion probably stems from the fact that the corresponding term in the KEGG pathway database is labeled "MAPK signaling pathway" and this term, although singular, includes all MAPK pathways. We have carefully reviewed the entire article and have corrected the term used accordingly to either: 1) MAPK pathways in general, 2) the MAPK/ERK pathway for this particular pathway, or 3) "MAPK signaling pathway", where the KEGG term is meant.

      (5) Some eIF3 subunit RNAs have TOP motifs. One might expect 3e and 3h levels to change as a function of 3d knockdown due to TOP motifs but this is not observed. Can the authors speculate why the eIF3 subunit levels don't change but other TOP RNAs show TE changes? Is this true for other translation factors, or just for eIF3, or just for these subunits? Could the Western blot be out of linear range for the antibody or is there feedback affecting eIF3 levels differently than the other TOP RNAs, or a protein turnover mechanism to maintain eIF3 levels?

      This is indeed a very interesting question. In addition to the mRNAs encoding ribosomal proteins, we examined all TOP mRNAs and added an additional sheet to the S2 supplemental spreadsheet with all TOP RNAs listed in (Philippe et al., 2020, PMID: 32094190). According to our Ribo-Seq data, we could expect to see increased protein levels of eIF3a and eIF3f in eIF3dKD and eIF3eKD, but this is not the case, as judged from extensive western blot analysis performed in (Wagner et al. 2016, PMID: 27924037). Indeed, we cannot rule out the involvement of a compensatory mechanism that monitors and maintains the levels of eIF3 subunits at steady state, increasing or decreasing them if necessary, which could depend on the TOP motif-mediated regulation. However, we think that in our KDs, all non-targeted subunits that lose their direct binding partner in eIF3 due to siRNA treatment become rapidly degraded. For example, co-downregulation of subunits d, k and l in eIF3eKD is very likely caused by protein degradation as a result of the loss of their direct binding partner, eIF3e. Since we showed that the yeast eIF3 complex assembles co-translationally (Wagner et al. 2020, PMID: 32589964), and there is no reason to think that mammalian eIF3 differs in this regard, our working hypothesis is that free subunits that are not promptly incorporated into the eIF3 complex are rapidly degraded, and the presence or absence of the TOP motif in the 5’ UTR of their mRNAs has no effect. As for the other TOP mRNAs, translation factors eEF1B2, eEF1D, eEF1G, eEF2 have significantly increased FPs in both eIF3dKD and eIF3eKD, but we did not check their protein levels by western blotting and therefore cannot conclude anything specific.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major change:

      All three of our reviewers raised the possibility that changes in movement during the time spent at the center ports could have contributed to changes in SWR rates. Analyses to address this possibility, based on the examination of trials with high and low speeds, were originally included in the supplement but we did not sufficiently highlight and explain these results. To rectify this, we have moved these results into a new main Figure 3 and now include a paragraph describing our interpretation of these results (page 9). We also include a more detailed description of the subjects’ behavior during port times – namely, that all subjects must remain quite stationary while at the reward ports in order to keep their nose in a specific position which keeps the port triggered. As a result, all subjects maintain head speeds well below our typical speed threshold for immobility while at the ports. This leads us to predict that any feedback based on periods of immobility alone (as requested by Reviewer 3) would show results very similar to our Control cohort and would not alter SWR rates seen during neurofeedback trials.

      Minor changes:

      (1) Reviewer 1 observed that our reported statistics appeared to be missing an interaction term showing that neurofeedback differentially affected the SWR rate/count pre- and post-reward. We apologize for a lack of clarity here: we fit pre- and post-reward times with separate linear mixed effects models, so this interaction term is neither expected nor defined in our model. We have added a sentence clarifying this aspect of our LME approach in the Methods section: “Each model is designed to compare samples from all trials of the control group to samples from neurofeedback and delay trials from the neurofeedback cohort for a specific time period (for instance, pre-reward-delivery at the center ports).” Combining both times in the same model would require adding an additional hierarchical level in order to preserve the pairing of the pre- and post-reward time period for each trial, which we are concerned would complicate the formulation and interpretation of the model. However, the reviewer raises a good point that the comparison between these two time periods reveals an additional difference between the trial types: SWR rate remains relatively consistent between the pre- and post-reward periods during neurofeedback trials, while delay and control trials show a clear increase in SWR rate between the two time periods. To visualize and quantify this effect, we calculated the difference in SWR rates between the two time periods and now include this plot as Supplementary Figure 2F, which is referenced on page 8 of the main text.
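      The per-trial difference described above can be sketched as follows. This is a minimal illustration only, assuming each trial is summarized by an SWR count and a duration for the pre- and post-reward periods; the field names, the example values, and the simple averaging by trial type are hypothetical and do not reproduce the analysis behind Supplementary Figure 2F.

```python
from collections import defaultdict
from statistics import mean


def swr_rate(count, duration_s):
    """SWR rate in events per second for one period of one trial."""
    return count / duration_s


def rate_change_by_trial_type(trials):
    """Mean (post-reward rate minus pre-reward rate) for each trial type.

    The difference is computed within each trial, which preserves the
    pairing of the pre- and post-reward periods.
    """
    diffs = defaultdict(list)
    for t in trials:
        pre = swr_rate(t["pre_count"], t["pre_duration_s"])
        post = swr_rate(t["post_count"], t["post_duration_s"])
        diffs[t["trial_type"]].append(post - pre)
    return {trial_type: mean(values) for trial_type, values in diffs.items()}


# Hypothetical trials: counts and durations are made up for illustration.
trials = [
    {"trial_type": "neurofeedback", "pre_count": 2, "pre_duration_s": 4.0,
     "post_count": 1, "post_duration_s": 2.0},
    {"trial_type": "delay", "pre_count": 1, "pre_duration_s": 4.0,
     "post_count": 2, "post_duration_s": 2.0},
]
print(rate_change_by_trial_type(trials))
```

      Computing the difference within each trial keeps the pre/post pairing without adding a further hierarchical level to the mixed effects models.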

      (2) Reviewer 2 found our original title, “Neurofeedback training can modulate task-relevant memory replay in rats” to be misleading and suggestive of a manipulation to memory content. We are in complete agreement with the Reviewer in that our manipulation does not alter replay content, so to be more specific and accurate, we have changed our title to their suggestion “Neurofeedback training can modulate task-relevant memory replay rate in rats” accordingly.

      (3) Reviewer 2 also requested that we include analyses quantifying baseline SWR rates for each of our experimental subjects. Although we initially considered reporting our results in measures of change relative to each individual animal’s baseline, we decided against this approach for several reasons.

      First, it is important to clarify that we extensively train the animals on the task prior to implant, so we do not have access to a truly naïve, pre-behavior baseline SWR rate for any of our subjects. However, because the pre-implant training is conducted consistently between our neurofeedback and our control cohort, we have no reason to believe that the behavioral training prior to implant would introduce differences in SWR rate between the cohorts. Indeed, we find no difference in post-reward SWR rate (or SWR rate at the home well) when we quantify the first 250 trials of post-implant behavior for each subject (see panel A below). Note that we cannot compare the pre-reward SWR rate at this point, because it is influenced by the task structure which guarantees at least one SWR in each neurofeedback trial pre-reward.

      Further, we do find that SWR rate is quite consistent over many days of task performance in the control cohort (shown for the post-reward period in panel B below). This suggests that comparing the post-neurofeedback training SWR rates for the neurofeedback cohort to SWR rates throughout the training for the control cohort is not likely to be confounded by differing amounts of training experience. This is supported by our analyses in Figure 2 which show no differences in SWR rate between the two cohorts when considering pre- and post-reward times combined.

      Author response image 1.

      (A) SWR rate calculated during the post-reward period at the center port for the first 250 trials of post-implant behavior for each animal. Trials of all types are included (i.e., both neurofeedback trials and delay trials for the manipulation cohort). Groupwise comparison p=0.192. (B) Mean SWR rate during the post-reward period at the center port for each behavioral training epoch shows no systematic change over time across subjects within the control cohort.

      Finally, within each cohort, we found the overall SWR rates to be quite consistent across animals. If each subject in the neurofeedback cohort had shown dramatically different SWR rates at the beginning of neurofeedback training, we would have needed to express the effect of neurofeedback training relative to baseline for each animal. However, since the ranges of SWR rates were highly comparable, we felt that expressing our results as simple SWR rates, rather than as measures of relative change, was more accessible and made it easier to place our results within the context of the literature. Within the neurofeedback cohort, comparing neurofeedback to delay trials is inherently matched for baseline SWR rate since these comparisons are made within the same animal.

      (4) Finally, Reviewer 2 raises the possibility that older animals or those with cognitive deficits might respond to neurofeedback differently. We entirely agree with this possibility, and note this in our Discussion section: “Since the neurofeedback paradigm depends on the occurrence of at least a low endogenous rate of SWR occurrence, it would be important to implement neurofeedback training as a relatively early interventional strategy prior to extensive neurodegeneration, and training may take longer in aged or impaired subjects.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      (1) In the "Introduction" section, an important aspect that requires attention pertains to the discussion surrounding the heterodimerization of CXCR4 and CCR5. Notably, the manuscript overlooks a recent study (https://doi.org/10.1038/s41467-023-42082-z) elucidating the mechanism underlying the formation of functional dimers within these G protein-coupled receptors (GPCRs)…The inclusion of this study within the manuscript would significantly enrich the contextual framework of the work, offering readers a comprehensive understanding of the current knowledge surrounding the structural dynamics and functional implications of CXCR4 and CCR5 heterodimerization.

      We thank the reviewer for his/her recommendation to enrich the contextual framework of our study. The Nature Communications paper by Di Marino et al. was published after we sent the first version of our manuscript to eLife, and therefore was not included in the discussion. As the reviewer rightly indicates, this paper elucidates the mechanism underlying the formation of functional dimers within CCR5 and CXCR4. Using metadynamics approaches, the authors emphasize the importance of distinct transmembrane regions for dimerization of the two receptors. In particular, CXCR4 shows two low energy dimer structures and the TMVI-TMVII helices are the preferred interfaces involved in the protomer interactions in both cases. Although the study uses in silico techniques, it also includes the molecular binding mechanism of CCR5 and CXCR4 in the membrane environment, as the authors generate a model in which the receptors are immersed in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) phospholipid bilayer with 10% cholesterol. This is an important point in this study, as membrane lipids also interact with membrane proteins, and the lipid composition affects CXCR4 oligomerization (Gardeta S.R. et al. Front. Immunol. 2023). In particular, Di Marino et al. find a cholesterol molecule placed in-between the two CXCR4 protomers where it engages a series of hydrophobic interactions with residues including Leu132, Val214, Leu216 and Phe249. Then, the polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes protomer binding. In our hands, the F249L mutation in CXCR4 reverted the antagonism of AGR1.137, suggesting that the compound binds, among others, this residue. We should, nonetheless, indicate that we analyzed receptor oligomerization and not CXCR4 dimerization, which was the main object of the Di Marino et al. study. 
It is therefore also plausible that residues other than those described as essential for CXCR4 dimerization might participate in receptor oligomerization. We can speculate that AGR1.137 might affect cholesterol binding to CXCR4 and, therefore, alter dimerization/oligomerization. Additionally, the CXCR4 X-ray structure with PDB code 3ODU (Wu B. et al. Science 2010) experimentally shows the presence of two fatty acid molecules in contact with both TMV and TMVI. These molecules closely interact with hydrophobic residues in the protein, thereby stabilizing it in a hydrophobic environment. Although more experiments will be needed to clarify the mechanism involved, our results suggest that cholesterol and/or other lipids also play an important role in CXCR4 oligomerization and function, as seen for other GPCRs (Jakubik J. & El-Fakahany E.E. Int. J. Mol. Sci. 2021). However, we should also consider that other factors not included in the analysis by Di Marino et al. can also affect CXCR4 oligomerization; for instance, the co-expression of other chemokine receptors and/or other GPCRs that heterodimerize with CXCR4 might affect CXCR4 dynamics at the cell membrane, similar to other membrane proteins such as CD4, which also forms complexes with CXCR4 (Martinez-Muñoz L. et al. Mol. Cell 2018).

      The revised discussion contains references to the study by Di Marino et al. to enrich the contextual framework of our data.

      (2) In "various sections" of the manuscript, there appears to be confusion surrounding the terminology used to refer to antagonists. It is recommended to provide a clearer distinction between allosteric and orthosteric antagonists to enhance reader comprehension. An orthosteric antagonist typically binds to the same site as the endogenous ligand, directly blocking its interaction with the receptor. On the other hand, an allosteric antagonist binds to a site distinct from the orthosteric site, inducing a conformational change in the receptor that inhibits the binding of the endogenous ligand. By explicitly defining the terms "allosteric antagonist" and "orthosteric antagonist" within the manuscript, readers will be better equipped to discern the specific mechanisms discussed in the context of the study.

      The behavior of the compounds described in our manuscript (AGR1.135 and AGR1.137) fits with the definition of allosteric antagonists in that they bind to a site distinct from the orthosteric site, although they only block some ligand-mediated functions and not others. This would mean that they are not formally antagonists and should not be considered allosteric antagonists, as their binding to CXCR4 does not alter CXCL12 binding, although they might affect its affinity. In this sense, our compounds correspond much better to the concept of negative allosteric modulators (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). They act by binding to a site distinct from the orthosteric site and selectively block some downstream signaling pathways, but not others, induced by the same endogenous agonist.

      To avoid confusion and to clarify the role of the compounds described in this study, we now refer to them as negative allosteric modulators throughout the manuscript.

      (3) In the Results section, the computational approach employed for "screening small compounds targeting CXCR4, particularly focusing on the inhibition of CXCL12-induced CXCR4 nanoclustering", requires clarification due to several points of incomprehension. The following recommendations aim to address these concerns and enhance the overall clarity of the section:

      (1) Computational Approach and Binding Mode Description: 

      -Explicitly describe the methodology for identifying the pocket/cleft area in angstroms (Å) on the CXCR4 protein structure. Include details on how the volume of the cleft enclosed by TMV and TMVI was determined, as this information is not readily apparent in the provided reference (https://doi.org/10.1073/pnas.1601278113).

      The identification of the cleft was based on the observations by Wu et al. (Wu B. et al. Science 2010), who described the presence of bound lipids in the area formed by TMV and TMVI, and those of Wescott et al. (Wescott M.P. et al. Proc. Natl. Acad. Sci. 2016) on the importance of TMVI in transmitting the conformational changes promoted by CXCL12 on CXCR4 towards the cytoplasmic surface of the receptor, thereby linking the binding site with signaling activation. Collectively, these results, and our previous data on the critical role of the N-terminal region of TMVI for CXCR4 oligomerization (Martinez-Muñoz L. et al. Mol. Cell 2018), focused our in silico screening on this region. Once we detected that several compounds bound CXCR4 in this region, the cleft properties were calculated after subtracting the compound structure. The resulting PDB file was analyzed using the PDBsum server (Laskowski R.A. et al. Protein Sci. 2018). Volumes were obtained from the SURFNET-based analysis of surface clefts provided by the server (Laskowski R.A. J. Mol. Graph. 1995). The theoretical interaction surface between the selected compounds and CXCR4 and the atomic distances between the protein residues and the compounds were calculated using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007) (Fig. I, only for review purposes). The analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1,381 Å³ that were not connected to the orthosteric site. In the case of AGR1.137, the data revealed two distinct clefts of 790 Å³ and 580 Å³ (Fig. I, only for review purposes). These details have been included in the revised manuscript (New Fig. 1A, Supplementary Fig. 8A, B).

      (4) Clarify the statement regarding the cleft being "surface exposed for interactions with the plasma membrane," particularly in the context of its embedding within the membrane.

      For GPCRs, transmembrane domains represent binding sites for bioactive lipids that play important functional and physiological roles (Huwiler A. & Zangemeister-Wittke U. Pharmacol. Ther. 2018). The channel between TMV and TMVI connects the orthosteric chemokine binding pocket to the lipid bilayer and is occupied by an oleic acid molecule, according to the CXCR4 structure published in 2010 (Wu B. et al. Science 2010). In addition, the target region contains residues involved in cholesterol (and perhaps other lipids) engagement (Di Marino et al. Nat. Commun. 2023). Taken together, these data support our statement that the cleft supports interactions between CXCR4 molecules and the plasma membrane. 

      Moreover, the data of Di Marino et al. also support that CCR5 and CXCR4 have a symmetric and an asymmetric binding mode. Therefore, either dimeric structure has the possibility to form trimers, tetramers, and even oligomers by using the free binding interface to complex with another protomer. This hypothesis suggests that the interaction of dimers to form oligomers should involve residues distinct from those included in the dimeric conformation.

      The sentence has been modified in the revised manuscript to clarify comprehension.

      (5) Discuss the rationale behind targeting the allosteric binding pocket instead of the orthosteric pocket, outlining potential advantages and disadvantages.

      The advantages and disadvantages of using negative allosteric modulators vs orthosteric antagonists have now been included in the revised discussion.

      The majority of GPCR-targeted drugs function by binding to the orthosteric site of the receptor, and are agonists, partial agonists, antagonists or inverse agonists. These orthosteric compounds can have off-target effects and poor selectivity due to highly homologous receptor orthosteric sites and to abrogation of spatial and/or temporal endogenous signaling patterns. 

      The alternative is to use allosteric modulators, which can tune the functions associated with the receptors without affecting the orthosteric site. They can be positive, negative or neutral modulators, depending on their effect on the functionality of the receptor (Foster D.J. & Conn P.J. Neuron 2017). For example, the use of a negative allosteric modulator of a chemokine receptor to dampen pathological signaling events, while retaining full signaling for non-pathological activities, might limit adverse effects (Kohout T.A. et al. J. Biol. Chem. 2004). In this case, the negative allosteric modulator 873140 blocks CCL3 binding on CCR5 but does not alter CCL5 binding (Watson C. et al. Mol. Pharmacol. 2005). In other cases, allosteric modulators can stabilize a particular receptor conformation and block others. The mechanism of action of maraviroc, the FDA-approved anti-HIV-1 CCR5 allosteric modulator, is attributed to its ability to modulate CCR5 dimer populations and their subsequent subcellular trafficking and localization to the cell membrane (Jin J. et al. Sci. Signal. 2018). Two CCR5 dimeric conformations that are imperative for membrane localization were present in the absence of maraviroc; however, an additional CCR5 dimer conformation was discovered after the addition of maraviroc, and all homodimeric conformations were further stabilized. This finding is consistent with the observation that CCR5 dimers and oligomers inhibit HIV host-cell entry, likely by preventing HIV-1 co-receptor formation.

      It is well known that GPCRs activate G proteins, but they also recruit additional proteins (e.g., β-arrestins) that induce signaling cascades which, in turn, can direct specific subsets of cellular responses independent of G protein activation (Eichel K. et al. Nature 2018) and are responsible for either therapeutic or adverse effects. Allosteric modulators can thus be used to block these adverse effects without influencing the therapeutic benefits. This was the case in the design of G protein-biased agonists for the kappa opioid receptor, which maintain the desirable antinociceptive and antipruritic effects and eliminate the sedative and dissociative effects in rodent models (Brust T.F. et al. Sci. Signal. 2016).

      (6) Provide the PDB ID of the CXCR4 structure used as a template for modeling with SwissModel. Explain the decision to model the structure from the amino acid sequence and suggest an alternative approach, such as utilizing AlphaFold structures and performing classical molecular dynamics with subsequent clustering for the best representative structure.

      The PDB entry used as a template for modeling CXCR4 was 3ODU. This information was already included in the material and methods section. At the time we performed these analyses, there were several crystallographic structures of CXCR4 in complex with different molecules and peptides deposited at the PDB. None of them included a full construct containing the complete receptor sequence to provide a suitable sample for X-ray structure resolution, as the N- and C-terminal ends of CXCR4 are very flexible loops. In addition, the CXCR4 constructs contained T4 lysozyme inserted between helices TMV and TMVI to increase the stability of the protein, a common strategy used to facilitate crystallogenesis of GPCRs (Zou Y. et al. PLoS One 2012). Therefore, we generated a CXCR4 homology model using the SWISS-MODEL server (Waterhouse A. et al. Nucleic Acids Res. 2018). This program reconstructed the loop between TMV and TMVI, a domain particularly important in this study that was not present in any of the crystal structures available in the PDB. The model structure was, nonetheless, still incomplete, as it began at P27 and ended at S319 because the terminal ends were not resolved in the crystal structure used as a template. Nevertheless, we considered that these terminal ends were not involved in CXCR4 oligomerization.

      As AlphaFold was not available at the time we initiated this project, we did not use it. However, we have now updated our workflow to current methods and predicted the structure of the target using AlphaFold (Jumper J. et al. Nature 2021) and the sequence available under UniProt entry P61073. We prepared the ligands using OpenBabel (O’Boyle N.M. et al. J. Cheminformatics 2011), with Gasteiger charge assignment, and generated 10 conformers for each input ligand using the OpenBabel genetic algorithm. We then prepared the target structure with OpenMM, removing all waters and possible heteroatoms, and adding all missing atoms. We next predicted the target binding pockets with fpocket (Le Guilloux V. et al. BMC Bioinformatics 2009), P2Rank (Krivak R. & Hoksza D. J. Cheminformatics 2018), and AutoDock AutoSite (Ravindranath P.A. & Sanner M.F. Bioinformatics 2016). We chose only those pockets between TMV and TMVI (see answer to point 3). We merged the results of the three programs into so-called consensus pockets, considering two pockets sufficiently similar if at least 75% of their surfaces are shared (del Hoyo D. et al. J. Chem. Inform. Model. 2023). Of the consensus pockets, one was significantly larger than the others and was therefore selected. We then docked the ligand conformers in this pocket using AutoDock GPU (Santos-Martins D. et al. J. Chem. Theory Comput. 2021), LeDock (Liu N. & Xu Z. IOP Conf. Ser. Earth Environ. Sci. 2019), and Vina (Eberhardt J. et al. J. Chem. Inf. Model. 2021). The number of docked poses varied from 210 to 287. We scored each pose with the Vina score using ODDT (Wójcikowski M. et al. J. Cheminform. 2015). Then, we clustered the different solutions into groups whose maximum RMSD was 1 Å. This resulted in 40 clusters; the representative of each cluster was the pose with the maximum Vina score, and this analysis confirmed that the selected compounds bound this pocket (Author response image 1). 
When required, we calculated the binding affinity using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013), in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. This information has now been included in the revised version of the manuscript.
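
      For illustration only, the pose-clustering step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in the study: poses are assumed to be lists of (x, y, z) coordinates with identical atom ordering, a greedy scheme stands in for whatever clustering algorithm was actually applied, and the function names and the `higher_is_better` switch are hypothetical (the sign convention of the score is an assumption that should be set to match the scoring tool).

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, scores, cutoff=1.0, higher_is_better=False):
    """Greedy clustering in which every pair of poses in a cluster is within `cutoff` Å.

    Poses are visited from best to worst score, so the first member of each
    cluster is its best-scored pose and serves as the representative.
    Returns a list of clusters, each a list of pose indices.
    """
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=higher_is_better)
    clusters = []
    for i in order:
        for members in clusters:
            # Join a cluster only if the pose is close to ALL current members,
            # which keeps the maximum intra-cluster RMSD at or below the cutoff.
            if all(rmsd(poses[i], poses[j]) <= cutoff for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def representatives(clusters):
    """Index of the best-scored pose of each cluster."""
    return [members[0] for members in clusters]
```

      With a 1 Å cutoff, poses that differ only by small shifts fall into the same group, and the first index of each group identifies the pose that would be inspected visually.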

      Author response image 1.

      AGR1.135 docking in CXCR4 using the updated protocol for ligand docking. Cartoon representation colored in gray with TMV and TMVI shown in blue and pink, respectively. AGR1.135 is shown in stick representation with carbons in yellow, oxygens in red and nitrogens in blue.

      (7) Specify the meaning of "minimal interaction energy" and where (if present) the interaction scores are reported in the text.

      By minimal interaction energy we mean the best docking score, that is, the best score obtained in our docking studies. These data were not included in the previous manuscript due to space restrictions but are now included in the revised manuscript.

      (8) You performed docking studies using GLIDE to identify potential binding sites for the small compounds on the CXCR4 protein. The top-scoring binders were then subjected to further refinement using PELE simulations. However, I realize that a detailed description of the specific binding modes of these compounds was not provided in the text. Please make the description of binding poses more detailed.

      First, to assess the reliability of this method, a PELE study was carried out for the control molecule IT1t, which is a small drug-like isothiourea derivative that has been crystallized in complex with CXCR4 (PDB code: 3ODU). IT1t is a CXCR4 antagonist that binds to the CXCL12 binding cavity and inhibits HIV-1 infection (Das D. Antimicrob. Agents Chemother. 2015; Dekkers S. et al. J. Med. Chem. 2023). Of the five best trajectories, two had clearly better binding energies and corresponded to almost the same predicted pose of the molecule. Although the predicted binding mode was not exactly the same as the one in the crystal structure, the approximation was very good, which validated the approach. Although PELE is a suitable technique to find potential binding sites, the predicted poses must be subsequently refined using docking programs.

      Analyzing the best trajectories for the remaining ligands, at least one of the best-scored poses was always located at the orthosteric binding site of CXCR4. Even though these poses showed good binding energies, they were discarded as the in vitro biological experiments indicated that the compounds were unable to block CXCL12 binding or CXCL12-mediated inhibition of cAMP release or CXCR4 internalization. Collectively, these data indicated that the selected compounds did not behave as orthosteric inhibitors of CXCR4. The CXCL12 binding pocket is the biggest cavity in CXCR4, and so PELE may tend to place the molecules near it. However, all the compounds presented other feasible binding sites with a comparable binding energy.

      AGR1.135 and AGR1.137 showed interesting poses between TMV and TMVI with very good binding energy (-51.4 and -37.2 kcal/mol, respectively). This was precisely the region we had previously selected for the in silico screening, as previously described (see response to point 3).

      AGR1.131 showed two poses with low binding energy that were placed between helices TMI and TMVII (-43.6 kcal/mol) and between helices TMV and TMVI (-39.8 kcal/mol). This compound was unable to affect CXCL12-mediated chemotaxis and was therefore used as an internal negative control, as it was selected in the in silico screening with the same criteria as the other compounds but failed to alter any CXCL12-mediated functions. PELE studies nonetheless provided different binding sites for each molecule, which had to be further studied using docking to obtain a more accurate binding mode. In agreement with the previous comment, we repeated the analysis using AlphaFold and the rest of the procedure described (see our response to point 6) and calculated the binding energies for all the compounds using Schrödinger’s MM-GBSA procedure (Greenidge P.A. et al. J. Chem. Inf. Model. 2013). Calculations were performed in two ways: first, assuming that the ligand and target are fixed; second, with an energy minimization of all the atoms within a distance of 3 Å from the ligand. The results using the first method indicated that AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -56.4 and -62.4 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -61.6 kcal/mol. With the second method, AGR1.135 and AGR1.137 showed poses between TMV and TMVI with -57.9 and -67.6 kcal/mol, respectively, and AGR1.131 had a pose between TMI and TMVII with -62.2 kcal/mol.

      This information is now included in the text.

      (9) (2) Experimental Design:

      -Justify the choice of treating Jurkat cells with a concentration of 50 μM of the selected compound. Consider exploring different concentrations and provide a rationale for the selected dosage. Additionally, clearly identify the type of small compound used in the initial experiment.

      The revised version contains a new panel in Fig. 1B showing a more detailed analysis of the Jurkat migration experiments with different concentrations (1-100 µM) of the compounds. In all cases, 100 µM nearly completely abrogated cell migration, but to reduce the amount of DMSO added to the cells we selected 50 µM for further experiments, as this concentration inhibited 50-75% of ligand-induced cell migration. Regarding the type of small compounds used in the initial experiments, they were compounds included in the library described in reference #24 (Sebastián-Pérez V. et al. J. Chem. Inf. Model. 2017), which contains heterocyclic compounds. We would note that we do not consider AGR1.137 a final compound. We think that there is scope to develop AGR1.137-based second-generation compounds with greater solubility in water and greater specificity or affinity for CXCR4, and to evaluate delivery methods that might increase activity.

      (10) Avoid reporting details in rounded parentheses within the text; consider relocating such information to the Materials and Methods section or figure captions for improved readability.

      Most of the rounded parentheses within the text have been eliminated in the revised version of the manuscript to improve readability.

      (11) Elaborate on the virtual screening approach using GLIDE software, specifying the targeted site and methodology employed.

      For the virtual screening, we used the Glide module (SP and XP scoring functions) included in the Schrödinger software package, utilizing the corresponding 3D target structure and our MBC library (Sebastián-Pérez V. et al. J. Chem. Inf. Model. 2017). The center of the catalytic pocket was selected as the centroid of the grid. In the grid generation, a van der Waals radius scaling factor of 1.0 and a partial charge cutoff of 0.25 were used. The SP poses of each compound were then rescored with the XP scoring function of Glide. In the XP stage of the virtual screening, ligand sampling was flexible, Epik state penalties were added and an energy window of 2.5 kcal/mol was used for ring sampling. In the energy minimization step, the distance-dependent dielectric constant was 4.0, with a maximum of 100,000 minimization steps. In the clustering, poses were considered duplicates and discarded if both the RMS deviation was less than 0.5 Å and the maximum atomic displacement was less than 1.3 Å.
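
      As a hedged illustration of the duplicate-pose criterion stated above (RMS deviation below 0.5 Å and maximum atomic displacement below 1.3 Å), the following minimal sketch applies both thresholds to a pair of poses. It is not Glide code and does not reproduce Glide internals; the function names are hypothetical, and poses are assumed to be lists of (x, y, z) coordinates with matching atom order.

```python
import math


def is_duplicate(pose_a, pose_b, rmsd_cutoff=0.5, max_disp_cutoff=1.3):
    """True only if BOTH the RMSD and the largest single-atom displacement are below their cutoffs (Å)."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    # Per-atom Euclidean displacement between the two poses.
    displacements = [math.dist(a, b) for a, b in zip(pose_a, pose_b)]
    rms = math.sqrt(sum(d * d for d in displacements) / len(displacements))
    return rms < rmsd_cutoff and max(displacements) < max_disp_cutoff


def deduplicate(poses):
    """Indices of poses kept after discarding duplicates of an earlier kept pose."""
    kept = []
    for i, pose in enumerate(poses):
        if not any(is_duplicate(pose, poses[j]) for j in kept):
            kept.append(i)
    return kept
```

      The second threshold matters for poses that overlap almost everywhere but differ in one flexible group: the RMSD can stay below 0.5 Å while a single atom moves by more than 1.3 Å, in which case both poses are retained.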

      (12) Provide clarity on the statement that AGR1.131 "theoretically" binds the same motif, explaining the docking procedure used for this determination.

      In the in silico screening, AGR1.131 was one of the 40 selected compounds that showed, according to the PELE analysis (see answer to point 8), a pose with low binding energy (-39.8 kcal/mol) between the TMV and TMVI helices, which is the area selected for the screening. Nonetheless, using the initial workflow, its best pose was placed between helices TMI and TMVII (-43.7 kcal/mol). In conclusion, although AGR1.131 could also bind the TMV-TMVI cleft, its most favorable pose was in the area between TMI and TMVII. In addition, the compound was included in the biological screening, where it did not affect CXCL12-mediated chemotaxis. We thus decided to use it as an internal negative control, as it has a skeleton very similar to AGR1.135 and AGR1.137 and can interact with the TM domains of CXCR4 without promoting biological effects. This statement has been clarified in the revised text.

      (13) Toxicity Testing:

      -Enhance the explanation of the approach to testing the toxicity of the compound in Jurkat cells. Consider incorporating positive controls to strengthen the assessment and clarify the experimental design.

      All the selected compounds in the in silico screening were initially tested for propidium iodide incorporation in treated cells in a toxicity assay, and some of them were discarded for further experiments (e.g., AGR1.103 and VSP3.1).

      Further evaluation of Jurkat cell viability was determined by cell cycle analysis using propidium iodide.  Supplementary Fig. 1B included the percentage of each cell cycle phase, and data indicated no significant differences between the treatments tested. Nevertheless, at the suggestion of the reviewer, and to clarify this issue, positive controls inducing Jurkat cell death (staurosporine and hydrogen peroxide) have also been included in the new Supplementary Fig. 2. The new figure also includes a table showing the percentage of cells in each cell-cycle phase.  

      (14) In the Results section concerning "AGR1.135 and AGR1.137 blocking CXCL12-mediated CXCR4 nanoclustering and dynamics", several points can be improved to enhance clarity and coherence: 1. Specificity of Low Molecular Weight Compounds:  

      -Clearly articulate how AGR1.135 and AGR1.137 specifically target homodimeric CXCR4 and provide an explanation for their lack of impact on heterodimeric CXCR4-CCR5 in that region.

      First of all, we should clarify that when we talk about receptor nanoclustering, oligomers refer to complexes including 3 or more receptors and, therefore, the residues involved in these interactions can differ from those involved in receptor dimerization. Moreover, our FRET experiments did not indicate that the compounds alter receptor dimerization (see new Supplementary Fig. 7). Of note, mutant receptors unable to oligomerize can still form dimers (Martínez-Muñoz L. et al. Mol. Cell 2018; García-Cuesta E.M. et al. Proc. Natl. Acad. Sci. USA 2022). Additionally, we believe that these oligomers can also include other chemokine receptors/proteins expressed at the cell membrane, which we are currently studying using different models and techniques.

      We have results supporting the existence of CCR5/CXCR4 heterodimers (Martínez-Muñoz L. et al. Proc. Natl. Acad. Sci. USA 2014), in line with the data published by Di Marino et al. However, in the current study we have not evaluated the impact of the selected compounds on CXCR4 complexes other than CXCR4 oligomers. Our Jurkat cells do not express CCR5 and, therefore, we cannot discuss whether AGR1.137 affects CCR5/CXCR4 heterodimers. The chemokine field is very complex and most receptors can form dimers (homo- and heterodimers) as well as oligomers (Martinez-Muñoz L. et al. Pharmacol. Ther. 2011) when co-expressed. Evaluating different receptor combinations in the same experiment is a complex task, as the number of potential combinations between distinct expressed receptors makes the analysis very difficult. We started with CXCR4 as a model and will continue later with other possible CXCR4 complexes. In addition, for the analysis of CCR5/CXCR4 dynamics, it is much better to use dual-TIRF techniques, which allow the simultaneous detection of two distinct molecules coupled to different fluorochromes.

      Regarding the data of Di Marino et al., it is possible that the compounds might also affect heterodimeric conformations of CXCR4. This aspect has also been broached in the revised discussion. We would again note that we evaluated CXCR4 oligomers and not monomers or dimers; this is especially relevant when we compare the residues involved in these processes as they might differ depending on the receptor conformation considered. This issue was also hypothesized by Di Marino et al. (see our response to point 4).

      (15) When referring to "unstimulated" cells, provide a more detailed explanation to elucidate the experimental conditions and cellular state under consideration.

      Unstimulated cells refer to the cells in basal conditions, that is, cells in the absence of CXCL12. For TIRF-M experiments, transiently-transfected Jurkat cells were plated on glass-bottomed microwell dishes coated with fibronectin; these are the unstimulated cells. To observe the effect of the ligand, dishes were coated as above plus CXCL12 (stimulated cells). We have clarified this point in the material and methods section of the revised version.

      (16) 2. Paragraph Organization

      -Reorganize the second paragraph to eliminate redundancy and improve overall flow. A more concise and fluid presentation will facilitate reader comprehension and engagement.

      The second paragraph has been reorganized to improve overall flow.

      (17) Ensure that each paragraph contributes distinct information, avoiding repetition and redundancy.

      We have carefully revised each paragraph of the manuscript to avoid redundancy.

      (18) 3. Claim of Allosteric Antagonism:

      -Exercise caution when asserting that "AGR1.135 and AGR1.137 behave as allosteric antagonists of CXCR4" based on the presented results. Consider rephrasing to reflect that the observed effects suggest the potential allosteric nature of these compounds, acknowledging the need for further investigations and evidence.

      To avoid misinterpretations on the effect of the compounds on CXCR4, as we have commented in our response to point 2, we have substituted the term allosteric inhibitors with negative allosteric modulators, which refer to molecules that act by binding a site distinct from the orthosteric site, and selectively block some downstream signaling pathways, whereas others induced by the same endogenous or orthosteric agonist are unaffected (Gao Z.-G. & Jacobson K.A. Drug Discov. Today Technol. 2013). Our data indicate that the selected small compounds do not block ligand binding or G protein activation or receptor internalization, but inhibit receptor oligomerization and ligand-mediated directed cell migration.

      (19) In the Results section discussing the "incomplete abolition of CXCR4-mediated responses in Jurkat cells by AGR1.135 and AGR1.137", several points can be refined for better clarity and completeness:  1. Inclusion of Positive Controls: 

      -Consider incorporating positive controls in relevant experiments to provide a comparative benchmark for assessing the impact of AGR1.135 and AGR1.137. This addition will strengthen the interpretation of results and enhance the experimental rigor. 

      The in vivo experiments (Fig. 7E,F) used AMD3100, an orthosteric antagonist of CXCR4, as a positive control. We also included AMD3100, as a positive control of inhibition when evaluating the effect of the compounds on CXCL12 binding (Fig. 3, new Supplementary Fig. 3). The revised version of the manuscript also includes the effect of this inhibitor on other relevant CXCL12-mediated responses such as cell migration (Fig. 1B), receptor internalization (Fig. 3A), cAMP production (Fig. 3C), ERK1/2 and AKT phosphorylation (Supplementary Fig. 4), actin polymerization (Fig. 4A), cell polarization (Fig. 4B, C) and cell adhesion (Fig. 4D), to facilitate the interpretation of the results and improve the experimental rigor.

      (20) 2. Clarification of Terminology: 

      -Clarify the term "CXCR4 internalizes" by providing context, perhaps explaining the process of receptor internalization and its relevance to the study.

      We refer to CXCR4 internalization as a CXCL12-mediated endocytosis process that results in a reduction of CXCR4 levels on the cell surface. We use CXCR4 internalization in this study for two purposes. First, for CXCR4 and other chemokine receptors, internalization processes are mediated by ligand-induced clathrin vesicles (Venkatesan et al. 2003), a process that triggers CXCR4 aggregation in these vesicles. We have previously determined that the receptor oligomers detected by TIRF-M remain unaltered in cells treated with inhibitors of clathrin vesicle formation and of internalization processes (Martinez-Muñoz L. et al. Mol. Cell 2018). Moreover, we have described a mutant CXCR4 that cannot form oligomers but internalizes normally in response to CXCL12 (Martinez-Muñoz L. et al. Mol. Cell 2018). The observation in this manuscript of normal CXCL12-mediated endocytosis in the presence of the negative allosteric modulators of CXCR4 that abrogate receptor oligomerization reinforces the idea that the oligomers detected by TIRF-M are not related to the receptor aggregates involved in endocytosis. Second, receptor internalization is not affected by the allosteric compounds, indicating that they downregulate some CXCL12-mediated signaling events but not others (new Fig. 3).

      All these data have been included in the revised discussion of the manuscript.

      (21) Elaborate on the meaning of "CXCL12 triggers normal CXCR4mut internalization" to enhance reader understanding.

      We have previously described a triple-mutant CXCR4 (K239L/V242A/L246A; CXCR4mut). The mutant residues are located in the N-terminal region of TMVI, close to the cytoplasmic region, thus limiting the CXCR4 pocket described in this study (see our response to point 3). This mutant receptor dimerizes but neither oligomerizes in response to CXCL12 nor supports CXCL12-induced directed cell migration, although it can still trigger some Ca2+ flux and is internalized after ligand activation (Martinez-Muñoz L. et al. Mol. Cell 2018).  We use the behavior of this mutant (CXCR4mut) to show that the CXCR4 oligomers and the complexes involved in internalization processes are not the same and to explain why we evaluated CXCR4 endocytosis in the presence of the negative allosteric modulators.

      As we indicated in a previous answer to the reviewer, these issues have been re-elaborated in the revised version.

      (22) 3. Discrepancy in CXCL12 Concentration:

      -Address the apparent discrepancy between the text stating, "...were stimulated with CXCL12 (50 nM, 37°C)," and the figure caption (Fig. 3A) reporting a concentration of 12.5 nM. Rectify this inconsistency and provide an accurate and clear explanation.

      We apologize for this error, which is now corrected in the revised manuscript. With the exception of the cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, the optimal concentration of CXCL12 employed in the remaining experiments was 50 nM. These concentrations were optimized in previous work from our laboratory using the same type of experiment. We should also remark that in the lipid bilayer and TIRF-M experiments, CXCL12 is used to coat the plates, and it is therefore difficult to determine the real concentration of the ligand that is retained on the surface of the plates after the washing steps performed prior to adding the cells. In addition, we use 100 nM CXCL12 to create the gradient in the chambers used to perform the directed cell migration experiments.

      (23) 4. Speculation on CXCL12 Binding:

      -Refrain from making speculative statements, such as "These data suggest that none of the antagonists alters CXCL12 binding to CXCR4," unless there is concrete evidence presented up to that point. Clearly outline the results that support this conclusion.

      Figure 3B and Supplementary Figure 3 show CXCL12-ATTO700 binding by flow cytometry in cells pretreated with the negative allosteric modulators. We have also included AMD3100, the orthosteric antagonist, as a control for inhibition. While these experiments showed no major effect of the compounds on CXCL12 binding, we cannot rule out small changes in the affinity of the interaction between CXCL12 and CXCR4. Consequently, we have rewritten these statements.

      (24) 5. Corroboration of Data:

      -Specify where the corroborating data from immunostaining and confocal analysis are reported, ensuring readers can access the relevant information to support the conclusions drawn in this section.

      In agreement with the suggestion of the reviewer, the revised manuscript includes data from immunostaining and confocal analysis to complement Fig. 4B (new Fig. 4C). The revised version also includes some representative videos for the TIRF experiments shown in Figure 2 to improve clarity.

      (25) In the Results section concerning "AGR1.135 and AGR1.137 antagonists and their direct binding to CXCR4", several aspects need clarification and refinement for a more comprehensive and understandable presentation: 1. Workflow Clarification:

      -Clearly articulate the workflow used for assessing the binding of AGR1.135 and AGR1.137 to CXCR4. Address the apparent contradiction between the inability to detect a direct interaction and the utilization of Glide for docking in the TMV-TMVI cleft.

      To address the direct interaction of the compounds with CXCR4, we intentionally avoided modifying the small compounds with different labels, which could affect their properties. We therefore attempted a fluorescence spectroscopy strategy to formally prove the ability of the small compounds to bind CXCR4, but this failed because AGR1.135 is yellow in color, which interfered with the determinations. We also tried a FRET strategy (see new Supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers when AGR1.135 was evaluated, but again the yellow color interfered with FRET determinations. Moreover, AGR1.137 did not modify FRET efficiency of CXCR4 dimers. Therefore, we were unable to detect the interaction of the compounds with CXCR4.

      We elected to develop an indirect strategy; in silico, we evaluated the binding-site using docking and molecular dynamics to predict the most promising CXCR4 binding residues involved in the interaction with the selected compounds. Next, we generated point mutant receptors of the predicted residues and re-evaluated the behavior of the allosteric antagonists in a CXCL12-induced cell migration experiment. Obviously, we first discarded those CXCR4 mutants that were not expressed on the cell membrane as well as those that were not functional when activated with CXCL12. Using this strategy, we eliminated the interference due to the physical properties of the compounds and demonstrated that if the antagonism of a compound is reversed in a particular CXCR4 mutant it is because the mutated residue participates or interferes with the interaction between CXCR4 and the compound, thus assuming (albeit indirectly) that the compound binds CXCR4. 

      To select the specific mutations included in the analysis, our strategy was to generate point mutations in residues present in the TMV-TMVI pocket of CXCR4 that were not directly proposed as critical residues involved in chemokine engagement, signal initiation, signal propagation, or G protein-binding, based on the extensive mutational study published by Wescott M.P. et al. (Proc. Natl. Acad. Sci. USA 2016).

      (26) Provide a cohesive explanation of the transition from docking evaluation to MD analysis, ensuring a transparent representation of the methodology.

      Based on the aim of this work, the workflow shown in Author response image 2 was proposed to predict the binding mode of the selected molecules. Firstly, a CXCR4 model was generated to reconstruct some unresolved parts of the protein structure; then a binding site search using PELE software was performed to identify the most promising binding sites; subsequently, docking studies were performed to refine the binding mode of the molecules; and finally, molecular dynamics simulations were run to determine the most stable poses and predict the residues that we should mutate to test that the compounds interact with CXCR4.

      Author response image 2.

      Workflow followed to determine the binding mode of the studied compounds.

      (27) 2. Choice of Software and Techniques:

      -Justify the use of "AMBER14" and the PELE approach, considering  their potential obsolescence.

      These experiments were performed five years ago when the project was initiated. As the reviewer indicates, AMBER14 and PELE approaches might perhaps be considered obsolescent. Thus, we have predicted the structure of the target using AlphaFold (Jumper J. et al, Nature 2021) and the sequence available under UniProt entry P61073. The complete analysis performed (see our response to point 4) confirmed that the compounds bound the selected pocket, as we had originally determined using PELE. These new analyses have been incorporated into the revised manuscript.

      (28) -Discuss the role of the membrane in the receptor-ligand interaction. Elaborate on how the lipidic double layer may influence the binding of small compounds to GPCRs embedded in the membrane.

      Biological membranes are vital components of living organisms, providing a diffusion barrier that separates cells from the extracellular environment, and compartmentalizing specialized organelles within the cell. In order to maintain the diffusion barrier and to keep it electrochemically sealed, a close interaction of membrane proteins with the lipid bilayer is necessary. It is well known that this is important, as many membrane proteins undergo conformational changes that affect their transmembrane regions and that may regulate their activity, as seen with GPCRs (Daemen F.J. & Bonting S.L., Biophys. Struct. Mech. 1977; Gether U. et al. EMBO J. 1997). The lateral and rotational mobility of membrane lipids supports the sealing function while allowing for the structural rearrangement of membrane proteins, as they can adhere to the surface of integral membrane proteins and flexibly adjust to a changing microenvironment. In the case of the first atomistic structure of CXCR4 (Wu B. et al. Science 2010), it was indicated that for dimers, monomers interact only at the extracellular side of helices V and VI, leaving at least a 4-Å gap between the intracellular regions, which is presumably filled by lipids. In particular, they indicated that the channel between TMV and TMVI that connects the orthosteric chemokine binding pocket to the lipid bilayer is occupied by an oleic acid molecule. Recently, Di Marino et al., analyzing the dimeric structure of CXCR4, found a cholesterol molecule placed in between the two protomers, where it engages a series of hydrophobic interactions with residues located in the area between TMI and TMVI (Leu132, Val214, Leu216, Leu246, and Phe249). The polar head of cholesterol forms an H-bond with Tyr135 that further stabilizes its binding mode. This finding confirms that cholesterol might play an important role in mediating and stabilizing receptor dimerization, as seen in other GPCRs (Pluhackova, K., et al. PLoS Comput. Biol. 2016). 
      In addition, we have previously observed that, independently of the structural changes on CXCR4 triggered by lipids, the local lipid environment also regulates CXCR4 organization, dynamics and function at the cell membrane and modulates chemokine-triggered directed cell migration. Prolonged treatment of T cells with bacterial sphingomyelinase promoted the complete and sustained breakdown of sphingomyelins and the accumulation of the corresponding ceramides, which altered both membrane fluidity and CXCR4 nanoclustering and dynamics. Under these conditions, CXCR4 retained some CXCL12-mediated signaling activity but failed to promote efficient directed cell migration (Gardeta S.R. et al. Front. Immunol. 2022). Collectively, these data demonstrate the key role that lipids play in stabilizing CXCR4 conformations and in regulating its lateral mobility, thereby influencing its associated functions. These considerations have been included in the revised version of the manuscript.

      (29) 3. Stable Trajectories and Binding Mode Superimposition: -Specify the criteria for defining "stable trajectories" to enhance reader understanding.

      There are several ways to describe the stability of an MD simulation, based on the convergence of energies, distances or ligand-target interactions, among others. In this work, we use the expression “stable trajectories” to refer to simulations in which the ligand trajectory converges and the ligand RMSD does not fluctuate by more than 0.25 Å. This definition is now included in the revised text.
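
      The stability criterion above can be illustrated with a minimal sketch. It assumes that "fluctuation" means the spread (maximum minus minimum) of the ligand RMSD over the final part of the trajectory; the function name `is_stable`, the `tail_fraction` parameter and the example values are hypothetical and are not taken from the manuscript.

```python
from typing import Sequence


def is_stable(rmsd: Sequence[float],
              tail_fraction: float = 0.5,
              max_fluctuation: float = 0.25) -> bool:
    """Return True if the ligand RMSD (in Å) stays within max_fluctuation.

    Assumption: only the last `tail_fraction` of the frames is inspected,
    so that the initial equilibration does not count against the ligand.
    """
    if not rmsd:
        raise ValueError("rmsd series is empty")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    start = int(len(rmsd) * (1.0 - tail_fraction))
    tail = rmsd[start:]
    # Spread of the RMSD over the inspected window
    return (max(tail) - min(tail)) <= max_fluctuation


# Hypothetical RMSD series (Å): the first settles, the second keeps drifting
converged = [3.0, 2.0, 1.5, 1.45, 1.5, 1.55]
drifting = [1.0, 1.2, 1.0, 1.6]
print(is_stable(converged), is_stable(drifting))
```

      Other definitions, such as the standard deviation of the RMSD over a window, would be equally reasonable; the threshold of 0.25 Å is the only value taken from the response above.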

      (30)  Clarify the meaning behind superimposing the two small compounds and ensure that the statement in the figure caption aligns with the information presented in the main text.

      We apologize for the error in the previous Fig. 5A and in its legend. The figure was created by superimposing the protein component of the poses for the two compounds, AGR1.135 and AGR1.137, rather than the compounds themselves. As panel 5A was confusing, we have modified the whole of Fig. 5 in the revised manuscript to improve clarity.

      (31) 4. Volume Analysis and Distances:

      -Provide details on how the volume analysis was computed and how distances were accounted for. Consider adding a figure to illustrate these analyses, aiding reader comprehension.

      The cleft search and analysis were performed using the default settings of SURFNET (Laskowski R.A. J. Mol. Graph. 1995) included in the PDBsum server (Laskowski R.A. et al. Trends Biochem. Sci. 1997). The first run of the input model for CXCR4 3ODU identified a promising cleft of 870 Å³ in the lower half of the region flanked by TMV and TMVI, highlighting this area as a possible small molecule binding site (Fig. I, only for review purposes). Analysis of the cleft occupied by AGR1.135 showed two independent cavities of 434 Å³ and 1381 Å³ that were not connected to the orthosteric site. The same procedure for AGR1.137 revealed two distinct clefts of 790 Å³ and 580 Å³, respectively (Fig. I, only for review purposes). Analysis of the atomic distances between the protein residues and the compounds was performed using the PISA server (Krissinel E. & Henrick K. J. Mol. Biol. 2007). (Please see our response to point 3 and the corresponding figure).

      (32) 5. Mutant Selection and Relevance:

      -Clarify the rationale behind selecting the CXCR4 mutants used in the study. Consider justifying the choice and exploring the possibility of performing an alanine (ALA) scan for a more comprehensive mutational analysis.  

      The selection of the residues to be mutated along the cleft was first based on their presence in the proposed cleft and the direct interaction of the compounds with them, either by hydrogen bonding or by hydrophobic interactions. Secondly, none of the mutated residues belonged to the critical residues involved in transmitting the signal generated by the interaction of CXCL12 with the receptor. In any case, mutants producing a non-functional CXCR4 at the cell membrane were discarded after FACS analysis and chemotaxis experiments. Finally, the length and nature of the resulting mutations were designed mainly to occlude the cleft by introducing long residues such as lysines (I204K, L208K) or to alter hydrophobic interactions by changing the carbon side chain composition of the residues in the cleft. We agree that an alanine scan would have been an alternative strategy to evaluate the residues involved in the interactions of the compounds.

      (33) Reevaluate the statement regarding the relevance of the Y256F mutation for the binding of AGR1.137. If there is a significant impact on migration in the mutant (Fig. 6B), elaborate on the significance in the context of AGR1.137 binding.

      In the revised discussion we provide more detail on the relevance of the Y256F mutation for the binding of AGR1.137 as well as on the partial effect of the G207I and R235L mutations. The predicted interactions for each compound are depicted in new Fig. 6 C, D after LigPlot+ analysis (Laskowski R.A. & Swindells M.B. J. Chem. Inf. Model. 2011), showing that AGR1.135 interacted directly with the receptor through a hydrogen bond with Y256. When this residue was mutated to F, one of the anchor points for the compound was lost, weakening the potential interaction in the region of the upper anchor point.

      It is not clear how the Y256F mutation will affect the binding of AGR1.137, but other potential contacts cannot be ruled out since that portion of the compound is identical in both AGR1.135 and AGR1.137. This is especially true for its neighboring residues in the alpha helix, F249 and L208, as shown in the 3ODU structure (Fig. 6D), which are shown to be directly implicated in the interaction of both compounds. Alternatively, we cannot rule out that Y256 interacts with other TMs or lipids stabilizing the overall structure, which could reverse the effect of the mutant at a later stage (Author response image 3).

      Author response image 3.

      Cartoon representation of Y256 and its intramolecular interactions in the X-ray structure of CXCR4 (3ODU). The TMV helix is colored in blue and TMVI in pink.

      (34) Address the apparent discrepancy in residue involvement between AGR1.135 and AGR1.137, particularly if they share the same binding mode in the same cleft.

      AGR1.135 and AGR1.137 exhibit comparable yet distinct binding modes, engaging with CXCR4 within a molecular cavity formed by TMV and TMVI. AGR1.135 binds to CXCR4 through three hydrogen bonds, two on the apical side of the compound that interact with residues TMV-G207 and TMVI-Y256 and one on the basal side that interacts with TMVI-R235 (Fig. 5A). This results in a more extended and rigid conformation when sharing hydrogen bonds, with both TMs occupying a surface area of 400 Å² and a length of 20 Å in the cleft between TMV and TMVI (Supplementary Fig. 8A). AGR1.137 exhibits a distinct binding profile, interacting with a more internal region of the receptor. This interaction involves the formation of a hydrogen bond with TMIII-V124, which induces a conformational shift in the TMVI helix towards an active conformation (Fig. 5B; Supplementary Fig. 13). Moreover, AGR1.137 may utilize the carboxyl group of V124 in TMIII and overlap with AGR1.135 binding in the cavity, interacting with the other 19 residues dispersed between TMV and TMVI to create an interaction surface of 370 Å² along 20 Å (Supplementary Fig. 8B). This is illustrated in the new Fig. 5B. AGR1.137 lacks the phenyl ring present in AGR1.135, resulting in a shorter compound with greater difficulty in reaching the lower part of TMVI where R235 sits.

      Author response image 4.

      AGR1.135 and AGR1.137 interaction with TMV and TMVI.  The model shows the location of the compounds within the TMV-VI cleft, illustrated by a ribbon and stick representation. The CXCR4 segments of TMV and TMVI are represented in blue and pink ribbons respectively, and side chains for some of the residues defining the cavity are shown in sticks. AGR1.135 and AGR1.137 are shown in stick representation with carbon in yellow, nitrogen in blue, oxygen in red, and fluorine in green. Hydrogen bonds are indicated by dashed black lines, while hydrophobic interactions are shown in green. The figure reproduces the panels A, B of Fig. 5 in the revised manuscript.

      (35) In the Results section regarding "AGR1.137 treatment in a zebrafish xenograft model", the following points can be refined for clarity and completeness: 1. Cell Line Choice for Zebrafish Xenograft Model:

      -Explain the rationale behind the choice of HeLa cells for the zebrafish xenograft model when the previous experiments primarily focused on Jurkat cells. Address any specific biological or experimental considerations that influenced this decision.

      As far as we know, there are no available models of tumors in zebrafish using Jurkat cells. We looked for a tumoral cell system that expresses CXCR4 and could be transplanted into zebrafish. HeLa cells are derived from a human cervical tumor, express a functional CXCR4, and have been previously used for tumorigenesis analyses in zebrafish (Brown H.K. et al. Expert Opin. Drug Discover. 2017; You Y. et al Front. Pharmacol. 2020). These cells grow in the fish and disseminate through the ventral area and can be used to determine primary tumor growth and metastasis. Nonetheless, we first analyzed in vitro the expression of a functional CXCR4 in these cells (Supplementary Fig. 10A), whether AGR1.137 treatment specifically abrogated CXCL12-mediated directed cell migration (Fig. 7A, B), and whether it affected cell proliferation (Supplementary Fig. 10B). As HeLa cells reproduce the in vitro effects detected for the compounds in Jurkat cells, we used this model in zebrafish. These issues were already discussed in the first version of our manuscript.

      (36) 2. Toxicity Assessment in Zebrafish Embryos: 

      -Clarify the basis for stating that AGR1.137 is not toxic to zebrafish embryos. Consider referencing the Zebrafish Embryo Acute Toxicity Test (ZFET) and provide relevant data on lethal concentration (LC50) and non-lethal toxic phenotypes such as pericardial edema, head and tail necrosis, malformation, brain hemorrhage, or yolk sac edema.

      Tumor growth and metastasis kinetics within the zebrafish model have been extensively evaluated in many publications (White R. et al. Nat. Rev. Cancer. 2013; Astell K.R. and Sieger D. Cold Spring Harb. Perspect. Med. 2020; Chen X. et al. Front. Cell Dev. Biol. 2021; Weiss J.M. et al. eLife 2022; Lindhal G. et al NPJ Precis. Oncol. 2024). Our previous experience using this model shows that tumors start having a more pronounced proliferation and lower degree of apoptosis from day 4 onwards, but we cannot keep the tumor-bearing larvae for that long for ethical reasons and also because we do not see much scientific benefit in unnecessarily extending the experiments. Anti-proliferative or pro-apoptotic effects of drugs can still be observed within the three days, even if this is then commonly seen as a larger reduction (instead of the smaller growth commonly seen in, for example, mouse tumor models) compared to controls. Before testing the compounds, we initially characterized the evolution of implanted tumors in our system and how much they metastasize over time in the absence of treatment (Author response image 5).

      The in vivo experiments were planned to validate efficacious concentrations of the investigated drugs rather than to derive in vivo IC50 or other values, which require testing of multiple doses. We have, however, included an additional concentration to show concentration-dependence and therefore on-target specificity of the drugs in the revised version of the manuscript (data also being elaborated in ongoing experiments). At this stage, we believe that adding the LC50 does not provide interesting new knowledge, and it is standard to only show results from the experimental endpoint (in our case 3 days post implantation). We agree that showing these new data points strengthens the manuscript and facilitates independent evaluation and conclusions to be drawn from the presented data. We have created new graphs where datapoints for each compound dose are shown.  

      Author response image 5.

      Evolution of the tumors and metastasis over time in the absence of any treatment. HeLa cells were labeled with 8 µg/mL Fast-DiI™ oil and then implanted in the dorsal perivitelline space of 2-day-old zebrafish embryos. Tumors were imaged within 2 hours of implantation and re-imaged every 24 h for three days. Changes in tumor size were evaluated as tumor area at day 1, 2 and 3 divided by tumor area at day 0, and metastasis was evaluated as the number of cells disseminated to the caudal hematopoietic plexus at day 1, 2 and 3 divided by the number of cells at day 3.
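
      The two normalizations described in this legend can be written out as a short sketch. The function names and the numerical values below are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
from typing import List, Sequence


def relative_tumor_size(areas: Sequence[float]) -> List[float]:
    """Tumor area at days 1, 2 and 3 divided by the area at day 0.

    `areas` holds the measured areas for days 0, 1, 2 and 3, in that order.
    """
    if len(areas) != 4:
        raise ValueError("expected areas for days 0, 1, 2 and 3")
    if areas[0] <= 0:
        raise ValueError("day 0 area must be positive")
    return [a / areas[0] for a in areas[1:]]


def relative_metastasis(counts: Sequence[float]) -> List[float]:
    """Disseminated cell counts at days 1, 2 and 3 divided by the day 3 count.

    Day 3 is the reference, as stated in the legend above.
    """
    if len(counts) != 3:
        raise ValueError("expected counts for days 1, 2 and 3")
    if counts[-1] <= 0:
        raise ValueError("day 3 count must be positive")
    return [c / counts[-1] for c in counts]


# Hypothetical measurements for one untreated larva
print(relative_tumor_size([100.0, 80.0, 60.0, 50.0]))
print(relative_metastasis([2, 5, 10]))
```

      With this definition, a relative tumor size below 1 indicates shrinkage with respect to the day of implantation, and the relative metastasis value at day 3 is 1 by construction.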

      Regarding the statement that AGR1.137 was not toxic, this was based on visual inspection of the zebrafish larvae at the end of the experiment, which also revealed a lack of drug-related mortality in these experiments. There are a number of differences in how our experiment was run compared with the standardized ZFET. ZFET evaluates toxicity from 0 hours post-fertilization to 1 or 2 days post-fertilization, whereas here we exposed zebrafish from 2 days post-fertilization to 5 days post-fertilization. The ZFET furthermore requires that the embryos are raised at 26ºC, whereas we kept the temperature as close as possible to a physiologically relevant temperature for the tumor cells (36ºC). In the ZFET, embryos are incubated in 96-well plates, whereas for our studies we required larger wells to be able to manipulate the larvae and avoid well edge-related imaging artefacts, and we therefore used 24-well plates. As such, the ZFET was for various reasons not applicable to our experimental settings. Because our focus was on efficacy rather than on rigorously determining the LD50 or other toxicity-related measurements, and because we found that the targeted dose was tolerated, we did not evaluate multiple doses, including lethal doses of the drug, and are therefore not able to determine an LD50/LC50. We also did not find drug-induced non-lethal toxic phenotypes in this study, and so we cannot elaborate further on such phenotypes other than to simply state that the drug is well tolerated at the given doses. Therefore, the reference to ZFET in the manuscript was eliminated.

      (37) If supplementary information is available, consider providing it for a comprehensive understanding of toxicity assessments. 

      The effective concentration used in the zebrafish study was derived from the in vitro experiments. That being said, and as elaborated in our response to comment 36, we have added data for one additional dose to show the dose-dependent regulation of tumor growth and metastasis. 

      (38) 3. Optimization and Development of AGR1.137: 

      -Justify the need for further optimization and development of AGR1.137 if it has a comparable effect to AMD3100. Explain the specific advantages or improvements that AGR1.137 may offer over AMD3100. 

      AGR1.137 is highly hydrophobic and is very difficult to handle, particularly in in vivo assays; thus, for the negative allosteric modulators to be used clinically, it would be very important to increase their solubility in water. Contrastingly, AMD3100 is a water-soluble compound. Before using the zebrafish model, we performed several experiments in mice using AGR1.137, but the inhibitory results were highly variable, probably due to its hydrophobicity. We also believe that it would be important to increase the affinity of AGR1.137 for CXCR4, as the use of lower concentrations of the negative allosteric modulator would limit potential in vivo side effects of the drug. On the other hand, we are also evaluating distinct administration alternatives, including encapsulation of the compounds in different vehicles. These alternatives may also require modifications of the compounds. 

      AMD3100 is an orthosteric inhibitor and therefore blocks all the signaling cascades triggered by CXCL12. For instance, we observed that AMD3100 treatment blocked CXCL12 binding, cAMP inhibition, calcium flux, cell adhesion and cell migration (Fig. 3, Fig. 4), whereas the effects of AGR1.137 were restricted to CXCL12-mediated directed cell migration. Although AMD3100 was well tolerated by healthy volunteers in a single-dose study, it also promoted some mild and reversible events, including white blood cell count elevations and variations of urine calcium just beyond the reported normal range (Hendrix C.W. et al. Antimicrob. Agents Chemother. 2000). To treat viral infections, continuous daily dosing requirements of AMD3100 were impractical due to severe side effects including cardiac arrhythmias (De Clercq E. Front Immunol. 2015). For AMD3100 to be used clinically, it would be critical to control the timing of administration. In addition, side effects after long-term administration pose potential problems. Shorter-term usage and lower doses would be fundamental keys to its success in clinical use (Liu T.Y. et al. Exp. Hematol. Oncol. 2016). The use of a negative allosteric modulator that blocks cell migration but does not affect other signaling pathways triggered by CXCL12 would be, at least in theory, more specific and produce fewer side effects. These ideas have been incorporated into the revised discussion to reflect potential advantages or improvements that AGR1.137 may offer over AMD3100.

      (39) 4. Discrepancy in AGR1.137 and AMD3100 Effects:

      -Discuss the observed discrepancy where AGR1.137 exhibits similar effects to AMD3100 but only after 48 hours. Provide insights into the temporal dynamics of their actions and potential implications for the experimental design.

      Images and data shown in Fig. 7E, F correspond to days 0 and 3 after HeLa cell implantation (tumorigenesis) and only to day 3 in the case of metastasis data. The revised version contains the effect of two distinct doses of the compounds (10 and 50 µM for AGR1.135 and AGR1.137, and 1 and 10 µM for AMD3100).

      (40) In the "Discussion" section, there are several points that require clarification and refinement to enhance the overall coherence and depth of the analysis:  1. Reduction of Side-Effects: 

      -Provide a more detailed explanation of how the identified compounds, specifically AGR1.135 and AGR1.137, contribute to the reduction of side effects. Consider discussing specific mechanisms or characteristics that differentiate these compounds from existing antagonists.

      The sentence indicating that AGR1.135 and AGR1.137 contribute to reducing side effects is entirely speculative, as we have no experimental evidence to support it. We have therefore corrected this in the revised version. The origin of the sentence was that orthosteric antagonists typically bind to the same site as the endogenous ligand, thus blocking its interaction with the receptor. Therefore, orthosteric inhibitors (i.e. AMD3100) block all signaling cascades triggered by the ligand and therefore their functional consequences. However, the compounds described in this project are essentially negative allosteric modulators, that is, they bind to a site distinct from the orthosteric site, inducing a conformational change in the receptor that does not alter the binding of the endogenous ligand, and therefore block some specific receptor-associated functions without altering others. We observed that AGR1.137 blocked receptor oligomerization and directed cell migration, whereas CXCL12 still bound CXCR4, triggered calcium mobilization, inhibited cAMP release and promoted receptor internalization. This is why we speculated on the limitation of side effects. The statements have been nonetheless revised in the new version of the manuscript.

      (41) 2. Binding Site Clarification:

      -Address the apparent discrepancy between docking the small compounds in a narrow cleft formed by TMV and TMVI helices and the statement that AGR1.131 binds elsewhere. Clarify the rationale behind this assertion.

      After the in silico screening, a total of 40 compounds were selected. These compounds showed distinct degrees of interaction with the cleft formed by TMV and TMVI and even with other potential interaction sites on CXCR4, with the exception of the ligand binding site according to the data described by Wescott et al. (PNAS 2016 113:9928-9933), as this possibility was discarded in the initial approach of the in silico screening. According to PELE analysis, AGR1.131 was one of the 40 selected compounds that showed a pose with low binding energy, -39.8 kcal/mol, between the TMV and TMVI helices, that is, it might interact with CXCR4 through the area selected for the screening. It nonetheless also showed a best pose placed between helices TMI and TMVII, -43.7 kcal/mol. In any case, the compound was included in the biological screening, where it was unable to impact CXCL12-mediated chemotaxis (Fig. 1B). We then focused on AGR1.135 and AGR1.137, as they showed a higher inhibitory effect on CXCL12-mediated migration, and on AGR1.131 as an internal negative control. AGR1.131 has a skeleton very similar to the other compounds (Fig. 1C) and can interact with the TM domains of CXCR4 without promoting effects. None of the three compounds affected CXCL12 binding, CXCL12-mediated inhibition of cAMP release, or receptor internalization. However, whereas AGR1.135 and AGR1.137 blocked CXCL12-mediated CXCR4 oligomerization and directed cell migration towards CXCL12 gradients, AGR1.131 had no effect in these experiments (Fig. 3, Fig. 4).

      Next, we performed additional theoretical calculations (PELE, docking, MD) to inspect in detail the potential binding modes of active and inactive molecules. Based on these additional calculations, we identified that whereas AGR1.135 and AGR1.137 showed preferential binding to the molecular pocket between TMV and TMVI, the best pose for AGR1.131 was located between TMI and TMVII, as the initial experiments indicated. These observations and data have been clarified in the revised discussion.
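
      The comparison of poses described above amounts to ranking candidate sites by their predicted binding energy, the most negative value being the preferred one. A minimal sketch follows, using only the two energies quoted in this response for AGR1.131; the function name `best_pose` and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple


def best_pose(energies: Dict[str, float]) -> Tuple[str, float]:
    """Return the site with the lowest (most favorable) binding energy.

    `energies` maps a site label to its predicted energy in kcal/mol.
    """
    if not energies:
        raise ValueError("no poses supplied")
    site = min(energies, key=lambda s: energies[s])
    return site, energies[site]


# PELE energies quoted above for AGR1.131 (kcal/mol)
agr1_131 = {"TMV-TMVI": -39.8, "TMI-TMVII": -43.7}
site, energy = best_pose(agr1_131)
print(f"Preferred site: {site} ({energy} kcal/mol)")
```

      Ranking by a single energy value is a simplification: in the workflow described in this response, the PELE poses were further refined by docking and molecular dynamics before any conclusion was drawn.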

      (42) 3. Impact of Chemical Modifications:

      -Discuss the consequences of the distinct chemical groups in AGR1.135, AGR1.137, and AGR1.131, specifically addressing how variations in amine length and chemical nature may influence binding affinity and biological activity. Provide insights into the potential effects of these modifications on cellular responses and the observed outcomes in zebrafish. 

      The main difference between AGR1.131 and the other two compounds is the higher flexibility of AGR1.131 due to the additional CH2 linker, together with the lack of a piperazine ring. The additional CH2 linking the phenyl ring increases the flexibility of AGR1.131 when compared with AGR1.135 and AGR1.137, and the absence of the piperazine ring might be responsible for its lack of activity, as it makes this compound able to bind to CXCR4 (Fig. 1C).

      AGR1.137 was chosen in a second round. The additional presence of the tertiary amine (in the piperazine ring) allows the formation of quaternary ammonium salts in the aqueous medium and its substituents to increase its solubility (Fig 1C). This characteristic might be related to the absence of toxic effects of the compound in the zebrafish model.

      (43) 4. Existence of Distinct CXCR4 Conformational States: 

      -Provide more detailed support for the statement suggesting the "existence of distinct CXCR4 conformational states" responsible for activating different signaling pathways. Consider referencing relevant studies or experiments that support this claim.

      Classical models of GPCR allostery and activation, which describe an equilibrium between a single inactive and a single signaling-competent active conformation, cannot account for the complex pharmacology of these receptors. The emerging view is that GPCRs are highly dynamic proteins, and ligands with varying pharmacological properties differentially modulate the balance between multiple conformations.

      Just as a single photograph from one angle cannot capture all aspects of an object in movement, no one biophysical method can visualize all aspects of GPCR activation. In general, there is a tradeoff between high-resolution information on the entire protein versus dynamic information on limited regions. In the former category, crystal and cryo-electron microscopy (cryoEM) structures have provided comprehensive, atomic-resolution snapshots of scores of GPCRs both in inactive and active conformations, revealing conserved conformational changes associated with activation. However, different GPCRs vary considerably in the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding (Venkatakrishnan A.J.V. et al. Nature 2016). Spectroscopic and computational approaches provide complementary information, highlighting the role of conformational dynamics in GPCR activation (Latorraca N.R.V. et al. Chem. Rev 2017). In the absence of agonists, the receptor population is typically dominated by conformations closely related to those observed in inactive-state crystal structures (Manglik A. et al. Cell 2015). While agonist binding drives the receptor population towards conformations similar to those in active-state structures, a mixture of inactive and active conformations remains, reflecting “loose” or incomplete allosteric coupling between the orthosteric and transducer pockets (Dror R.O. et al. Proc. Natl. Acad. Sci. USA 2011). Surprisingly, for some GPCRs, and under some experimental conditions, a substantial fraction of unliganded receptors already reside in an active-like conformation, which may be related to their level of basal or constitutive signaling (Staus D.P. et al. J. Biol. Chem. 2019; Ye L. et al. Nature 2016). In our case, the negative allosteric modulators did not alter ligand binding and had only minor effects on specific CXCL12-mediated functions such as inhibition of cAMP release or receptor internalization, among others, but blocked CXCL12-mediated actin dynamics and receptor oligomerization. Collectively, these data suggest that the described compounds alter the active conformation of CXCR4 and therefore support the presence of distinct receptor conformations that explain a partial activation of the signaling cascade.

      All these observations are now included in the revised discussion of the manuscript.

      (44) 5. Equilibrium Shift and Allosteric Ligands: 

      -Clarify the statement about "allosteric ligands shifting the equilibrium to favor a particular receptor conformation". Support this suggestion with references or experimental evidence

      In a previous answer (see our response to point 2), we explain why we define the compounds as negative allosteric modulators. These compounds do not bind the orthosteric binding site or a site distinct from the orthosteric site that alters the ligand-binding site. Their effect should be due to changes in the active conformation of CXCR4, which allow some signaling events whereas others are blocked. Our functional data thus support that, acting through the same receptor, the compounds separate distinct receptor-mediated signaling cascades; that is, our data suggest that CXCR4 has conformational heterogeneity. It is known that GPCRs exhibit more than one “inactive” and “active” conformation, and the endogenous agonists stabilize a mixture of multiple conformations. Biased ligands or allosteric modulators can achieve their distinctive signaling profiles by modulating this distribution of receptor conformations (Wingler L.M. & Lefkowitz R.J. Trends Cell Biol. 2020). For instance, some analogs of angiotensin II do not appreciably activate Gq signaling (e.g., increases in IP3 and Ca2+) but still induce receptor phosphorylation, internalization, and mitogen-activated protein kinase (MAPK) signaling (Wei H, et al. Proc. Natl. Acad. Sci. USA 2003). Some of these ligands activate Gi and G12 in bioluminescence resonance energy transfer (BRET) experiments (Namkung Y. et al. Sci. Signal. 2018). A similar observation was described in the case of CCR5, where some chemokine analogs promoted G protein subtype-specific signaling bias (Lorenzen E. et al. Sci. Signal 2018). Structural analyses of distinct GPCRs in the presence of different ligands show that the magnitude and nature of the conformational changes in the orthosteric ligand-binding site following agonist binding vary considerably (Venkatakrishnan A.J.V. et al. Nature 2016). Yet, these changes modify conserved motifs in the interior of the receptor core and induce common conformational changes in the intracellular site involved in signal transduction. That is, these modifications might be considered distinct receptor conformations.

      The revised discussion contains some of these interpretations to support our statement about the stabilization of a particular receptor conformation triggered by the negative allosteric modulators. 

      (45) 6. Refinement of Binding Mode: 

      -Clarify the workflow for obtaining the binding mode, particularly the role of GLIDE and PELE. Clearly explain how these software tools were used in tandem to refine the binding mode. 

      The sequential computational workflow applied in this project included: i) protein model construction, ii) virtual screening (Glide), iii) PELE, iv) docking (AutoDock and Glide) and v) molecular dynamics (AMBER).

      Glide was applied for the structure-based virtual screening to explore which compounds could fit and interact with the previously selected binding site.

      After the identification of theoretically active compounds (modulators of CXCR4), additional calculations were done to identify a potential binding site. PELE was used for this purpose, to study how the compounds could bind over the whole surface of the target (TMV-TMVI). By applying PELE, we avoided biasing the calculation, and we found that the trajectories with better interaction energies identified the cleft between TMV and TMVI as the binding site for AGR1.135 and AGR1.137, but not for AGR1.131. AGR1.131 showed a pose with low binding energy, -39.8 kcal/mol, between the TMV and TMVI helices; that is, it might interact with CXCR4 in the area selected for the screening. However, it also showed a better pose placed between helices TMI and TMVII, -43.7 kcal/mol (see our response to point 41). These data have now been confirmed using Schrödinger’s MM-GBSA procedure (see our response to points 6 and 8). In any case, the compound was included in the biological screening, where it was unable to affect CXCL12-mediated chemotaxis (Fig. 1B). Docking and MD simulations were then performed to study and refine the specific binding mode in this cavity. These data were important for choosing the CXCR4 mutations required to test whether the behavior of the compounds was reversed. In these experiments we also confirmed that AGR1.131 had a better pose in the TMI-TMVII region.

      (46) 7. Impact of Compound Differences on CXCR4-F249L mutant: 

      -Provide visual aids, such as figures, and additional experiments to support the statement about differences in the behavior of AGR1.135 and AGR1.137 on cells expressing CXCR4-F249L mutant. Elaborate on the closer interaction suggested between the triazole group of AGR1.137 and the F249 residue

      At the reviewer’s suggestion, Fig. 5 has been modified to incorporate a closer view of the interactions identified and new panels in new Fig. 6 have been added to show in detail the effect of the mutations selected on the structure of the cleft between TMV and TMVI. The main difference between AGR1.135 and AGR1.137 is how the triazole group interacts with F249 and L216 (Author response image 6). In AGR1.137, the three groups are aligned in a parallel organization, which appears to be more effective: This might be due to a better adaptation of this compound to the cleft since there is only one hydrogen bond with V124. In AGR1.135, the compound interacts with the phenyl ring of F249 and has a stronger interaction at the apical edge to stabilize its position in the cleft. However, there is still an additional interaction present. When changing F249 to L (Fig. VIIA, B, only for review purposes) and showing the two most likely rotamers resulting from the mutation, it is observed that rotamer B is in close proximity to the compound, which may cause the binding to either displace or adopt an alternative conformation that is easier to bind into the cleft. As previously mentioned, it is likely that AGR1.135 can displace the mutant rotamer and bind into the cleft more easily due to its higher affinity.

      Author response image 6.

      Cartoon representation of the interaction of CXCR4 F249L mutant with AGR1.135 (A) and AGR1.137 (B). The two most probable conformations of Leucine rotamers are represented in cyan A and B conformations. Van der Waals interactions are depicted in blue cyan dashed lines, hydrogen bonds in black dashed lines. CXCR4 segments of TMV and TMVI are colored in blue and pink, respectively.

      (47) In the "Materials and Methods" section, the computational approach for the "discovery of CXCR4 modulators" requires significant revision and clarification. The following suggestions aim to address the identified issues: 1. Structural Modeling: 

      -Reconsider the use of SWISS-MODEL if there is an available PDB code for the entire CXCR4 structure. Clearly articulate the rationale for choosing one method over the other and explain any limitations associated with the selected approach. 

      The SWISS-MODEL server allows for automated comparative modeling of 3D protein structures and pioneered the field of automated modeling. At the time we started this project, it was the most accurate method to generate reliable 3D protein structure models.

      As explained above, we have now predicted the structure of the target using AlphaFold (Jumper J. et al, Nature 2021) and performed several additional experiments that confirm that the small compounds bind the selected pocket as the original strategy indicated (see our response to point 6). (Fig. II, only for review purposes).

      (48) 2. Parametrization of Small Compounds: 

      -Provide a detailed description of the parametrization process for the small compounds used in the study. Specify the force field and parameters employed, considering the obsolescence of AMBER14 and ff14SB. Consider adopting more contemporary force fields and parameterization strategies. 

      When we performed these experiments, some years ago, the force fields applied (ff14SB, AMBER14 used in MD or OPLS2004 in docking with Glide) were well accepted and were gold standards. It is, however, true that force fields have evolved in the past few years. Moreover, in the case of the MD simulations, to account for the ligand parameters that are not contained within the force field, we performed an additional parameterization as a standard methodology. We first performed an ab initio optimization of the ligand geometry at the B3LYP/6-311+g(d) level using Gaussian 09, Revision A.02, and then a single-point energy calculation of ESP charges with HF/6-311+g(d) on the optimized structure. As the last step of the parametrization, the antechamber module was used to adapt these charges and additional parameters for MD simulations.

      (49) 3. Treatment of Lipids and Membrane: 

      -Elaborate on how lipids were treated in the system. Clearly describe whether a membrane was included in the simulations and provide details on its composition and structure. Address the role of the membrane in the study and its relevance to the interactions between CXCR4 and small compounds 

      To stabilize CXCR4 and more accurately reproduce the real environment in the MD simulation, the system was embedded in a lipid bilayer using the Membrane Builder tool (Sunhwan J. et al. Biophys. J. 2009) from the CHARMM-GUI server. The membrane was composed of 175 molecules of the phospholipid 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) in each leaflet. The protein-membrane complex was solvated with TIP3 water molecules. Chloride ions were added up to a concentration of 0.15 M in water, and sodium ions were added to neutralize the system. This information was previously described in detail.

      (50) 4. Molecular Dynamics Protocol: 

      -Provide a more detailed and coherent explanation of the molecular dynamics protocol. Clarify the specific steps, parameters, and conditions used in the simulations. Ensure that the protocol aligns with established best practices in the field.

      Simulations were calculated on an Asus 1151 h170 LVX-GTX-980Ti workstation, with an Intel Core i7-6500 K Processor (12 M Cache, 3.40 GHz) and 16 GB DDR4 2133 MHz RAM, equipped with an Nvidia GeForce GTX 980Ti available for GPU (Graphics Processing Unit) computations. MD simulations were performed using AMBER14 (Case D.A. et al. AMBER 14, Univ. of California, San Francisco, USA, 2014) with ff14SB (Maier J.A. et al. J. Chem. Theory Comput. 2015) and lipid14 (Dickson C. J. et al. J. Chem. Theory Comput. 2014) force fields in the NPT thermodynamic ensemble (constant pressure and temperature). Minimization was performed using 3500 Steepest Descent steps and 4500 Conjugate Gradient steps three times, first considering only hydrogens, next considering only water molecules and ions, and finally minimizing all atoms. During equilibration, the system temperature was raised from 0 to 300 K at constant volume, fixing everything but ions and water molecules. After thermalization, several density equilibration phases were performed. In the production phase, 50 ns MD simulations without position restraints were calculated using a time step of 2 fs. Trajectories of the most interesting poses were extended to 150 ns. All bonds involving hydrogen atoms were constrained with the SHAKE algorithm (Lippert R.A. et al. J. Chem. Phys. 2007). A cutoff of 8 Å was used for the Lennard-Jones interaction and the short-range electrostatic interactions. Berendsen barostat (Berendsen H.J. et al. J. Chem. Phys. 1984) and Langevin thermostat were used to regulate the system pressure and temperature, respectively. All trajectories were processed using CPPTRAJ (Roe D.R. & Cheatham III T.E. J. Chem. Theory Comput. 2013) and visualized with VMD (Visual Molecular Dynamics) (Humphrey W. et al. J. Mol. Graphics. 1996). To reduce the complexity of the data, Principal Component Analysis (PCA) was performed on the trajectories using CPPTRAJ.
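
      As a quick consistency check on the production settings quoted above, the simulation lengths can be converted into numbers of integration steps. This is only an arithmetic sketch; the helper name n_steps is hypothetical and is not part of the AMBER input.

```python
def n_steps(sim_time_ns, timestep_fs):
    """Number of integration steps needed to cover sim_time_ns at timestep_fs."""
    # 1 ns = 1,000,000 fs
    return round(sim_time_ns * 1_000_000 / timestep_fs)

# Values taken from the protocol text: 2 fs time step, 50 ns production runs,
# and 150 ns for the extended trajectories.
production_steps = n_steps(50, 2)
extended_steps = n_steps(150, 2)
print(production_steps, extended_steps)
```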

      (51) Consider updating the molecular dynamics protocol to incorporate more contemporary methodologies, considering advancements in simulation techniques and software.

      In our answer to points 6 and 47, we describe why we used the technology based on SWISS-MODEL and PELE analysis and how we have now used AlphaFold and other more contemporary methodologies to confirm that the small compounds bind the selected pocket.

      (52) Figure 1A: 

      •  Consider switching to a cavity representation for CXCL12 to enhance clarity and emphasize the cleft.

      Fig. 1A has been modified to emphasize the cleft.

      (53) Explicitly show the TMV-TMVI cleft in the figure for a more comprehensive visualization. 

      In Fig. 1A we have added an insert to facilitate TMV-TMVI visualization.

      (54) Figure 1B: 

      •  Clearly explain the meaning of the second DMSO barplot to avoid confusion. 

      To clarify this panel, we have modified the figure and the figure legend. Panel B now includes a complete titration of the three compounds analyzed in the manuscript. The first bar shows cell migration in the absence of both treatment with AMD3100 and stimulation with CXCL12. The second bar shows migration in response to CXCL12 in the absence of AMD3100. The third bar shows the effect of AMD3100 on CXCL12-induced migration, as a known control of inhibition of migration. We hope that this new representation of the data is clearer.

      (55) Figure 1C: 

      •  Provide a clear legend explaining the significance of the green shading on the small compounds. 

      The legend for Fig. 1C has been modified according to the reviewer’s suggestion.

      (56) Figure 2: 

      •  Elaborate on the role of fibronectin in the experiment and explain the specific contribution of CD86-AcGFP.

      The ideal situation for TIRF-M determinations is to employ cells on a physiological substrate complemented with or without chemokines. Fibronectin is a substrate widely used in different studies that allows cell adhesion, mimicking a physiological situation. Jurkat cells express alpha4beta1 and alpha5beta1 integrins that mediate adhesion to fibronectin (Seminario M.C. et al. J. Leuk. Biol. 1999).

      Regarding the use of CD86-AcGFP in TIRF-M experiments, we determine the number of receptors in individual trajectories of CXCR4 using, as a reference, the MSI value of CD86-AcGFP particles that strictly showed a single photobleaching step (Dorsch S. et al. Nat Methods 2009).

      We preferred to use CD86-AcGFP in cells instead of AcGFP on glass, to exclude any potential effect on the different photodynamics exhibited by AcGFP when bound directly to glass. In any case, this issue has been clarified in the revised version.
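
      The intensity-based estimate described above can be sketched as follows. This is a minimal illustration, assuming that the reference is the mean MSI of CD86-AcGFP particles showing a single photobleaching step and that the number of receptors scales linearly with particle intensity. The function name and the intensity values are hypothetical and are not data from the study.

```python
import numpy as np

def receptors_per_particle(particle_msi, monomer_msi):
    """Estimate receptor copies per particle as intensity / monomer reference.

    particle_msi: intensities of CXCR4-AcGFP particles (arbitrary units).
    monomer_msi: intensities of reference particles with a single
                 photobleaching step (assumed to be monomers).
    """
    reference = float(np.mean(monomer_msi))
    return np.asarray(particle_msi, dtype=float) / reference

# Hypothetical intensities, for illustration only.
monomer_reference = [90.0, 100.0, 110.0]
particles = [100.0, 200.0, 350.0]
print(receptors_per_particle(particles, monomer_reference))
```

      Using a reference measured in cells rather than on glass keeps the photophysics of the reference as close as possible to that of the receptor being counted.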

      (57) Figure 3D: 

      •  Include a plot for the respective band intensity to enhance data presentation 

      The plot showing the band intensity analysis of the experiments shown in Fig. 3D was already included in the original version (see old Supplementary Fig. 3). However, in the revised version, we include these plots in the same figure as panels 3E and 3F.  As a control of inhibition of CXCL12 stimulation, we have also included a new figure (Supplementary Fig. 4) showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK as analyzed by western blot.

      (58) Consider adding AMD3100 as a control for comparison. 

      In agreement with the reviewer’s suggestion, we have added the effect of AMD3100 in most of the functional experiments performed.

      (59) Figure 4: 

      •  Address the lack of positive controls in Figure 4 and consider their inclusion for a more comprehensive analysis. 

      DMSO bars correspond to the control of the experiment, as they represent the effect of CXCL12 in the absence of any allosteric modulator. As previously described in this point-by-point reply, DMSO bars correspond to the control performed with the solvent with which the small compounds, at maximum concentration, are diluted.  Therefore, they show the effect of the solvent on CXCL12 responses. In any case, and in order to facilitate the comprehension of the figure we have also added the controls in the absence of DMSO to demonstrate that the solvent does not affect CXCL12-mediated functions, together with the effect of the orthosteric inhibitor AMD3100. In addition, we have also included representative images of the effect of the different compounds on CXCL12-induced polarization (Fig. 4C).

      (60) In Figure 4A, carefully assess overlapping error bars and ensure accurate interpretation. If necessary, consider alternative representation. 

      We have tried alternative representations of the data in Fig. 4A, but in all cases the figure was unclear. We believe that the way we represented the data in the original manuscript is the clearest and most appropriate. Nevertheless, we have now included significance values as a table annexed to the figure, as well as the effect of AMD3100 as a control of inhibition.

      (61) Supplementary Figure 1A: 

      •  Improve the clarity of bar plots for better understanding. Consider reordering them from the most significant to the least. 

      This was a good idea, and therefore Supplementary Fig. 1A has been reorganized to improve clarity.

      (62) Supplementary Figure 1C: 

      •  Clarify the rationale behind choosing the 12.5 nM concentration and explain if different concentrations of CXCL12 were tested. 

      In old Supplementary Fig. 1C, we used untreated cells, that is, CXCL12 was not present in the assay.  These experiments were performed to test the potential toxicity of DMSO (solvent) or the negative allosteric modulators on Jurkat cells. The 12.5 nM concentration of CXCL12 mentioned in the figure legend applied only to panels A and B, as indicated in the figure legend. We previously optimized this concentration for Jurkat cells using different concentrations of CXCL12 between 5 and 100 nM.  Nevertheless, we have reorganized old supplementary fig. 1 and clarified the figure legend to avoid misinterpretations (see Supplementary Fig 1A, B and Supplementary Fig. 2A, B).

      (63) Explain the observed reduction in fluorescence intensity for AGR1.135. 

      The cell cycle analysis has been moved from Supplementary Fig. 1C to a new Supplementary Fig. 2. It now includes the flow cytometry panels to show fluorescence intensity as a function of the number of cells analyzed (panel A) as well as a table (panel B) with the percentage of cells in each phase of the cell cycle. We believe that the apparent reduction in fluorescence that the reviewer observes is mainly due to the number of events analyzed. However, we have replaced the flow cytometry panels with others that are more representative and included a table with the mean of the different results. When we determined the percentage of cells in each cell cycle phase, we observed that it was very similar in all the experimental conditions. That is, none of the compounds affected any of the cell cycle phases. We have also included the effect of H2O2 and staurosporine as control compounds inducing cell death and cell cycle alteration of Jurkat cells.

      (64) Supplementary Table 1: 

      •  Include a column specifying the scoring for each compound to provide a clear reference for readers. 

      To provide a clear reference for readers, we have now included the inhibitory effect of each compound on Jurkat cell migration in the revised version of this table. 

      (65) Minor Points 

      Page 2 - Abstract: Rephrase the first sentence of the abstract to enhance fluidity. 

      Although the entire manuscript was revised by a professional English editor, we appreciate the valuable comments of this reviewer and we have corrected these issues accordingly.

      (66) Page 2 - Abstract: Explicitly define "CXCR4" as "C-X-C chemokine receptor type 4" the first time it appears.

      We have not used C-X-C chemokine receptor type 4 the first time it appears in the abstract. CXCR4 is an acronym normally accepted to identify this chemokine receptor, and it is used as CXCR4 in many articles published in eLife. However, we introduce the complete name the first time it appears in the introduction.

      (67) Page 2 - Abstract: Explicitly define "CXCL12" as "C-X-C motif chemokine 12" the first time it is mentioned. 

      As we have discussed in the previous response, we have not used C-X-C motif chemokine 12 the first time CXCL12 appears in the abstract, as it is a general acronym normally accepted to identify this specific chemokine, even in eLife papers. However, we introduce the complete name the first time it appears in the introduction section.

      (68) Page 2 - Abstract: Explicitly define "TMV and TMVI" upon its first mention.

      The acronym TM has been defined as “Transmembrane” in the revised version

      (69) Page 2 - Abstract: Review the use of "in silico" in the sentence for accuracy and consider revising if necessary.

      With the term “in silico” we want to refer to those experiments performed on a computer or via computer simulation software. We have carefully reviewed its use in the new version of the manuscript.

      (70) Page 2 - Abstract: Add a comma after "compound" in the sentence, "We identified AGR1.137, a small compound that abolishes...".

      A comma after “compound” has been added in the revised sentence.

      (71) Page 2 - Significance Statement: Rephrase the first sentence of the "Significance Statement" to avoid duplication with the abstract.

      The first sentence of the Significance Statement has been revised to avoid duplication with the abstract. 

      (72) Page 2 - Significance Statement: Break down the lengthy sentence, "Here, we performed in silico analyses..." for better readability. 

      The sentence starting with “Here, we performed in silico analyses…” has been broken down in the revised manuscript.

      (73) Page 2 - Introduction: Replace "Murine studies" with a more specific term for clarity.

      The term “murine studies” is normally used to refer to experimental studies developed in mice. We have nonetheless rephrased the sentence.

      (74) Page 3 - Introduction: Rephrase the sentence for clarity: "Finally, using a zebrafish model, ..."

      The sentence has been now rephrased for clarity.

      (75) Results-AGR1.135 and AGR1.137 block CXCL12-mediated CXCR4 nanoclustering and dynamics: 

      Rephrase the sentence for clarity: "Retreatment with AGR1.135 and AGR1.137, but not with AGR1.131, substantially impaired CXCL12-mediated receptor nanoclustering.”

      The sentence has been rephrased for clarity.

      (76) Results - AGR1.135 and AGR1.137 incompletely abolish CXCR4-mediated responses in Jurkat cells: Clarify the sentence: "In contrast to the effect promoted by AMD3100, a binding-site antagonist of CXCR4..."

      The sentence has been modified for clarity.

      (77) Consider using "orthosteric" instead of "binding-site" antagonist.

      The term orthosteric is now used throughout to refer to a binding site antagonist.

      (78) Discussion: Use the term "in silico" only when necessary.

      We have carefully reviewed the use of “in silico” in the manuscript.

      (79) Discussion: Clarify the sentence: "...not affect neither CXCR2-mediated cell migration...". Confirm if "CXCL12" is intended.

      The sentence refers to the chemokine receptor CXCR2, which binds the chemokine CXCL2. To test the specificity of the compounds for the CXCL12/CXCR4 axis, we evaluated CXCL2-mediated cell migration. The results indicated that the CXCL2/CXCR2 axis was not affected by the negative allosteric modulators, whereas CXCL12-mediated cell migration was blocked. The sentence has been clarified in the new version of the manuscript.

      (80) Figure 4B: Bold the "B" in the figure label for consistency.

      The “B” in Fig. 4B has been bolded.

      Reviewer #2

      (1) Fig 2. The SPT data is sub-optimal in its presentation as well as analysis. Example images should be shown. The analysis and visualization of the data should be reconsidered for improvements. Graphs with several hundreds, in some conditions over 1000 tracks, per condition are very hard to compare. The same (randomly selected representative set) number of data points should be shown for better visualization. Also, more thorough analyses like MSD or autocorrelation functions are lacking - they would allow enhanced overall representation of the data.

      In agreement with the reviewer’s commentary, we have modified the representation of Fig. 2. We have carefully read the paper published by Lord S.J. et al. (Lord S. J. et al., J. Cell Biol. 2020) and we apply their recommendations for this type of data. We have also included, as supplementary material, representative videos of the TIRF-M experiments to allow readers to visualize the original images. Regarding the MSD analyses, they were performed to determine all D1-4 values. According to Manzo & García-Parajo (Manzo C. & García-Parajo M.F. Rep. Prog. Phys. 2015), due to the finite trajectory length, the MSD curve at large tlag has poor statistics and deviates from linearity. However, the diffusion coefficient (D1-4) can be estimated by fitting the short tlag region of the MSD plot, which gives a more accurate idea of the behavior of the particles. Accordingly, we show D1-4 values and not MSD data. 
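
      The short-tlag estimate described above can be illustrated with a minimal sketch. It assumes that D1-4 denotes a linear fit over the first four time lags of the time-averaged MSD and that motion is two-dimensional, so that MSD = 4·D·tlag + offset. The function names, the frame time and the simulated trajectory are hypothetical and are not the code or data used in the study.

```python
import numpy as np

def msd(track, max_lag):
    """Time-averaged MSD of one 2D trajectory for lags 1..max_lag (in frames)."""
    track = np.asarray(track, dtype=float)
    out = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        disp = track[lag:] - track[:-lag]          # displacements at this lag
        out[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return out

def short_lag_diffusion(track, frame_time, n_points=4):
    """Fit MSD = 4*D*t + offset over the first n_points lags and return D."""
    lags = np.arange(1, n_points + 1) * frame_time
    slope, _offset = np.polyfit(lags, msd(track, n_points), 1)
    return slope / 4.0                             # 2D diffusion: slope = 4*D

# Hypothetical check on a simulated 2D Brownian trajectory.
rng = np.random.default_rng(0)
d_true, frame_time = 0.05, 0.1                     # µm²/s and s (assumed values)
steps = rng.normal(0.0, np.sqrt(2 * d_true * frame_time), size=(2000, 2))
track = np.cumsum(steps, axis=0)
print(short_lag_diffusion(track, frame_time))
```

      Restricting the fit to the first few lags uses the part of the MSD curve that averages over the largest number of displacements, which is why it is less sensitive to the finite trajectory length.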

      Due to the space restrictions, it is very difficult to include all the figures generated, but, only for review purposes, we included in this point-by-point reply some representative plots of the MSD values as a function of the time from individual trajectories showing different types of motion obtained in our experiments (Author response image 7).

      Author response image 7.

      Representative MSD plots from individual trajectories of CXCR4-AcGFP showing different types of motion: A) confined, B) Brownian/Free, C) direct transport of CXCR4-AcGFP particles diffusing at the cell membrane detected by SPT-TIRF in resting JKCD4 cells.

      Further analysis, such as the classification based on particle motion, has not been included in this article. This classification uses the moment scaling spectrum (MSS), described by Ewers H. et al. 2005 PNAS, and requires particles with longer trajectories (>50 frames). Only for review purposes, we include a figure showing the percentage of the MSS-based particle motion classification for each condition. As expected, most of the particles with long trajectories are confined, with a slight increase in the percentage upon CXCL12 stimulation in all conditions, except in cells treated with AGR1.137 (Author response image 8).
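
      For readers unfamiliar with the MSS, the idea can be sketched as follows. This is a generic illustration, not the implementation used for the figure: for each moment order p, the mean of |displacement|^p is assumed to scale as tlag^γp, and the slope of γp versus p is taken as the MSS slope. The cutoffs used to label a trajectory (0.5 ± 0.1) and the function names are assumptions made only for this example.

```python
import numpy as np

def mss_slope(track, max_order=6, max_lag=None):
    """Slope of the moment scaling spectrum for one 2D trajectory."""
    track = np.asarray(track, dtype=float)
    if max_lag is None:
        max_lag = len(track) // 3
    lags = np.arange(1, max_lag + 1)
    orders = np.arange(0, max_order + 1)
    gammas = []
    for p in orders:
        # p-th moment of the displacement magnitude at each lag
        moments = [
            np.mean(np.linalg.norm(track[lag:] - track[:-lag], axis=1) ** p)
            for lag in lags
        ]
        gamma, _ = np.polyfit(np.log(lags), np.log(moments), 1)
        gammas.append(gamma)
    slope, _ = np.polyfit(orders, gammas, 1)
    return slope

def classify_motion(slope, tolerance=0.1):
    """Label a trajectory from its MSS slope (illustrative cutoffs)."""
    if slope < 0.5 - tolerance:
        return "confined"
    if slope > 0.5 + tolerance:
        return "directed"
    return "Brownian"

# Hypothetical trajectory moving at constant velocity along x.
directed = np.column_stack([np.arange(60.0), np.zeros(60)])
print(classify_motion(mss_slope(directed)))
```

      Because high-order moments are dominated by rare large displacements, the estimate is only stable for long trajectories, which is consistent with the >50 frame requirement mentioned above.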

      Author response image 8.

      Effects of the negative allosteric modulators on the types of motion of CXCR4. Percentage of single trajectories with different types of motion, classified by MSS (DMSO: 58 particles in 59 cells on FN; 314 in 63 cells on FN+CXCL12; AGR1.131: 102 particles in 71 cells on FN; 258 in 69 cells on FN+CXCL12; AGR1.135: 86 particles in 70 cells on FN; 120 in 77 cells on FN+CXCL12; AGR1.137: 47 particles in 66 cells on FN; 74 in 64 cells on FN+CXCL12) n = 3.

      (2) Fig 3. The figure legends have inadequate information on concentrations and incubation times used, both for the compounds and other treatments like CXCL12 and forskolin. For the Western blot data, also the quantification should be added to the main figure. The compounds, particularly AGR1.137 seem to lead to augmented stimulation of pAKT and pERK. This should be discussed

      The Fig. 3 legend has been corrected in the revised manuscript. Fig. 3D now contains representative western blots and the densitometry evaluation of these experiments. As the reviewer indicates, we also detected in the western blot included, augmented stimulation of pAKT and pERK in cells treated with AGR1.137. However, as shown in the densitometry analysis, no significant differences were noted between the data obtained with each compound. As a control of inhibition of CXCL12 stimulation we have included a new Supplementary Fig. 4 showing the effect of AMD3100 on CXCL12-induced activation of Akt and ERK as analyzed by western blot.

      (3) Fig. 4 immunofluorescence data on polarization as well as the flow chamber data lack the representative images of the data. The information on the source of the T cells is missing. Not clear if this experiment was done on bilayers or on static surfaces.

      Representative images for the data shown in Figure 4B have been added in the revised figure (Fig. 4C). The experiments in Fig. 4B were performed on static surfaces. As indicated in the materials and methods section, primary T cell blasts were added to fibronectin-coated glass slides and were then stimulated or not with CXCL12 (5 min at 37ºC) prior to fixing, permeabilizing, and staining them with phalloidin. Primary T cell blasts were generated from PBMCs isolated from buffy coats that were activated in vitro with IL-2 and PHA, as indicated in the materials and methods section.

      (4) The data largely lack titration of different concentrations of the compounds. How were the effective concentration and treatment times determined? What happens at higher concentrations? It is important to show, for instance, if the CXCL12 binding gets inhibited at higher concentrations. Most experiments were performed with 50 µM, but HeLa cell data with 100 µM. Why and how was this determined?

      The revised version contains a new panel in Fig. 1B to show a more detailed kinetic analysis with different concentrations (1-100 µM) of the compounds in the migration experiments using Jurkat cells. We chose 50 µM for further studies as it was the concentration that inhibited 50-75% of the ligand-induced cell migration.

      We have also included the effect of two doses of the compounds (10 and 50 µM) in the zebrafish model as well as AMD3100 (1 and 10 µM) as control (new Fig. 7D, E). Tumors were imaged within 2 hours of implantation and tumor-bearing embryos were treated with either vehicle (DMSO) alone, AGR1.131 or AGR1.137 at 10 and 50 µM or AMD3100 at 1 and 10 µM for three days, followed by re-imaging.

      Regarding the amount of CXCL12 used in these experiments, with the exception of cell migration assays in Transwells, where the optimal concentration was established at 12.5 nM, the optimal concentration of CXCL12 employed in all other experiments was 50 nM. In the case of the directional cell migration assays, we used 100 nM to create the chemokine gradient in the device. These concentrations were optimized in previous work from our laboratory using these types of experiments. It should also be noted that in the experiments using lipid bilayers or TIRF-M, CXCL12 is used to coat the plates, and it is therefore difficult to determine the real concentration that is retained on the surface after the washing steps performed prior to adding the cells.

      (5) The authors state that they could not detect direct binding of the compounds and CXCR4. It should be reported what approaches were tried and discussed why this was not possible.

      We attempted a fluorescence spectroscopy strategy to formally prove the ability of AGR1.135 to bind CXCR4, but this strategy failed because the compound has a yellow color that interfered with the determinations. We also tried a FRET strategy (see Supplementary Fig. 7) and detected a significant increase in FRET efficiency of CXCR4 homodimers in cells treated with AGR1.135; however, this effect was due to the yellow color of this compound, which interferes with FRET determinations. In the same assays, AGR1.137 did not modify FRET efficiency for CXCR4 homodimers, and therefore we cannot assume that AGR1.137 binds to CXCR4. All these data have been considered in the revised discussion.

      (6) The proliferation data in Supplementary Figure 1 lacks controls that affect proliferation and indication of different cell cycle stages. What is the conclusion of this data? More information on the effects of the drug to cell viability would be important.

      Toxicity in Jurkat cells was first determined by propidium iodide incorporation. Some compounds (i.e., AGR1.103 and VSP3.1) were discarded from further analysis as they were toxic for cells. In a deeper analysis of cell toxicity, even if these compounds did not kill the cells, we checked whether they could alter the cell cycle of the cells. New Supplementary Fig. 2 includes a table (panel B) with the percentage of cells in each cell cycle phase, and no differences between any of the treatments tested were detected. 

      Nevertheless, to clarify this issue the revised version of the figure also includes H2O2 and staurosporine stimuli to induce cell death and cell cycle alterations as controls of these assays.

      (7) The flow data in Supplementary Figure 2 should be statistically analysed. 

      Bar graphs corresponding to the old Supplementary Fig. 2 (new Supplementary Fig. 3) are shown in Fig. 3B. We have also incorporated the corresponding statistical analysis to this figure. 

      (8) In general, the authors should revise the figure legends to ensure that critical details are added. 

      We have carefully revised all the figure legends in the new version of the manuscript.

      (9) Bar plots are very poor in showing the heterogeneity of the data. Individual data points should be shown whenever feasible. Superplot-type of representation is strongly advised (https://doi.org/10.1083/jcb.202001064).

      We have carefully read the paper published by Lord S. J. and colleagues (Lord S. J. et al., J. Cell Biol. 2020) and have applied their recommendations to our TIRF-M data (see revised Fig. 2).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work provides a near-complete description of the mechanosensory bristles on the Drosophila melanogaster head and the anatomy and projection patterns of the bristle mechanosensory neurons that innervate them. The data presented are solid. The study has generated numerous invaluable resources for the community that will be of interest to neuroscientists in the field of circuits and behaviour, particularly those interested in mechanosensation and behavioural sequence generation.

      We express our gratitude to the Reviewers for their valuable suggestions, which significantly enhanced the manuscript. The revisions were undertaken, not with the expectation of acceptance, but rather driven by our sincere belief that these revisions would enhance the manuscript's impact for future readers.

      Public Reviews:

      Reviewer #1 (Public Review):

      Sensory neurons of the mechanosensory bristles on the head of the fly project to the subesophageal zone (SEZ). In this manuscript, the authors have built on a large body of previous work to comprehensively classify and quantify the head bristles. They broadly identify the nerves that various bristles use to project to the SEZ and describe their region-specific innervation in the SEZ. They use dye-fills, clonal labelling, and electron microscopic reconstructions to describe in detail the phenomenon of somatotopy - conserved peripheral representations within the central brain - within the innervation of these neurons. In the process they develop novel tools to access subsets of these neurons. They use these to demonstrate that groups of bristles in different parts of the head control different aspects of the grooming sequence.

      Reviewer #2 (Public Review):

      The authors combine genetic tools, dye fills and connectome analysis techniques to generate a "first-of-its-kind", near complete, synaptic resolution map of the head bristle neurons of Drosophila. While some of the BMN anatomy was already known based on previous work by the authors and other researchers, this is the first time a near complete map has been created for the head BMNs at electron microscopy resolution.

      Strengths:

      (1) The authors cleverly use techniques that allow moving back and forth between periphery (head bristle location) and brain, as well as moving between light microscopy and electron microscopy data. This allows them to first characterize the pathways taken by different head BMNs to project to the brain and also characterize anatomical differences among individual neurons at the level of morphology and connectivity.

      (2) The work is very comprehensive and results in a near complete map of all head BMNs.

      (3) Authors also complement this anatomical characterization with a first-level functional analysis using optogenetic activation of BMNs that results in expected directed grooming behavior.

      Weaknesses:

      (1) The clustering analysis is compelling but cluster numbers seem to be arbitrarily chosen instead of by using some informed metrics.

      We made revisions to the manuscript that address this concern. Please see our response to “recommendations for authors” for a description of these revisions.

      (2) It could help provide context if authors revealed some of the important downstream pathways that could explain optogenetics behavioral phenotypes and previously shown hierarchical organization of grooming sequences.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      (3) In contrast to the rigorous quantitative analysis of the anatomical data, the behavioral data is analyzed using much more subjective methods. While I do not think it is necessary to perform a rigorous analysis of behaviors in this anatomy focused manuscript, the conclusions based on behavioral analysis should be treated as speculative in the current form e.g. calling "nodding + backward walking" as an avoidance response is not justified as it currently stands. Strong optogenetic activation could lead to sudden postural changes that due to purely biomechanical constraints could lead to a couple of backward steps as seen in the example videos. Moreover since the quantification is manual, it is not clear what the analyst interprets as backward walking or nodding. Interpretation is also concerning because controls show backward walking (although in fewer instances based on subjective quantification).

      While unbiased machine vision-based methods would nicely complement the present work, this type of analysis cannot yet distinguish between different head grooming movements. Therefore, we are currently limited to manual annotation for our behavioral analysis. That said, we do not believe that our manual annotation is subjective. The grooming movements that we examine in this work are distinguishable from each other through frame-by-frame manual annotation of video at 30 fps. Our annotations of the grooming and backward motions performed by flies are based on previous publications that established a controlled vocabulary defining each movement (Hampel et al., 2020a, 2017, 2015; Seeds et al., 2014). In this work, we added head nodding to this controlled vocabulary, as described in the Materials and methods. We have added text to the third paragraph of the Materials and methods section entitled “Behavioral analysis procedures” that we hope better describes our behavioral analysis. This description now reads:

      Head nodding was annotated when the fly tilted its head downward by any amount until it returned its head to its original position. This movement often occurred in repeated cycles. Therefore, the “start” was scored at the onset of the first forward movement and the “stop” when the head returned to its original position on the last nod.

      We do not make any firm conclusions about the head movements (nodding) and backwards motions. We refer to nodding as a descriptive term that would allow the reader to better understand what the behavior looks like. We make no firm conclusions about any behavioral functional role that either the nodding or the backward motions might have, with the exception of nodding in the context of grooming. We only suggest that the behaviors appear to be avoidance responses. Furthermore, backward walking was not mentioned. Instead we refer to backward motions. We are only reporting our annotations of these movements that do occur, and are significantly different from controls. We speculate that these could be avoidance responses based on support from the literature. Future studies will be required to understand whether these movements serve real behavioral roles.

      Summary:

      The authors end up generating a near-complete map of head BMNs that will serve as a long-standing resource to the Drosophila research community. This will directly shape future experiments aimed at modeling or functionally analyzing the head grooming circuit to understand how somatotopy guides behaviors.

      Reviewer #3 (Public Review):

      Eichler et al. set out to map the locations of the mechanosensory bristles on the fly head, examine the axonal morphology of the bristle mechanosensory neurons (BMNs) that innervate them, and match these to electron microscopy reconstructions of the same BMNs in a previously published EM volume of the female adult fly brain. They used BMN synaptic connectivity information to create clusters of BMNs that they show occupy different regions of the subesophageal zone brain region and use optogenetic activation of subsets of BMNs to support the claim that the morphological projections and connectivity of defined groups of BMNs are consistent with the parallel model for behavioral sequence generation.

      The authors have beautifully cataloged the mechanosensory bristles and the projection paths and patterns of the corresponding BMN axons in the brain using detailed and painstaking methods. The result is a neuroanatomy resource that will be an important community resource. To match BMNs reconstructed in an electron microscopy volume of the adult fly brain, the authors matched clustered reconstructed BMNs with light-level BMN classes using a variety of methods, but evidence for matching is only summarized and not demonstrated in a way that allows the reader to evaluate the strength of the evidence. The authors then switch from morphology-based categorization to non-BMN connectivity as a clustering method, which they claim demonstrates that BMNs form a somatotopic map in the brain. This map is not easily appreciated, and although contralateral projections in some populations are clear, the distinct projection zones that are mentioned by the authors are not readily apparent. Because of the extensive morphological overlap between connectivity-based clusters, it is not clear that small projection differences at the projection level are what determines the post-synaptic connectivity of a given BMN cluster or their functional role during behavior. The claim that the somatotopic organization of BMN projections is preserved among their postsynaptic partners to form parallel sensory pathways is not supported by the result that different connectivity clusters still have high cosine similarity in a number of cases (i.e. Clusters 1 and 3, or Clusters 1 and 2). Finally, the authors use tools that were generated during the light-level characterization of BMN projections to show that specifically activating BMNs that innervate different areas of the head triggers different grooming behaviors. In one case, activation of a single population of sensory bristles (InOm) triggers two different behaviors, both eye and dorsal head grooming. This result does not seem consistent with the parallel model, which suggests that these behaviors should be mutually exclusive and rely on parallel downstream circuitry.

      We made revisions to the manuscript that address this recommendation. Please see our response to “recommendations for authors” for a description of these revisions.

      This work will have a positive impact on the field by contributing a complete accounting of the mechanosensory bristles of the fruit fly head, describing the brain projection patterns of the BMNs that innervate them, and linking them to BMN sensory projections in an electron microscopy volume of the adult fly brain. It will also have a positive impact on the field by providing genetic tools to help functionally subdivide the contributions of different BMN populations to circuit computations and behavior. This contribution will pave the way for further mechanistic study of central circuits that subserve grooming circuits.

      Recommendations for the authors:

      All three reviewers appreciated the work presented in this manuscript. There were also a few overlapping concerns that were raised that are summarised below, should the authors wish to address them:

      Somatotopy: We recommend that the authors describe the extent of prior knowledge in more detail to highlight their contribution better.

      We made revisions that better highlight the extent of prior knowledge about somatotopy. We describe how previous studies showed bristle mechanosensory neurons in insects are somatotopically organized, but these studies were not comprehensive descriptions of complete somatotopic maps for the head or body. To our knowledge, our study provides the first comprehensive and synaptic resolution somatotopic map of a head for any animal. This sets the stage for the complete definition of the interface between somatotopically-organized mechanosensory neurons and postsynaptic circuits, which has broad implications for future studies on aimed grooming, and mechanosensation in general. Below we itemize revisions to the Introduction, Discussion, and Figures to provide a clearer statement of the significance of our study as it relates to somatotopy.

      (1) Newly added Figure 1 – figure supplement 1 more explicitly grounds the study in somatotopy, providing a working model of the organization of the circuit pathways that produce the grooming sequence. This model features somatotopy as shown in Figure 1 – figure supplement 1C.

      (2) Figure 1 – figure supplement 1 is incorporated into the Introduction in the second, third, and fourth paragraphs, the first paragraph of the Results section titled “Somatotopically-organized parallel BMN pathways”, and the second and third paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      (3) We added text to the end of the fourth paragraph of the Introduction that now reads: “In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.”

      (4) There is a Discussion section that further explains the extent of prior knowledge and our contributions on somatotopy that is titled “A synaptic resolution somatotopic map of the head BMNs”. Additionally, the previous version of this section had a paragraph on the broader implications of our work as it relates to somatotopy across species. In light of the reviewer comments, we decided to make this paragraph into its own Discussion section to better highlight the broader significance of our work. This section is titled “First synaptic resolution somatotopic map of the head”.

      The somatotopy isn't overtly obvious - perhaps they could try mapping presynaptic sites and provide landmarks to improve visualisation.

      We made the following revisions to better highlight the head BMN somatotopy. One point of confusion in the previous manuscript version stemmed from our not explicitly defining the somatotopic organization that we observed. There seemed to be confusion that we were defining the head somatotopy based only on the small projection differences among BMNs from neighboring head locations. While we believe that these small differences indeed correspond to somatotopy, we failed to highlight that there are overt differences in the brain projections of BMNs from distant locations on the head. For example, Figure 5B (right panel) shows the distinct projections between the LabNv (brown) and AntNv (blue) BMNs that innervate bristles on the ventral and dorsal head, respectively. Thus, BMN types innervating neighboring bristles show overlapping projections with small projection differences, whereas those innervating distant bristles show non-overlapping projections into distinct zones.

      Our analysis of postsynaptic connectivity similarity also shows somatotopic organization among the BMN postsynaptic partners, as BMN types innervating the same or neighboring bristle populations show high connectivity similarity (Figure 8, old Figure 7). Below we highlight major revisions to the text and Figures that hopefully better reveal the head somatotopy.

      (1) In the last paragraph of the Introduction we added text that explicitly frames the experiments in terms of somatotopic organization: “This reveals somatotopic organization, where BMNs innervating neighboring bristles project to the same zones in the CNS while those innervating distant bristles project to distinct zones. Analysis of the BMN postsynaptic connectome reveals that neighboring BMNs show higher connectivity similarity than distant BMNs, providing evidence of somatotopically organized postsynaptic circuit pathways.”

      (2) We mention an example of overt somatotopy from Figure 5 in the Results section titled “EM-based reconstruction of the head BMN projections in a full adult brain”. The text reads “For example, BMNs from the Eye- and LabNv have distinct ventral and anterior projections, respectively. This shows how the BMNs are somatotopically organized, as their distinct projections correspond to different bristle locations on the head (Figure 5B,C).”

      (3) In new Figure 8 (part of old Figure 7), we modified panels that correspond to the cosine similarity analysis of postsynaptic connectivity. The major revision was to plot the cosine similarity clusters onto the head bristles so that the bristles are now colored based on their clusters (C). This shows how neighboring BMNs cluster together, and therefore show similar postsynaptic connectivity. We believe that this provides a nice visualization of somatotopic organization in BMN postsynaptic connectivity. We also added the clustering dendrogram as recommended by Reviewer #2 (Figure 8A).

      (4) In new Figure 8, we added new panels (D-F) that summarize our anatomical and connectomic analysis showing different somatotopic features of the head BMNs. Different BMN types innervate bristles at neighboring and distant proximities (D). BMNs that innervate neighboring bristles project into overlapping zones (E, example of reconstructed BM-Fr and -Ant neurons with non-overlapping BM-MaPa neurons) and show postsynaptic connectivity similarity (F, example connectivity map of three BM types on cosine similarity data).

      (5) To accompany the new Figure 8D-F panels, we added a paragraph to summarize the different somatotopic features of the head BMNs that were identified based on our anatomical and connectomic analysis. This is the last paragraph in the Results section titled “Somatotopically-organized parallel BMN pathways”:

      Our results reveal head bristle proximity-based organization among the BMN projections and their postsynaptic partners to form parallel mechanosensory pathways. BMNs innervating neighboring bristles project into overlapping zones in the SEZ, whereas those innervating distant bristles project to distinct zones (example of BM-Fr, -Ant, and -MaPa neurons shown in Figure 8D,E). Cosine similarity analysis of BMN postsynaptic connectivity revealed that BMNs innervating the same bristle populations (same types) have the highest connectivity similarity. Figure 8F shows example parallel connections for BM-Fr, -Ant, and -MaPa neurons (vertical arrows), where the edge width indicates the number of synapses from each BMN type to their major postsynaptic partners. Additionally, BMNs innervating neighboring bristle populations showed postsynaptic connectivity similarity, while BMNs innervating distant bristles show little or none. For example, BM-Fr and -Ant neurons have connections to common postsynaptic partners, whereas BM-MaPa neurons show only weak connections with the main postsynaptic partners of BM-Fr or -Ant neurons (Figure 8F, connections under 5% of total BMN output omitted). These results suggest that BMN somatotopy could have different possible levels of head spatial resolution, from specific bristle populations (e.g. Ant bristles), to general head areas (e.g. dorsal head bristles).

      We also refer to Figure 8D-F to illustrate the different somatotopic features in the Discussion. These references can be found in the following Discussion sections titled “A synaptic resolution somatotopic map of the head BMNs (fourth paragraph)”, and “Parallel circuit architecture underlying the grooming sequence (second paragraph)”.

      (6) In addition to improving the Figures, we provide additional tools that enable readers to explore the BMN somatotopy in a more interactive way. That is, we provide 5 different FlyWire.ai links in the manuscript Results section that enable 3D visualization of the different reconstructed BMNs (e.g. FlyWire.ai link 1).

      Note: In working on old Figure 7 to address this Reviewer suggestion, we also reordered panels A-E. We believe that this was a more logical ordering than in the previous draft. These panels are now the only data shown in Figure 7, as the cosine similarity analysis is now in Figure 8. We hope that splitting these panels into two Figures will improve manuscript readability.
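
      To make the cosine similarity analysis of postsynaptic connectivity discussed above more concrete, the following is a minimal sketch rather than the code used for Figure 8. It assumes each BMN type is summarized as a vector of synapse counts onto a shared, ordered list of postsynaptic partners. The synapse counts and the function names are hypothetical and serve only to illustrate the calculation; only the BMN type names are taken from the manuscript.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two synapse-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A neuron with no output synapses has no defined direction;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(counts):
    """Pairwise cosine similarity for a dict of name -> count vector."""
    names = list(counts)
    return {
        (a, b): cosine_similarity(counts[a], counts[b])
        for a in names
        for b in names
    }


# Hypothetical synapse counts onto three shared postsynaptic partners.
example_counts = {
    "BM-Fr": [40, 10, 0],
    "BM-Ant": [30, 20, 0],
    "BM-MaPa": [0, 2, 50],
}
example_similarity = similarity_matrix(example_counts)
```

      In this toy example, the two types that share most of their partners (BM-Fr and BM-Ant) score close to 1, whereas BM-MaPa, whose output goes mainly to a different partner, scores close to 0 against both. A matrix of this kind is the typical input to hierarchical clustering and to a dendrogram such as the one added in Figure 8A.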

      Light EM Mapping: A better description of methods by which this mapping was done would be helpful. Perhaps the authors could provide a few example parallel representations of the EM and light images in the main figure would help the reader better appreciate the strength of their approach.

      We have done as the Reviewers suggested and added panels to Figure 6 that show examples of the LM and EM image matching (Figure 6A,B). We added two examples that used different methods for labeling the LM imaged BMNs, including MCFO labeling of an individual BM-InOc neuron and driver line labeling of a major portion of BM-InOm neurons using InOmBMN-LexA. These panels are referred to in the first paragraph of the Results section titled “Matching the reconstructed head BMNs with their bristles”. Note that examples for all LM/EM matched BMN types are shown in Figure 6 – figure supplement 2.

      We had provided Figure 6 – figure supplement 2 in the reviewed manuscript that shows all the above requested “parallel representations of the EM and light images”. However, the Reviewer critiques made us realize that the purpose of this figure supplement was not clearly indicated. Therefore, we have revised Figure 6 – figure supplement 2 and its legend to make its purpose clearer. First, we changed the legend title to better highlight its purpose. The legend is now titled: “Matching EM reconstructed BMN projections with light microscopy (LM) imaged BMNs that innervate specific bristles”. Second, we added label designations to the figure panel rows that highlight the LM and EM comparisons. That is, the rows for light microscopy images of BMNs are indicated with LM and the rows for EM reconstructed BMN images are labeled with EM. Reviewer #3 had indicated that it was not clear what labeling methods were used to visualize the LM imaged BM-InOm neurons in Figure 6 – figure supplement 2N. Therefore, we added text to the figure and the legend to better highlight the different methods used. Panels A and B were also cropped to accommodate the above mentioned revisions.

      The manuscript also provides an extensive Materials and methods section that describes the different lines of evidence that were used to assign the reconstructed BMNs as specific types. We changed the title to better highlight the purpose of this methods section to “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles”. The evidence used to support the assignment of the different BMN types is also summarized in Figure 6 – figure supplement 3.

      Parallel circuit model: The authors motivate their study with this. We're recommending that they define expectations of such circuitry, its alternatives (including implications for downstream pathways), and behavior before they present their results. We're also recommending that they interpret their behavioural results in the context of these circuits.

      Our primary motivation for doing the experiments described in this manuscript was to help define the neural circuit architecture underlying the parallel model that drives the Drosophila grooming sequence. This manuscript provides a comprehensive assessment of the first layer of this circuit architecture. A byproduct of this work is a contribution that offers immediate utility and significance to the Drosophila connectomics community, namely the description of the majority of mechanosensory neurons on the head, with their annotation in the recently released whole brain connectome dataset (FlyWire.ai). In writing this manuscript, we tried to balance both of these aims, which proved difficult. We very much appreciate the Reviewers' comments that have highlighted points of confusion in our original draft. We hope that the revised draft is now clearer and more logically presented. We have made revisions to the text and provided a new figure supplement (Figure 1 - figure supplement 1) and new panels in Figure 8. Below we highlight the major revisions.

      (1) The Introduction was revised to more explicitly ground the study in the parallel model, while also removing details that were not pertinent to the experiments presented in the manuscript.

      The first paragraph introduces different features of the parallel model. To better focus the reader on the parts of the model that were being assessed in the manuscript, we removed the following sentences: “Performance order is established by an activity gradient among parallel circuits where earlier actions have the highest activity and later actions have the lowest. A winner-take-all network selects the action with the highest activity and suppresses the others. The selected action is performed and then terminated to allow a new round of competition and selection of the next action.” Note that these sentences are included in the third and fourth paragraphs of the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence”.

      The first paragraph of the Introduction now introduces a bigger picture view of the model that emphasizes the two main features: 1) a parallel circuit architecture that ensures all mutually exclusive actions to be performed in sequence are simultaneously readied and competing for output, and 2) hierarchical suppression among the parallel circuits, where earlier actions suppress later actions.

      (2) Newly added Figure 1 – figure supplement 1 provides a working model of grooming (Reviewer #1 suggestion). We now more strongly emphasize that the study aimed to define the parallel neural circuit architecture underlying the grooming sequence, focusing on the mechanosensory layer of this architecture. In particular, we refer to the new Figure 1 – figure supplement 1 that has been added to better convey the hypothesized grooming neural circuit architecture. Figure 1 – figure supplement 1 is incorporated into the Introduction (paragraphs two, three, and four), Results section titled “Somatotopically-organized parallel BMN pathways (first paragraph)”, and last Discussion section titled “Parallel circuit architecture underlying the grooming sequence (second and third paragraphs)”.

      (3) New panels in Figure 8 update the model of parallel circuit organization as it relates to somatotopy (D-F). These panels show the parallel circuits hypothesized by the model, but also indicate convergence, with different possible levels of head resolution for these circuits. We describe above where these panels are referenced in the text.

      (4) We added a new paragraph in the last Discussion section titled “Parallel circuit architecture underlying the grooming sequence” that better incorporates the results from this manuscript into the working model of grooming. This paragraph is shown below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.
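      The selection scheme summarized in item (1) above (an activity gradient across parallel circuits, winner-take-all selection, and termination of the selected action before the next round of competition) can be illustrated with a minimal sketch. The function name and the activity values are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
def run_sequence(activity):
    """Return the order in which competing actions are selected.

    activity: dict mapping an action name to its activity level,
    where a higher level means the action is performed earlier.
    """
    remaining = dict(activity)  # all actions are readied in parallel
    order = []
    while remaining:
        # Winner-take-all: the most active action wins and suppresses the rest.
        winner = max(remaining, key=remaining.get)
        order.append(winner)
        # The selected action is performed and then terminated,
        # which allows a new round of competition among the others.
        del remaining[winner]
    return order


# Hypothetical activity gradient for three head grooming actions.
gradient = {"eyes": 3.0, "antennae": 2.0, "ventral head": 1.0}
print(run_sequence(gradient))  # ['eyes', 'antennae', 'ventral head']
```

      In this sketch the sequence order is fully determined by the activity gradient, which is the property the parallel model relies on.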

      Aside from this summary of major concerns, the detailed recommendations are attached below.

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the quality and exhaustive body of work presented in this manuscript. I have a few comments that the authors may want to consider:

      (1) The authors motivate this study by posing that it would allow them to uncover whether the complex grooming behaviour of flies followed a parallel model of circuit function. It would have been nice to have been introduced to what the alternative model might be and what each would mean for organisation of the circuit architecture. Some guiding schematics would go a long way in illustrating this point. Modifying the discussion along these lines would also be helpful.

      We made several revisions to the manuscript that address this recommendation. Among these revisions, we added Figure 1 – figure supplement 1 that includes a working model for grooming. Please see above for a description of these revisions.

      (2) The authors mention the body of work that has mapped head bristles and described somatotopy. It would be useful to discuss in more detail what these studies have shown and highlight where the gaps are that their study fills.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (3) The dye-fills and reconstructions that are single colour could use a boundary to demarcate the SEZ. This would help in orienting the reader.

      We agree with Reviewer #1 that Figure 4 and its supplements could use some indicator that would orient the reader with respect to the dye filled or stochastically labeled neurons. The images are of the entire SEZ in the ventral brain, and in the case of some panels, the background staining enables visualization of the brain (e.g. Figure 4H,M,N). To help orient the reader in this region, we added a dotted line to indicate the approximate SEZ midline. This also enables the reader to more clearly see which of the BMN types cross the midline.

      Midline visual guides were added for Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      (4) The comparison between the EM and the fills/clones is not obvious. And particularly because they are not directly determined, it would be nice to have the EM reconstruction alongside the dye-fills. This would work very nicely in the supplementary figure with the multiple fills of the same bristles. I think this would really drive home the point.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      (5) Are there unnoticed black error-bars floating around in many of the gray-scale images?

      The black bars were masking white scale bars in the images. We have removed the black bars and remade the images without scale bars. This was done for the following Figures: Figure 4, Figure 4 – figure supplement 2, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4, Figure 4 – figure supplement 5, Figure 4 – figure supplement 6, Figure 4 – figure supplement 7, Figure 4 – figure supplement 8, Figure 6 – figure supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      (1) The only point in the paper where I found myself going back and forth between methods/supp and text was when the authors discuss the clustering. I think it would help the reader if a few sentences about the cosine clustering used for connectivity based clustering were included in the main text. Also, for NBLAST hierarchical clustering, it would help if some informed metrics could be used for defining cluster numbers (e.g. Braun et al, 2010 PLOS ONE shows how Ward linkage cost could be used for hierarchical clustering).

      Depending on where the cut height is placed on the dendrogram for cosine similarity of BMNs, different features of the BMN type postsynaptic connectivity are captured. As the number of clusters is increased (lower cut height), clustering is mainly among BMNs of the same type, showing that these BMNs have the highest connectivity similarity. As the number of clusters is reduced (higher cut height), BMNs innervating neighboring bristles on the head are clustered, revealing three general clusters corresponding to the dorsal, ventral, and posterior head. This reveals somatotopy based clustering among same and neighboring BMN types. The cut height shown in Figure 8 and Figure 8 – figure supplement 2 was chosen because it highlighted both of these features.

      The NBLAST clustering shows similar results to the connectivity based clustering with respect to neighboring and distant BMN types. As the number of clusters increases BMNs of the same type are clustered, and these types can be further subdivided into morphologically distinct subtypes. As the number of clusters is reduced, the clustering captures neighboring BMNs. Thus, neighboring BMN types showed high morphology similarity (and proximity) with each other, and low similarity with distant BMN types.

      Please see our responses to a Reviewer #3 critique below for further description of the clustering results.
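      The effect of cut height described above can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the neuron labels (A, B, C), the connectivity vectors, the cut heights, and the use of average linkage are all hypothetical choices made for illustration, and are not the data, scale, or linkage method used in the manuscript. Types A and B stand in for neighboring BMN types, and type C for a distant one.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_distance(vectors, a, b):
    """Average-linkage distance between clusters a and b (lists of names)."""
    total = sum(cosine_distance(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def clusters_at(vectors, cut_height):
    """Merge the closest clusters until the next merge would exceed cut_height."""
    clusters = [[name] for name in vectors]
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(vectors, *pair))
        if average_distance(vectors, a, b) > cut_height:
            break  # equivalent to cutting the dendrogram at this height
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical synapse counts onto four postsynaptic partners.
connectivity = {
    "A1": [9, 1, 0, 0], "A2": [8, 2, 0, 0],  # type A
    "B1": [5, 6, 0, 0], "B2": [4, 7, 0, 0],  # type B, neighbor of A
    "C1": [0, 0, 9, 1], "C2": [0, 1, 8, 2],  # type C, distant
}

print(clusters_at(connectivity, 0.05))  # [['A1', 'A2'], ['B1', 'B2'], ['C1', 'C2']]
print(clusters_at(connectivity, 0.5))   # [['A1', 'A2', 'B1', 'B2'], ['C1', 'C2']]
```

      At the lower cut height each cluster contains a single type, whereas the higher cut height groups the two neighboring types together while keeping the distant type separate.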

      On the same lines it would help if the clustering dendrograms were included in the main figure.

      We thank Reviewer #2 for this comment. We have added the dendrogram to Figure 8A, a change that we feel makes this Figure much easier to understand.

      (2) It could help provide intuition if the authors revealed some of the downstream targets and their implication in explaining the behavioral phenotypes.

      While this will be the subject of at least two forthcoming manuscripts, we have added text to the present manuscript that provides insight into BMN postsynaptic targets. Our previous work (Hampel et al. 2015) described a mechanosensory connected neural circuit that elicits grooming of the antennae. While this previous study demonstrated that the Johnston’s organ mechanosensory neurons are synaptically and functionally connected with this circuit, our preliminary analysis indicates that it is also connected with BM-Ant neurons. We hypothesize that there are additional such circuits that are responsible for eliciting grooming of other head locations.

      To better highlight potential downstream targets in the manuscript, we now mention the antennal circuit in the Introduction. This text reads: In this model, parallel-projecting mechanosensory neurons that respond to stimuli at specific locations on the head or body could connect with somatotopically-organized parallel circuits that elicit grooming of those locations (Figure 1 – figure supplement 1A-C). The previous discovery of a mechanosensory-connected circuit that elicits aimed grooming of the antennae provides evidence of this organization (Hampel 2015). However, the extent to which distinct circuits elicit grooming of other locations is unknown, in part, because the somatotopic projections of the mechanosensory neurons have not been comprehensively defined for the head or body.

      There is also text in the Discussion that addresses this Reviewer comment. It describes the antennal circuit and mentions the possibility that other similar circuits may exist. This can be found in the third paragraph of the section titled “Circuits that elicit aimed grooming of specific head locations”.

      (3) Authors find that opto activation of BMNs leads to grooming of targeted as well as neighboring areas. Is there any sequence observed here? i.e. first clean targeted area and then clean neighboring area? I wonder if the answer to this is something as simple as common post-synaptic targets which is essentially reducing the resolution of the BMN sensory map. Some more speculation on this interesting result could be helpful.

      We appreciate and agree with this point from Reviewer #2, and have tried to better emphasize the possible implications for grooming that the overlapping projections and connectivity among BMNs innervating neighboring bristles may have. This is now better addressed in the Results and Discussion sections. Below we highlight where this is addressed:

      (1) In the second paragraph of the Results section titled “Activation of subsets of head BMNs elicits aimed grooming of specific locations” we added text that suggests the possibility that grooming of the stimulated and neighboring locations could be due to the overlapping projections and connectivity. This text reads: This suggested that head BMNs elicit aimed grooming of their corresponding bristle locations, but also neighboring locations. This result is consistent with our anatomical and connectomic data indicating that BMNs innervating neighboring bristles show overlapping projections and postsynaptic connectivity similarity (see Discussion).

      (2) At the end of the fourth paragraph of the Discussion section titled “A synaptic resolution somatotopic map of the head BMNs”, we added a sentence that alludes to further discussion of this topic. This sentence reads: This overlap may have implications for aimed grooming behavior. For example, neighboring BMNs could connect with common neural circuits to elicit grooming of overlapping locations (discussed more below).

      (3) The fourth paragraph of the Discussion section titled “Circuits that elicit aimed grooming of specific head locations” mentions the possibility of mechanosensory convergence onto common postsynaptic circuits to promote grooming of the stimulated area, along with neighboring areas. This paragraph is below.

      We find that activation of specific BMN types elicits both aimed grooming of their corresponding bristle locations and neighboring locations. This suggests overlap in the locations that are groomed with the activation of different BMN types. Such overlap provides a means of cleaning the area surrounding the stimulus location. Interestingly, our NBLAST and cosine similarity analysis indicates that neighboring BMNs project into overlapping zones in the SEZ and show common postsynaptic connectivity. Thus, we hypothesize that neighboring BMNs connect with common neural circuits (e.g. antennal grooming circuit) to elicit overlapping aimed grooming of common head locations.

      (4) In the new second paragraph of the Discussion section titled “Parallel circuit architecture underlying the grooming sequence” we further discuss the issue of the BMN “sensory map”. This paragraph is below.

      Here we define the parallel architecture of BMN types that elicit the head grooming sequence that starts with the eyes and proceeds to other locations, such as the antennae and ventral head. The different BMN types are hypothesized to connect with parallel circuits that elicit grooming of specific locations (described above and shown in Figure 1 – figure supplement 1A,C). Indeed, we identify distinct projections and connectivity among BMNs innervating distant bristles on the head, providing evidence supporting this parallel architecture (Figure 8D-F). However, we also find partially overlapping projections and connectivity among BMNs innervating neighboring bristles. Further, optogenetic activation of BMNs at specific head locations elicits grooming of both those locations and neighboring locations (Figure 9). These findings raise questions about the resolution of the parallel architecture underlying grooming. Are BMN types connected with distinct postsynaptic circuits that elicit aimed grooming of their corresponding bristle populations (e.g. Ant bristles)? Or are neighboring BMN types that innervate bristles in particular head areas connected with circuits that elicit grooming of those areas (e.g. dorsal or ventral head)? Future studies of the BMN postsynaptic circuits will be required to define the resolution of the parallel pathways that elicit aimed grooming.

      (4) If the authors were to include a summary table that shows all known attributes about BMN type as columns, that could be very useful as a resource to the community. Table columns could include attributes like "bristle name", "nerve tract", "FlyWire IDs of all segments corresponding to the bristle class", "split-Gal4 line or known enhancer", etc.

      We provided a table that includes much of this information after the manuscript had already gone out for review. We regret that this was not available. This is now provided as Supplementary file 3. This table provides the following information for each reconstructed BMN: BMN name, bristle type, nerve, flywire ID, flywire coordinates, NBLAST cluster (cut height 1), NBLAST cluster (cut height 5), and cosine cluster (cut height 4.5). Note that the driver line enhancers for targeting specific BMN types are shown in Figure 3I.

      Specific Points:

      Figure 4C-V:

      • I find it a bit difficult to distinguish ipsi- from contra-lateral projections. Maybe indicate the midline as a thin, stippled line?

      We thank Reviewer #2 for this suggestion. We have now added lines in the panels in Figure 4C-V to indicate the approximate location of the midline. We also added lines to the Figure 4 – figure supplements as described above.

      I think this Fig reference is wrong "the red-light stimulus also elicited backward motions with control flies (Figure 6B,C, control, black trace, Video 5)." should be Fig 8B,C

      We have fixed this error.

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      Motivating this study in terms of understanding the neural mechanisms that execute the parallel model seems to overstate what you will achieve with the current study. If you want to motivate it this way, I suggest focusing on the grooming sequence of the head alone (eyes, antennae, proboscis).

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that many of the revisions focus on the head grooming sequence. We also made minor revisions to the Introduction that further emphasize the focus on head grooming.

      Results:

      Figure 1. Please indicate that this is a male fly in either the figure title or in the figure itself.

      We added a male symbol to Figure 1A.

      Figure 3. Panel J is referenced in the main body text and in the figure caption, but there is no Fig 3J.

      Panel J is shown in the upper right corner of Figure 3. We realize that the placement of this panel is not ideal, but this was the only place that we could fit it. Additionally, the panel works nicely at that location to better enable comparison with panel C. We have revised the text in the Figure 3 legend to better highlight the location of this Figure panel: “Shown in the upper right corner of the figure are the aligned expression patterns of InOmBMN-LexA (red), dBMN-spGAL4 (green), and TasteBMN-spGAL4 (brown).”

      We also added text to a sentence in the results section entitled “Head BMNs project into discrete zones in the ventral brain” that indicates the panel location. This text reads: To further visualize the spatial relationships between these projections, we computationally aligned the expression patterns of the different driver lines into the same brain space (Figure 3J, upper right corner).

      Matching the BMNs to EM reconstructions: why cut the dendrogram at H=5? Would be better to determine cluster number using an unbiased method.

      To match the morphologically distinct EM reconstructed BMNs to their specific bristles, we relied on different lines of evidence, including NBLAST results (discussed more below), dye fill/stochastic labeling/driver line labeling matches, published morphology, nerve projection, bristle number, proximity to other BMNs, and postsynaptic connectivity (summarized in Figure 6 – figure supplement 3). The Materials and methods section titled “Matching EM reconstructed BMN projections with light microscopy imaged BMNs that innervate specific bristles” provides a detailed description of the evidence used to assign each BMN type. In many cases, BMN type could be assigned with confidence solely based on morphological comparisons with our light level data (e.g. dye fills), in conjunction with bristle counts to indicate an expected number of BMNs showing similar morphology. Thus, the LM/EM matches and NBLAST clustering were largely complementary.

      The EM reconstructed BMNs were matched as particular BMN types, in part based on examination of the NBLAST data at different cut heights. NBLAST clustering of the BMNs revealed general trends at higher and lower cut heights (Figure 6 – figure supplement 1A, Supplementary file 3). The lowest cut heights included mostly BMNs of the same type innervating the same bristle populations, and smaller clusters that subdivided into morphologically distinct subtypes (see Supplementary file 3 for clusters produced at cut height 1). This revealed that BMNs of the same type tended to show the highest morphological similarity with each other, but they also showed intratype morphological diversity. Higher cut heights produced clusters of BMNs innervating neighboring bristle populations (e.g. ventral head BMNs), showing high morphological similarity among neighboring BMN types.

      We selected the cut height 5 shown in Figure 6 – figure supplement 1A,B because it captures examples of both same and neighboring type clustering. For example, it captures a cluster of mostly BM-Taste neurons (Cluster 16), and neighboring BMN types, including those from the dorsal head (Cluster 14) or ventral head (Cluster 15).

      Based on reviewer comments, we realized that the way we wrote the BMN matching section in the Results indicated more reliance on the NBLAST clustering than what was actually necessary, distorting the way we actually matched the BMNs. Therefore, we softened the first couple of sentences to place less emphasis on the importance of the NBLAST. We also indicated that the readers can find the resulting clusters at different cut heights, referring to Figure 6 – figure supplement 1A and Supplementary file 3. The first two sentences of the first paragraph in the Results section titled “Matching the reconstructed head BMNs with their bristles” now read:

      The reconstructed BMN projections were next matched with their specific bristle populations. The projections were clustered based on morphological similarity using the NBLAST algorithm (example clustering at cut height 5 shown in Figure 6 – figure supplement 1A,B, Supplementary file 3, FlyWire.ai link 2) (Costa et al., 2016). Clusters could be assigned as BMN types based on their similarity to light microscopy images of BMNs known to innervate specific bristles.

      The number of reconstructed BMNs is remarkably similar to what is expected based on bristle counts for each group except for InOm. Why do you think there is such a large discrepancy there?

      We believe that there is a discrepancy between the number of reconstructed BM-InOm neurons and the number expected based on InOm bristle counts because these bristle counts were based on few flies and these numbers appear to be variable. We did not further investigate the numbers of InOm bristles in this manuscript because we only needed an estimate of their numbers, given that there is over an order of magnitude difference in the eye bristles versus any other head bristle population. Therefore, we could relatively easily conclude that the head BMNs were related to the InOm bristles, based on their sheer numbers and their morphology.

      Figure 6 - figure supplement 2N, please describe these panels better. Main text says the upper image is from InOmBMN-LexA, but the figure legend doesn't agree.

      We have added text to the figure legend that now makes the contents of panel 2N clear to the reader. Further, we now indicate in the figure legend for each panel, the method used to obtain the labeled neurons (i.e. fill, MCFO, driver), to avoid similar confusion for the other panels.

      Figure 6 - figure supplement 4D. How frequently is there a mismatch between the number of BMNs for a given type across hemispheres?

      Although the full reconstruction of the BMNs on both sides of the brain was beyond the scope of this work, the BMNs on both sides have since been reconstructed and annotated (Schlegel et al. 2023). We plan to provide more analysis of BMNs on both sides of the brain in a forthcoming manuscript. However, the BMN numbers tend to show agreement on both sides of the brain. The table below shows a comparison between the two sides:

      Author response table 1.

      Figures 6 and 7. It would be helpful to include a reference brain in all panels that show cluster morphology. Without landmarks there is nothing to anchor the eye to allow the reader to see the described differences in BMN projection zones and patterns.

      While we apologize for not making this specific change, we have made revisions to other parts of the manuscript to better highlight the somatotopic organization among the BMNs (revisions described above). Please note that we now provide FlyWire.ai publicly available links that enable readers to view the BMN projections in 3D. They can also toggle a brain mesh on and off to provide spatial reference.

      "BMN somatotopic map": It would be helpful to show or describe in more detail what the unique branch morphology for each zone is. It is quite difficult to appreciate, as the groups also have a lot of overlap. Would the unique regions that the BMN groups innervate be easier to see if you plotted presynaptic sites by group? I am left unsure about whether there is a somatotopic map here.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions. Please note that we did not examine the fine branch morphological differences between BMN types having overlapping projections. Showing these differences would require more extensive anatomical analysis that is beyond the scope of this work. For showing definitive somatotopy, we focused on the overt differences between BMNs innervating bristles at distant locations on the head.

      Overall the strict adherence to the parallel model impacts the interpretation of the data. It would be helpful for the authors to discuss which aspects of the current study are consistent with the parallel model and which results are not consistent.

      We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      Discussion:

      "Circuits that elicit aimed grooming of specific head locations": In the previous paragraph you mention "BMN types innervating neighboring bristle populations have overlapping projections into zones that correspond roughly to the dorsal, ventral, and posterior head. The overlap is likely functionally significant, as cosine similarity analysis revealed that neighboring head BMN types have common postsynaptic partners. However, overlap between neighboring BMN types is only partial, as they show differing projections and postsynaptic connectivity." Then in this paragraph, you say, "How do the parallel-projecting head BMNs interface with postsynaptic neural circuits to elicit aimed grooming of specific head locations? Different evidence supports the hypothesis that the BMNs connect with parallel circuits that each elicit a different aimed grooming movement (Seeds et al., 2014)." The overlapping postsynaptic BMN connectivity seems in conflict with the claim that the circuits are parallel.

      We apologize for this confusion. We now better describe this apparent discrepancy between our results and the parallel model of grooming behavior. We made several revisions to the manuscript that address this recommendation. Please see above for a description of these revisions.

      We have made additional changes to the manuscript:

      (1) We added Supplementary file 2 that includes links for downloading the image stacks used to generate panels in Figure 1, Figure 2, Figure 3, Figure 4, and figure supplements for these figures. These image stacks are stored in the Brain Image Library (BIL). Rows in the spreadsheet correspond to each image stack. Columns provide information about each stack including: figure panels that each image stack contributed to, image stack title, DOI for each stack (link provides metadata for each stack and file download link), image stack file name, genotype of imaged fly, and information about image stack. References to this file have been made at different locations throughout the text and Figure legends. We also added a section on the BIL data in the Materials and methods entitled “Light microscopy image stack storage and availability”. Old Supplementary file 2 has been renamed Supplementary file 3.

      (2) We added a new reference for FlyWire.ai (Dorkenwald et al. 2023) that was posted as a preprint during the revision of this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate if at least 6 seconds have passed after the first nose poke - this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time of the animals to 10 seconds on average. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.
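      The relationship summarized above between drift rate and wait time can be illustrated with a generic one-boundary drift-diffusion simulation. This is a minimal sketch only: the function names, the parameter values (drift, threshold, noise, time step), and the single-threshold form are assumptions made for illustration and are not the four-parameter model fitted by the authors.

```python
import random


def response_time(drift, threshold=1.0, noise=0.1, dt=0.01,
                  max_time=60.0, rng=random):
    """Simulate one trial: time (s) for a noisy accumulator to reach threshold."""
    x, t = 0.0, 0.0
    while x < threshold and t < max_time:  # max_time caps very slow trials
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return t


def mean_response_time(drift, n_trials=200, seed=0):
    """Average response time over n_trials with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return sum(response_time(drift, rng=rng) for _ in range(n_trials)) / n_trials


# Hypothetical drift rates: a lower drift stands in for disrupted MSN activity.
control = mean_response_time(drift=0.15)
disrupted = mean_response_time(drift=0.10)
print(f"control: {control:.1f} s, lower drift: {disrupted:.1f} s")
```

      With these assumed parameters, the lower drift rate yields the longer mean time to threshold, which is the qualitative effect described in the summary.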

      Major strengths: 

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used. 

      We are glad our main points came through to the reviewer.  

      Major weaknesses: 

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing? 

      This is a helpful comment. Our hypothesis was that D1 and D2 MSNs have similar patterns of activity. Our rationale is prior behavioral work from our group showing that blocking striatal D1 and D2 dopamine receptors had similar behavioral effects on interval timing (De Corte et al., 2019; Stutt et al., 2023). We rewrote our introduction with this idea in mind (Line 89):

      “We and others have found that striatal MSNs encode time across multiple intervals by time-dependent ramping activity or monotonic changes in firing rate across a temporal interval (Emmons et al., 2017; Gouvea et al., 2015; Mello et al., 2015; Wang et al., 2018). However, the respective roles of D2-MSNs and D1-MSNs are unknown. Past work has shown that disrupting either D2-dopamine receptors (D2) or D1-dopamine receptors (D1) powerfully impairs interval timing by increasing estimates of elapsed time (Drew et al., 2007; Meck, 2006). Similar behavioral effects were found with systemic (Stutt et al., 2024) or local dorsomedial striatal D2 or D1 disruption (De Corte et al., 2019a). These data lead to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval. 

      We tested this hypothesis with a combination of optogenetics, neuronal ensemble recording, computational modeling, and behavioral pharmacology. We use a well-described mouse-optimized interval timing task (Balci et al., 2008; Bruce et al., 2021; Larson et al., 2022; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). Strikingly, optogenetic tagging of D2-MSNs and D1-MSNs revealed distinct neuronal dynamics, with D2-MSNs tending to increase firing over an interval and D1-MSNs tending to decrease firing over the same interval, similar to opposing movement dynamics (Cruz et al., 2022; Kravitz et al., 2010; Tecuapetla et al., 2016). MSN dynamics helped construct and constrain a four-parameter drift-diffusion computational model of interval timing, which predicted that disrupting either D2-MSNs or D1-MSNs would increase interval timing response times. Accordingly, we found that optogenetic inhibition of either D2-MSNs or D1-MSNs increased interval timing response times. Furthermore, pharmacological blockade of either D2- or D1-receptors also increased response times and degraded trial-by-trial temporal decoding from MSN ensembles. Thus, D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either MSN type produced similar effects on behavior. These data demonstrate how striatal pathways play complementary roles in elementary cognitive operations and are highly relevant for understanding the pathophysiology of human diseases and therapies targeting the striatum.”

      In the second, I felt that some of the conclusions suggested by the authors don't seem entirely supported by the data they present, or the data presented suggests a slightly more complicated story. Below I provide additional detail on some of these instances. 

      Regarding the results presented in Figures 2 and 3: 

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here. 

      We are glad the reviewer raised this point. First, regarding the components in noisy data, what the reviewer says is correct, but usually, the variance explained by PC1 is small. This is the reason we include scree plots in our PC analysis (Fig 3B and Fig 6G). When we compare our PC1s to variance explained in random data, our PC1 variance is always stronger. We have now included this in our manuscript:

      First, we generated random data and examined how much variance PC1 might generate. 

      We added this to the methods (Line 634):

      “The variance of PC1 was empirically compared against data generated from 1000 iterations of data from random timestamps with identical bins and kernel density estimates. Average plots were shown with Gaussian smoothing for plotting purposes only.”
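A minimal sketch of how a null comparison of this kind could be set up, offered only as an illustration. It assumes uniformly random timestamps, a Gaussian smoothing kernel standing in for the kernel density estimate, and hypothetical function names (`pc1_variance`, `null_pc1_variance`, `empirical_p`); none of these are taken from the manuscript's code.

```python
import numpy as np

def pc1_variance(X):
    """Fraction of variance explained by PC1; rows are neurons, columns are time bins."""
    Xc = X - X.mean(axis=0, keepdims=True)      # center each time bin across neurons
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values of the centered matrix
    return float(s[0] ** 2 / np.sum(s ** 2))

def smooth_zscore(counts, kernel):
    """Smooth each neuron's binned counts, then z-score each row."""
    sm = np.array([np.convolve(row, kernel, mode="same") for row in counts])
    sd = sm.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0                           # avoid division by zero for flat rows
    return (sm - sm.mean(axis=1, keepdims=True)) / sd

def null_pc1_variance(n_neurons, n_spikes, edges, kernel, n_iter=1000, seed=0):
    """PC1 variance for surrogate data built from random timestamps (assumed uniform)."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_iter)
    for i in range(n_iter):
        counts = np.array([
            np.histogram(rng.uniform(edges[0], edges[-1], n_spikes), bins=edges)[0]
            for _ in range(n_neurons)
        ])
        out[i] = pc1_variance(smooth_zscore(counts, kernel))
    return out

def empirical_p(observed, null):
    """One-sided empirical p-value: how often the null matches or exceeds the observation."""
    return float((np.sum(null >= observed) + 1) / (null.size + 1))
```

With real data, the observed PC1 variance would be passed to `empirical_p` together with the surrogate distribution built from the same bin edges and kernel.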

      These data suggested that our PC1 was stronger than that observed in random data (Line 183):

      “PCA identified time-dependent ramping activity as PC1 (Fig 3A), a key temporal signal that explained 54% of variance among tagged MSNs (Fig 3B; variance for PC1 p = 0.009 vs 46 (44-49)% variance for PC1 derived from random data; Narayanan, 2016).”

      And in the pharmacology data (Line 367):

      “The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016).”

      Second, we note that we have used this analysis extensively in the past, and PC1 has always been identified as a linear ramping in our work and in work by others (Line 179):

      “Work by our group and others has uniformly identified PC1 as a linear component among corticostriatal neuronal ensembles during interval timing (Bruce et al., 2021; Emmons et al., 2020, 2019, 2017; Kim et al., 2017a; Narayanan et al., 2013; Narayanan and Laubach, 2009; Parker et al., 2014; Wang et al., 2018).”

      Third, we find that PC1 is highly correlated to the GLM slope (Line 205):

      “Trial-by-trial GLM slope was correlated with PC1 scores in Fig 3A-C (PC1 scores vs. GLM slope r = -0.60, p = 10⁻⁸).”

      Fourth, our goal was not to heavily interpret PC1 – but to compare D1 vs. D2 MSNs, or compare population responses to D2/D1 pharmacology. We have now made this clear in introducing PCA analyses in the results (Line 177):

      “To quantify differences in D2-MSNs vs D1-MSNs, we turned to principal component analysis (PCA), a data-driven tool to capture the diversity of neuronal activity (Kim et al., 2017a).”

      Finally, despite these arguments the reviewer’s point is well taken. Accordingly, we have removed all analyses of PC2 from the manuscript which may have been overly interpretative. 

      We have now removed language that interpreted the components, and the discussion of PC1 is now much more data-driven. We have also removed much of the advanced PC analysis in Figure S9. Given our extensive past work using this exact analysis of PC1, we think that PCA, justified as the reviewer suggested, adds a considerable amount to our manuscript.

      I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.  

      We agree – we now do exactly this analysis in Figure 3D. We now clarify this in detail, adding the reviewer’s language to the methods (Line 648):

      “To measure time-related ramping over the first 6 seconds of the interval, we used trial-by-trial generalized linear models (GLMs) at the individual neuron level in which the response variable was firing rate and the predictor variable was time in the interval or nosepoke rate (Shimazaki and Shinomoto, 2007). For each neuron, its time-related “ramping” slope was derived from the GLM fit of firing rate vs time in the interval, for all trials per neuron. All GLMs were run at a trial-by-trial level to avoid effects of trial averaging (Latimer et al., 2015) as in our past work (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017b).”
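As a rough illustration of a per-neuron, trial-by-trial slope estimate, the sketch below fits firing rate against time in the interval with nosepoke rate as a covariate. It assumes a Gaussian GLM with identity link (equivalent to ordinary least squares), which the text does not specify, and the function name `ramping_slope` and the input layout are hypothetical.

```python
import numpy as np

def ramping_slope(rates, times, nosepoke_rate):
    """Slope of firing rate vs time in the interval for one neuron.

    rates, nosepoke_rate: arrays of shape (n_trials, n_bins), one row per trial.
    times: array of shape (n_bins,), bin centers in seconds.
    Every trial-by-bin observation enters the fit, so trials are not averaged.
    """
    rates = np.asarray(rates, dtype=float)
    n_trials, _ = rates.shape
    t = np.tile(np.asarray(times, dtype=float), n_trials)   # time regressor, repeated per trial
    poke = np.asarray(nosepoke_rate, dtype=float).ravel()   # movement regressor
    design = np.column_stack([np.ones_like(t), t, poke])    # intercept, time, nosepoke rate
    coef, *_ = np.linalg.lstsq(design, rates.ravel(), rcond=None)
    return float(coef[1])                                    # spikes/second per second
```

A positive return value corresponds to firing that increases across the interval and a negative value to firing that decreases.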

      And to the results (Line 194):

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015).”

      Relatedly, it seems that the data shown in Figure 2D *doesn't* support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types. 

      This likely refers to Figure 3D. The reviewer is correct that the changes in slope are small and near 0. Our goal was to show that D2-MSN and D1-MSN slopes were distinct, rather than increasing and decreasing. We have added this to the abstract (Line 46):

      “We found that D2-MSNs and D1-MSNs exhibited distinct dynamics over temporal intervals as quantified by principal component analyses and trial-by-trial generalized linear models.”

      We have clarified this idea in our hypothesis (Line 96):

      “These data led to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval.”

      We have added this idea to the results (Line 194):

      “To interrogate these dynamics at a trial-by-trial level, we calculated the linear slope of D2-MSN and D1-MSN activity over the first 6 seconds of each trial using generalized linear modeling (GLM) of effects of time in the interval vs trial-by-trial firing rate (Latimer et al., 2015). Nosepokes were included as a regressor for movement. GLM analysis also demonstrated that D2-MSNs had significantly different slopes (-0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – -0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)). We found that D2-MSNs and D1-MSNs had significantly different slopes even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F = 7.51, p = 0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F = 4.3, p = 0.04 accounting for variance between mice). Trial-by-trial GLM slope was correlated with PC1 scores in Fig 3A-C (PC1 scores vs. GLM slope r = -0.60, p = 10⁻⁸). These data demonstrate that D2-MSNs and D1-MSNs had distinct slopes of firing rate across the interval and were consistent with analyses of average activity and PC1, which exhibited time-related ramping.”

      And Line 215:

      “In summary, we used optogenetic tagging to record from D2-MSNs and D1-MSNs during interval timing. Analyses of average activity, PC1, and trial-by-trial firing-rate slopes over the interval provide convergent evidence that D2-MSNs and D1-MSNs had distinct and opposing dynamics during interval timing. These data provide insight into temporal processing by striatal MSNs.”

      And in the discussion (Line 415):

      “We describe how striatal MSNs work together in complementary ways to encode an elementary cognitive process, interval timing. Strikingly, optogenetic tagging showed that D2-MSNs and D1-MSNs had distinct dynamics during interval timing.”

      We have now included a new plot with box plots to make the differences in Figure 3D clear.

      Other reviewers requested additional qualitative descriptions of our data, and we have referred to increases / decreases in this context. 

      Regarding the results in Figure 4: 

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data. 

      We are glad the reviewer raised these points. Our goal was to use neuronal activity to fit behavioral activity, not the reverse. While we understand the reviewer’s point, we note that one behavioral output (switch time) can be encoded by many patterns of neuronal activity; thus, we are not sure we can use the model developed for behavior to fit diverse neuronal activity, or an ensemble of neurons. We have made this clear in the manuscript (Line 251):

      “Our model aimed to fit statistical properties of mouse behavioral responses while incorporating MSN network dynamics. However, the model does not attempt to fit individual neurons’ activity, because our model predicts a single behavioral parameter – switch time – that can be caused by the aggregation of diverse neuronal activity.”

      To approximate what the reviewer suggested, we attempted to predict behavior directly from neuronal ensembles. We have now made this clear in the methods (Line 682):

      “Analysis and modeling of mouse MSN-ensemble recordings. Our preliminary analysis found that, for a sufficiently large number of neurons ($N > 11$), each recorded ensemble of MSNs on a trial-by-trial basis could predict when mice would respond. We took the following approach: First, for each MSN, we convolved its trial-by-trial spike train $Spk(t)$ with a 1-second exponential kernel $K(t) = w\,e^{-t/w}$ if $t > 0$ and $K(t) = 0$ if $t \le 0$ (Zhou et al., 2018; here $w = 1$ s). Therefore, the smoothed, convolved spiking activity of neuron $j$ ($j = 1, 2, \dots, N$),

      $$x_j(t) = (Spk * K)(t),$$

      tracks and accumulates the most recent (one second, on average) firing-rate history of the $j$-th MSN, up to moment $t$. We hypothesized that the ensemble activity

      $(x_1(t), x_2(t), \dots, x_N(t))$, weighted with some weights $\beta_j$, could predict the trial switch time $t^*$ by considering the sum

      $$x(t) = \sum_{j=1}^{N} \beta_j x_j(t)$$

      and the sigmoid

      $$\frac{1}{1 + e^{-(x(t) - 0.5)/k}}$$

      that approximates the firing rate of an output unit. Here the parameter $k$ indicates how fast $x(t)$ crosses the threshold 0.5 coming from below (if $k > 0$) or coming from above (if $k < 0$) and relates the weights $\beta_j$ to the unknowns $\tilde{\beta}_j = \beta_j/k$ and $\tilde{\beta}_0 = -0.5/k$. Next, we ran a logistic fit for every trial for a given mouse over the spike count predictor matrix $(x_1(t), x_2(t), \dots, x_N(t))$ from the mouse MSN recorded ensemble, and observed value $t^*$, estimating the coefficients $\tilde{\beta}_0$ and $\tilde{\beta}_j$, and so, implicitly, the weights $\beta_j$. From there, we compute the predicted switch time $t^*_{pred}$ by condition $x(t) = 0.5$. Accuracy was quantified comparing the predicted accuracy within a 1 second window to switch time on a trial-by-trial basis (Fig S4).”
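The smoothing and threshold steps described above can be sketched as follows. This is an illustration under stated assumptions: the weights are taken as given rather than estimated by the logistic fit, only upward crossings of the threshold are handled, and the function names `smooth_spikes` and `predicted_switch_time` are hypothetical.

```python
import numpy as np

def smooth_spikes(spike_times, t_grid, w=1.0):
    """Convolve a spike train with K(u) = w * exp(-u / w) for u > 0 and 0 otherwise."""
    lags = np.asarray(t_grid, dtype=float)[:, None] - np.asarray(spike_times, dtype=float)[None, :]
    # Each past spike contributes K(t - s); spikes at or after t contribute nothing.
    k = np.where(lags > 0, w * np.exp(-np.maximum(lags, 0.0) / w), 0.0)
    return k.sum(axis=1)

def predicted_switch_time(x_ensemble, weights, t_grid, threshold=0.5):
    """First time the weighted ensemble activity reaches the threshold from below.

    x_ensemble: array of shape (n_neurons, n_times) of smoothed activity.
    weights: array of shape (n_neurons,), assumed to be already fitted.
    Returns None when the threshold is never reached.
    """
    x = np.asarray(weights, dtype=float) @ np.asarray(x_ensemble, dtype=float)
    above = np.nonzero(x >= threshold)[0]
    return float(t_grid[above[0]]) if above.size else None
```

Comparing the returned time with the observed switch time, trial by trial, gives the kind of accuracy measure the passage describes.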

      And in the results (Line 254): 

      “We first analyzed trial-based aggregated activity of MSN recordings from each mouse ($x_j(t)$) where $j = 1, \dots, N$ neurons. For D2-MSN or D1-MSN ensembles of $N > 11$, we found linear combinations of their neuronal activities, with some $\beta_j$ coefficients,

      $$x(t) = \sum_{j=1}^{N} \beta_j x_j(t),$$

      that could predict the trial-by-trial switch response times (accuracy > 90%, Fig S4; compared with < 20% accuracy for Poisson-generated spikes of same trial-average firing rate). The predicted switch time $t^*_{pred}$ was defined by the time when the weighted ensemble activity $x(t)$ first reached the value 0.5. Finally, we built DDMs to account for this opposing trend (increasing vs decreasing) of MSN dynamics and for ensemble threshold behavior defining $t^*_{pred}$; see the resulting model (Equations 1-3) and its simulations (Figure 4A-B).”

      And we have added a new figure, Figure S4, that demonstrates these trial-by-trial predictions of switch response times.  

      Note that we have included predictions from shuffled data, similar to what the reviewer suggested. Predictions are derived from neuronal ensembles on that trial; thus, we could not apply a leave-one-out approach to trial-by-trial predictions.

      These models are highly predictive for larger ensembles and poorly predictive for smaller ensembles.  We think this model adds to the manuscript and we are glad the reviewer suggested it. 

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).  

      Our model was inspired by the aggregate activity.  We have now made this clear in the results (Line 227): 

      “Our data demonstrate that D2-MSNs and D1-MSNs have opposite activity patterns. However, past computational models of interval timing have relied on drift-diffusion dynamics with a positive slope that accumulates evidence over time (Nguyen et al., 2020; Simen et al., 2011). To reconcile how these MSNs might complement to effect temporal control of action, we constructed a four-parameter drift-diffusion model (DDM). Our goal was to construct a DDM inspired by average differences in D2-MSNs and D1-MSNs that predicted switch-response time behavior.”

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition. 

      We have clarified that our parameters were chosen to best fit behavior (Line 266):

      “The model’s parameters were chosen to fit the distribution of switch-response times:

      $F = 1$, $b = 0.52$ (so $T = 0.87$), $D = 0.135$, $\sigma = 0.052$ for intact D2-MSNs (Fig 4A, in black); and $F = 0$, $b = 0.48$ (so $T = 0.13$), $D = 0.141$, $\sigma = 0.052$ for intact D1-MSNs (Fig 4B, in black).”

      Furthermore, we have clarified the approach to noise in the results (Line 247):  

      “The drift, together with noise $\xi(t)$ (of zero mean and strength $\sigma$), leads to fluctuating accumulation which eventually crosses a threshold $T$ (see Equation 3; Fig 4A-B).”

      And Line 279: 

      “The results were obtained by simultaneously decreasing the drift rate $D$ (equivalent to lengthening the neurons’ integration time constant) and lowering the level of network noise $\sigma$: $D = 0.129$, $\sigma = 0.043$ for D2-MSNs in Fig 4A (in red; changes in noise had to accompany changes in drift rate to preserve switch response time variance); and $D = 0.122$, $\sigma = 0.043$ for D1-MSNs in Fig 4B (in blue). The model predicted that disrupting either D2-MSNs or D1-MSNs would increase switch response times (Fig 4C and Fig 4D) and would shift MSN dynamics.”
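To illustrate the qualitative point that a lower drift rate lengthens threshold-crossing times, here is a sketch of a generic constant-drift diffusion simulated with the Euler-Maruyama method. It is not the four-parameter model of Equations 1-3, which are not reproduced in this response: the parameters $F$ and $b$ are not used, the accumulator is assumed to start at 0, and the function name `first_passage_times` is hypothetical. Only the drift, noise, and threshold values are borrowed from the text above.

```python
import numpy as np

def first_passage_times(drift, sigma, threshold, n_trials=2000, dt=0.01, t_max=60.0, seed=0):
    """Simulate dx = drift * dt + sigma * dW from x = 0 and record threshold crossings.

    Returns an array of crossing times in seconds; trials that never cross stay NaN.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(t_max / dt)
    x = np.zeros(n_trials)
    fpt = np.full(n_trials, np.nan)
    for step in range(1, n_steps + 1):
        active = np.isnan(fpt)                   # trials that have not crossed yet
        if not active.any():
            break
        noise = rng.standard_normal(int(active.sum()))
        x[active] += drift * dt + sigma * np.sqrt(dt) * noise
        crossed = active & (x >= threshold)
        fpt[crossed] = step * dt
    return fpt

# Values borrowed from the quoted D2-MSN parameters, used here in a simplified model.
intact = first_passage_times(drift=0.135, sigma=0.052, threshold=0.87)
disrupted = first_passage_times(drift=0.129, sigma=0.043, threshold=0.87)
```

In this simplified setting the mean crossing time is approximately the threshold divided by the drift, so the lower drift produces later crossings.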

      Regarding the results in Figure 6: 

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper. 

      We agree – we removed PC2 for these reasons. We have also noted that the primary reason for PC1 was to compare results of D2/D1 blockade (Line 362):

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together. The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016).”

      As noted above, PC1 does not explain this level of variance in noisy data.

      We also reworked Figure 6 to make the effects of D2 and D1 blockade more apparent by moving the matched sorting to the main figure.

      A larger concern though that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recoding more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result. 

      These are important suggestions. We changed our analysis to better capture the variability and main effects in the data, and we now include 3 individual raster examples, exactly as the reviewer suggested.

      As the reviewer suggested, we wanted to compare variability for *all* MSNs. We sorted the same MSNs across saline, D2 blockade, and D1 blockade sessions. We described the sorting procedure in the methods (Line 618):

      “Single-unit recordings were made using a multi-electrode recording system (Open Ephys, Atlanta, GA). After the experiments, Plexon Offline Sorter (Plexon, Dallas, TX) was used to remove artifacts. Principal component analysis (PCA) and waveform shape were used for spike sorting. Single units were defined as those 1) having a consistent waveform shape, 2) being a separable cluster in PCA space, and 3) having a consistent refractory period of at least 2 milliseconds in interspike interval histograms. The same MSNs were sorted across saline, D2 blockade, and D1 blockade sessions by loading all sessions simultaneously in Offline Sorter and sorted using the preceding criteria. MSNs had to have consistent firing in all sessions to be included. Sorting integrity across sessions was quantified by comparing waveform similarity via correlation coefficients between sessions.”

      To confirm that we were able to track neurons across sessions, we quantified waveform similarity (Line 353):

      “We analyzed 99 MSNs in sessions with saline, D2 blockade, and D1 blockade. We matched MSNs across sessions based on waveform and interspike intervals; waveforms were highly similar across sessions (correlation coefficient between matched MSN waveforms: saline vs D2 blockade r = 1.00 (0.99 – 1.00), rank sum vs correlations in unmatched waveforms p = 3 × 10⁻⁴⁴; saline vs D1 blockade r = 1.00 (1.00 – 1.00), rank sum vs correlations in unmatched waveforms p = 4 × 10⁻⁵⁰). There were no consistent changes in MSN average firing rate with D2 blockade or D1 blockade (F = 1.1, p = 0.30 accounting for variance between MSNs; saline: 5.2 (3.3 – 8.6) Hz; D2 blockade 5.1 (2.7 – 8.0) Hz; F = 2.2, p = 0.14; D1 blockade 4.9 (2.4 – 7.8) Hz).”
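A minimal sketch of the waveform-similarity comparison, assuming each session is summarized as an array of mean waveforms with one row per unit and the same row order for putatively matched units. The function name `matched_and_unmatched` and this data layout are illustrative assumptions; the two returned distributions could then be compared with a rank-sum test.

```python
import numpy as np

def matched_and_unmatched(session_a, session_b):
    """Pearson correlations between mean waveforms from two sessions.

    session_a, session_b: arrays of shape (n_units, n_samples); row i in both
    arrays is assumed to be the same putative unit.
    Returns (matched, unmatched) correlation coefficients.
    """
    session_a = np.asarray(session_a, dtype=float)
    session_b = np.asarray(session_b, dtype=float)
    n = session_a.shape[0]
    # corrcoef stacks the rows of both inputs; the off-diagonal block holds
    # the cross-session correlations.
    cross = np.corrcoef(session_a, session_b)[:n, n:]
    matched = np.diag(cross).copy()
    unmatched = cross[~np.eye(n, dtype=bool)]
    return matched, unmatched
```

Matched correlations near 1 alongside a much lower unmatched distribution would support the claim that the same units were tracked across sessions.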

      As noted above, this enabled us to compare activity for the same MSNs across sessions in a new Figure 6 (previously, this analysis had been in Figure S9) and to use PCA to quantify this variability.

      By tracking neurons across saline, D2 blockade, and D1 blockade, readers can see all the variability in MSNs. We added these data to the results (Line 362):  

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together. The first component (PC1), which explained 54% of neuronal variance, exhibited “time-dependent ramping”, or monotonic changes over the 6 second interval immediately after trial start (Fig 6F-G; variance for PC1 p = 0.001 vs 46 (45-47)% variance in random data; Narayanan, 2016). Interestingly, PC1 scores shifted with D2 blockade (Fig 6F; PC1 scores for D2 blockade: -0.6 (-3.8 – 4.7) vs saline: -2.3 (-4.2 – 3.2), F = 5.1, p = 0.03 accounting for variance between MSNs; no reliable effect of sex (F = 0.2, p = 0.63) or switching direction (F = 2.8, p = 0.10)). PC1 scores also shifted with D1 blockade (Fig 6F; PC1 scores for D1 blockade: -0.0 (-3.9 – 4.5), F = 5.8, p = 0.02 accounting for variance between MSNs; no reliable effect of sex (F = 0.0, p = 0.93) or switching direction (F = 0.9, p = 0.34)). There were no reliable differences in PC1 scores between D2 and D1 blockade. Furthermore, PC1 was distinct even when sessions were sorted independently and assumed to be fully statistically independent (Figure S10; D2 blockade vs saline: F = 5.8, p = 0.02; D1 blockade vs saline: F = 4.9, p = 0.03; all analyses accounting for variance between mice). Higher components explained less variance and were not reliably different between saline and D2 blockade or D1 blockade. Taken together, this data-driven analysis shows that D2 and D1 blockade produced similar shifts in MSN population dynamics represented by PC1. When combined with the major contributions of D1/D2 MSNs to PC1 (Fig 3C) these findings indicate that pharmacological D2 blockade and D1 blockade disrupt ramping-related activity in the striatum.”

      Finally, we included the data in which sessions were sorted independently and assumed to be fully statistically independent in a new Figure S10.

      And in the results (Line 376): 

      “Furthermore, PC1 was distinct even when sessions were sorted independently and assumed to be fully statistically independent (Figure S10; D2 blockade vs saline: F = 5.8, p = 0.02; D1 blockade vs saline: F = 4.9, p = 0.03; all analyses accounting for variance between mice). Higher components explained less variance and were not reliably different between saline and D2 blockade or D1 blockade.”

      These changes strengthen the manuscript and better show the main effects and variability of the data. 

      Regarding the results in Figure 7: 

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this? 

      We agree that this was very unclear. The second classifier predicted response time, but it was confusing, and we removed it.

      Impact: 

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding - that D2/D1 activity increases/ decreases with time - remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal functions. 

      As noted above, we clarified our hypothesis and implications, and strengthened several aspects of the data as suggested by this reviewer.  

      Reviewer #2 (Public Review): 

      Summary: 

      In the present study, the authors investigated the neural coding mechanisms for D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing by using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either type produced similar effects on behavior, indicating the complementary roles of D1- and D2-MSNs in cognitive processing. However, the data was incomplete to fully support this major finding. One major reason is the heterogenetic responses within the D1- or D2-MSN populations. In addition, there are additional concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons, but not the number of mice. It appears that the statistical difference was due to the large sample size they used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; they should use independent mouse-based statistical analysis. 

      Strengths: 

      The authors used multiple approaches including awake mice behavior training, optogenetic-assistant cell-type specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling to study neuronal coding for interval timing. 

      We appreciate the reviewer’s careful read recognizing the breadth of our approach.  

      Weaknesses: 

      (1) More detailed behavior results should be shown, including the rate of the success switches, and how long it takes to wait in the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made the third or more nosepokes within 18 seconds? It is important to clarify because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6s in the first nosepoke? 

      We completely agree. We have now completely revised Figure 1 to include many of these task details.

      We have clarified remaining details in the methods (Line 548):

      “Interval timing switch task. We used a mouse-optimized operant interval timing task described in detail previously (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). Briefly, mice were trained in sound-attenuating operant chambers, with two front nosepokes flanking either side of a food hopper on the front wall, and a third nosepoke located at the center of the back wall. The chamber was positioned below an 8-kHz, 72-dB speaker (Fig 1A; MedAssociates, St. Albans, VT). Mice were 85% food restricted and motivated with 20 mg sucrose pellets (BioServ, Flemington, NJ). Mice were initially trained to receive rewards during fixed ratio nosepoke response trials. Nosepoke entry and exit were captured by infrared beams. After shaping, mice were trained in the “switch” interval timing task. Mice self-initiated trials at the back nosepoke, after which tone and nosepoke lights were illuminated simultaneously. Cues were identical on all trial types and lasted the entire duration of the trial (6 or 18 seconds). On 50% of trials, mice were rewarded for a nosepoke after 6 seconds at the designated first ‘front’ nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking first at the ‘first’ nosepoke location and then switching to the ‘second’ nosepoke location; the reward was delivered for initial nosepokes at the second nosepoke location after 18 seconds when preceded by a nosepoke at the first nosepoke location. Multiple nosepokes at each nosepoke were allowed. Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, or going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced. We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).

      Switch response time was defined as the moment animals departed the first nosepoke before arriving at the second nosepoke. Critically, switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepokes at the first location did not receive a reward after 6 seconds. That is, mice estimate if more than 6 seconds have elapsed without receiving a reward to decide to switch responses. Mice learn this task quickly (3-4 weeks), and error trials in which an animal nosepokes in the wrong order or does not nosepoke are relatively rare and discarded. Consequently, we focused on these switch response times as the key metric for temporal control of action. Traversal time was defined as the duration between first nosepoke exit and second nosepoke entry and is distinct from switch response time when animals departed the first nosepoke. Nosepoke duration was defined as the time between first nosepoke entry and exit for the switch response times only. Trials were self-initiated, but there was an intertrial interval with a geometric mean of 30 seconds between trials.”

      And in the results on Line 131: 

      “We investigated cognitive processing in the striatum using a well-described mouse-optimized interval timing task which requires mice to respond by switching between two nosepokes after a 6-second interval (Fig 1A; see Methods; (Balci et al., 2008; Bruce et al., 2021; Larson et al., 2022; Tosun et al., 2016; Weber et al., 2023)). In this task, mice initiate trials by responding at a back nosepoke, which triggers auditory and visual cues for the duration of the trial. On 50% of trials, mice were rewarded for nosepoking after 6 seconds at the designated ‘first’ front nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking at the ‘first’ nosepoke and then switching to the ‘second’ nosepoke; initial nosepokes at the second nosepoke after 18 seconds triggered reward when preceded by a first nosepoke. The first nosepokes occurred before switching responses and the second nosepokes occurred much later in the interval in anticipation of reward delivery at 18 seconds (Fig 1B-D). During the task, movement velocity peaked before 6 seconds as mice traveled to the front nosepoke (Fig 1E).

      We focused on the switch response time, defined as the moment mice exited the first nosepoke before entering the second nosepoke. Switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepoking at the first nosepoke does not lead to a reward after 6 seconds (Fig 1B-E). Switch responses are guided by internal estimates of time because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses. In 30 mice, switch response times were 9.3 seconds (8.4 – 9.7; median (IQR); see Table 1 for a summary of mice, experiments, trials, and sessions). We studied dorsomedial striatal D2-MSNs and D1-MSNs using a combination of optogenetics and neuronal ensemble recordings in 9 transgenic mice (4 D2-Cre mice switch response time 9.7 (7.0 – 10.3) seconds; 5 D1-Cre mice switch response time 8.2 (7.7 – 8.7) seconds; rank sum p = 0.73; Table 1).”

      (2) There are many time parameters in this behavioral task. They are described in several places (the figure legend, supplementary figure legend, and methods) but are not defined clearly in the main text, which is inconvenient and sometimes confusing for readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text. 

      We agree. We have clarified this in a new schematic, shading the interval in gray:   

      And in the results on line 131:

      “We focused on the switch response time, defined as the moment mice exited the first nosepoke before entering the second nosepoke. Switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepoking at the first nosepoke does not lead to a reward after 6 seconds (Fig 1B-E). Switch responses are guided by internal estimates of time because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses. In 30 mice, switch response times were 9.3 seconds (8.4 – 9.7; median (IQR); see Table 1 for a summary of mice, experiments, trials, and sessions). We studied dorsomedial striatal D2-MSNs and D1-MSNs using a combination of optogenetics and neuronal ensemble recordings in 9 transgenic mice (4 D2-Cre mice switch response time 9.7 (7.0 – 10.3) seconds; 5 D1-Cre mice switch response time 8.2 (7.7 – 8.7) seconds; rank sum p = 0.73; Table 1).”

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch". 

      This is a great suggestion. We analyzed such error trials and MSN activity in Figure 6 of Bruce et al., 2021. However, this manuscript was not designed to analyze errors, as they are rare beyond initial training (Bruce et al., 2021 focused on early training), and too inconsistent to permit robust analysis. This was added to the methods on Line 567:

      “Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced. We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).”

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke. 

      We have now defined it explicitly in the schematic: 

      Incidentally, this reviewer asked us to analyze a longer epoch – this analysis beautifully justifies our focus on the first 6 seconds (now in Figure S2).

      We focus on the first six seconds as there are few nosepokes and switch responses during this epoch; however, we consider the reviewer’s definition and analyze the epoch the reviewer suggests from 0 to the switch in analyses below. 

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. 

      We are glad the reviewer raised these points. First, our tagging dataset is relatively standard for optogenetic tagging. Second, we now include Cohen’s d for both PC and slope results for all optogenetic tagging analysis, which demonstrate that we have adequate statistical power and medium-to-large effect sizes (Line 186): 

      “In line with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      We added boxplots to Figure 3, which better highlight differences in these distributions.

      However, the reviewer’s point is well-taken, and we have added a caveat to the discussion exactly as the reviewer suggested (Line 496):

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: one is a decreasing activity group, one is an increasing activity group and a small group for on and off response. Because of the small sample size, the author should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity. 

      We were not clear; we now do exactly as the reviewer suggested. We are not pooling any data. Instead, as we state on Line 620, we use linear mixed-effects models to account for mouse-specific and neuron-specific variance. This approach was developed with our statistics core for exactly the reasons the reviewer suggested (see letter). We state this explicitly in the methods (Line 704):

      “Statistics. All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB.

      For all neuronal analyses, variability between animals was accounted for using generalized linear-mixed effects models and incorporating a random effect for each mouse into the model, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
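      As a reading aid for the summary statistics named in this passage, a minimal Python sketch of the median, interquartile range, and Cohen’s d follows. It is an assumption-laden illustration, not the authors’ code: the function names are hypothetical, the quartile method is one of several conventions, and Cohen’s d is computed here with the pooled standard deviation, which the passage does not specify.

```python
import math
import statistics


def median_iqr(values):
    """Return the median and the (25th, 75th) percentiles.

    Uses the 'inclusive' quartile convention of the statistics module;
    other software may use a different convention (assumption).
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


def cohens_d(a, b):
    """Cohen's d for two independent groups using the pooled SD (assumption)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)
```

      By the usual convention, absolute values of d near 0.5 are read as medium and near 0.8 as large. The sketch does not cover the mixed-effects models themselves, which additionally include a random effect for each mouse.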

      We have now stated in the results that we are explicitly accounting for variance between mice (Line 186): 

      “In line with population averages from Fig 2G&H, D2-MSNs and D1-MSNs had opposite patterns of activity with negative PC1 scores for D2-MSNs and positive PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-2.8 – 4.9); F = 8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F = 0.44, p = 0.51) or switching direction (F = 1.73, p = 0.19)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.47 – 0.06); Fig 3D; F = 8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98; no reliable effect of sex (F = 0.02, p = 0.88) or switching direction (F = 1.72, p = 0.19)).”

      All statistics in the manuscript now explicitly account for variance between mice. 

      This is the approach that was recommended by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa, which reviews all of our work.

      We note that these Cohen’s d values are usually interpreted as medium or large. 

      We performed statistical power calculations and include these to aid readers’ interpretation. These are all at least 0.8. 

      Finally, the reviewer uses the word ‘trend’. We define p values <0.05 as significant in the methods, and do not interpret trends (on line 717): 

      “P values < 0.05 were interpreted as significant.”

      And, we have now plotted values for each mouse in a new Figure S3.

      As noted in the figure legend, mouse-specific effects were analyzed using linear models that account for between-mouse variability, as discussed with our statisticians. However, the reviewer’s point is well taken, and we have added this idea to the discussion as suggested (Line 496):

      “Second, although we had adequate statistical power and medium-to-large effect sizes, optogenetic tagging is low-yield, and it is possible that recording more of these neurons would afford greater opportunity to identify more robust results and alternative coding schemes, such as neuronal synchrony.”

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity? 

      This is a key point. First, we are not certain what state the animal is in until they initiate trials at the back nosepoke (“Start”). Therefore, we cannot analyze this epoch.  

      However, we can show neuronal activity during a longer epoch exactly as the reviewer suggested. Although there are modulations, the biggest difference between D2 and D1 MSNs is during the 0-6 second interval. This analysis supports our focus on the 0-6 second interval. We have included this as a new Figure S2.

      (7) The authors were focused on the "switch " behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for the different length of interval, instead of finding a correlation between average activity trends and the arbitrary 6s time window. 

      This is a great suggestion. We did exactly this and adjusted our linear models on a trial-by-trial basis to account for time between the start of the interval and the switch. This is now added to the methods (Line 656): 

      “We performed additional sensitivity analysis excluding outliers and measuring firing rate from the start of the interval to the time of the switch response on a trial-by-trial level for each neuron.”

      And to the results (Line 201):

      “We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”
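      The trial-by-trial version of the interval can be illustrated with a short Python sketch. This is a deliberate simplification and not the authors’ analysis: it fits an ordinary least-squares line to binned firing rate from trial start to each trial’s switch time, whereas the manuscript uses GLMs and mixed-effects models. The function name, bin width, and input format are hypothetical.

```python
def firing_rate_slope(spike_times, end_time, bin_width=1.0):
    """Least-squares slope of binned firing rate versus time.

    spike_times: spike times in seconds from trial start (one trial).
    end_time:    end of the analysis window, e.g. 6.0 for the fixed
                 interval or the switch response time for that trial.
    Only whole bins inside [0, end_time) are used; a partial final bin
    is dropped. Returns the slope in (spikes/second) per second.
    """
    n_bins = int(end_time // bin_width)
    if n_bins < 2:
        raise ValueError("need at least two whole bins to fit a slope")
    counts = [0] * n_bins
    for t in spike_times:
        if 0 <= t < n_bins * bin_width:
            idx = min(int(t // bin_width), n_bins - 1)
            counts[idx] += 1
    rates = [c / bin_width for c in counts]
    centers = [(i + 0.5) * bin_width for i in range(n_bins)]
    mean_x = sum(centers) / n_bins
    mean_y = sum(rates) / n_bins
    sxx = sum((x - mean_x) ** 2 for x in centers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(centers, rates))
    return sxy / sxx


def trial_slopes(trials):
    """trials: list of (spike_times, switch_time) pairs, one per trial."""
    return [firing_rate_slope(spikes, switch) for spikes, switch in trials]
```

      Passing the switch response time as `end_time` gives a window that differs from trial to trial, in contrast to a fixed 6-second window.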

      We now state our justification for focusing on the first 6 seconds of the interval (Line 134):

      “Switch responses are guided by internal estimates of time and temporal control of action because no external cue indicates when to switch from the first to the second nosepoke (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). We defined the first 6 seconds after trial start as the ‘interval’, because during this epoch mice are estimating whether 6 seconds have elapsed and if they need to switch responses.”

      As noted previously, this epoch is now justified by Figure S2E.

      And we note that this focus minimizes motor confounds (Line 511):

      “Four lines of evidence argue that our findings cannot be directly explained by motor confounds: 1) D2-MSNs and D1-MSNs diverge between 0-6 seconds after trial start well before the first nosepoke (Fig S2), 2) our GLM accounted for nosepokes and nosepoke-related βs were similar between D2-MSNs and D1-MSNs, 3) optogenetic disruption of dorsomedial D2-MSNs and D1-MSNs did not change task-specific movements despite reliable changes in switch response time, and 4) ramping dynamics were quite distinct from movement dynamics. Furthermore, disrupting D2-MSNs and D1-MSNs did not change the number of rewards animals received, implying that these disruptions did not grossly affect motivation. Still, future work combining motion tracking with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023).”

      We are glad the reviewer suggested this analysis as it strengthens our manuscript.  

      Reviewer #3 (Public Review): 

      Summary: 

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions. 

      Strengths: 

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model. 

      We are grateful for the reviewer’s consideration of our work and for recognizing the strengths of our approach.  

      Weaknesses: 

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals.

      This is a key point, and the reviewer is correct. We use our task because of its translational validity; as far as we know, temporal bisection tasks have been used less often in human disease and in rodent models. We have included a new paragraph describing this in the discussion (Line 472):

      “Because interval timing is reliably disrupted in human diseases of the striatum such as Huntington’s disease, Parkinson’s disease, and schizophrenia (Hinton et al., 2007; Singh et al., 2021; Ward et al., 2011), these results have relevance to human disease. Our task version has been used extensively to study interval timing in mice and humans (Balci et al., 2008; Bruce et al., 2021; Stutt et al., 2024; Tosun et al., 2016; Weber et al., 2023). However, temporal bisection tasks, in which animals hold during a temporal cue and respond at different locations depending on cue length, have advantages in studying how animals time an interval because animals are not moving while estimating cue duration (Paton and Buonomano, 2018; Robbe, 2023; Soares et al., 2016). Our interval timing task version – in which mice switch between two response nosepokes to indicate their interval estimate has elapsed – has been used extensively in rodent models of neurodegenerative disease (Larson et al., 2022; Weber et al., 2024, 2023; Zhang et al., 2021), as well as in humans (Stutt et al., 2024). Furthermore, because many therapeutics targeting dopamine receptors are used clinically, these findings help describe how dopaminergic drugs might affect cognitive function and dysfunction. Future studies of D2-MSNs and D1-MSNs in temporal bisection and other timing tasks may further clarify the relative roles of D2- and D1-MSNs in interval timing and time estimation.”

      Furthermore, we have modified the use of the definition of interval timing in the abstract, introduction, and results to reflect the reviewer’s comment. For instance, in the abstract (Line 43):

      “We studied dorsomedial striatal cognitive processing during interval timing, an elementary cognitive task that requires mice to estimate intervals of several seconds and involves working memory for temporal rules as well as attention to the passage of time.”

      However, we think it is important to use the term ‘interval timing’ as it links to past work by our group and others.   

      The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels. 

      This is a key point raised by other reviewers as well. We have now included measures of statistical power (as we interpret the reviewer’s comment of predictive power) and effect size, and performed additional sensitivity analyses (Line 187): 

      “PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-4.9 – -2.8); F=8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F=1.9, p=0.17) or switching direction (F=0.1, p=0.75)).”

      And on Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.45 – 0.06); Fig 3D; F=8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98). We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      These are medium-to-large Cohen’s d results, and we have adequate statistical power. These results are not easily explained by chance. 

      We also added boxplots, which highlight the differences in distribution.

      Finally, we note that our conclusions are drawn from many convergent analyses (on Line 216): 

      “Analyses of average activity, PC1, and trial-by-trial firing-rate slopes over the interval provide convergent evidence that D2-MSNs and D1-MSNs had distinct and opposing dynamics during interval timing.”

      In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs. 

      This is an important point. We are well aware of heating effects with optogenetics and other potential confounds. For the exact reasons noted by the reviewer, we had opsin-negative controls, in which the laser was on for the exact same amount of time (18 seconds) and at the same power (12 mW), in Figure S6. We have now better highlighted these controls in the methods (Line 598):

      “In animals injected with optogenetic viruses, optical inhibition was delivered via bilateral patch cables for the entire trial duration of 18 seconds via 589-nm laser light at 12 mW power on 50% of randomly assigned trials. We performed control experiments in mice without opsins using identical laser parameters in D2-cre or D1-cre mice (Fig S6).”

      And in results (Line 298):

      “Importantly, we found no reliable effects for D2-MSNs with opsin-negative controls (Fig S6).”

      And (Line 306): 

      “As with D2-MSNs, we found no reliable effects with opsin-negative controls in D1-MSNs (Fig S6).”

      We have highlighted these data in Figure S6: 

      Furthermore, the effect of optogenetic inhibition is similar to pharmacological effects in this manuscript and in our prior work (De Corte et al., 2019; Stutt et al., 2024) (Line 459): 

      “Past pharmacological work from our group and others has shown that disrupting D2- or D1-MSNs slows timing (De Corte et al., 2019b; Drew et al., 2007, 2003; Stutt et al., 2024), in line with pharmacological and optogenetic results in this manuscript.”

      And in the discussion section on Line 488: 

      “Our approach has several limitations. First, systemic drug injections block D2- and D1-receptors in many different brain regions, including the frontal cortex, which is involved in interval timing (Kim et al., 2017a). D2 blockade or D1 blockade may have complex effects, including corticostriatal or network effects that contribute to changes in D2-MSN or D1-MSN ensemble activity. We note that optogenetic inhibition of D2-MSNs and D1-MSNs produces similar effects to pharmacology in Figure 5.”

      Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum. 

      This is a great point - we did this experiment in De Corte et al, 2019 with local drug infusions. This earlier study was the departure point for this experiment. We now point this out in the introduction (Line 92): 

      “Past work has shown that disrupting either D2-dopamine receptors (D2) or D1-dopamine receptors (D1) powerfully impairs interval timing by increasing estimates of elapsed time (Drew et al., 2007; Meck, 2006). Similar behavioral effects were found with systemic (Stutt et al., 2024) or local dorsomedial striatal D2 or D1 disruption (De Corte et al., 2019a). These data lead to the hypothesis that D2 MSNs and D1 MSNs have similar patterns of ramping activity across a temporal interval.”

      However, the reviewer makes a great point - and we will develop this in our future work (Line 485): 

      “Future studies might extend our work combining local pharmacology with neuronal ensemble recording.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Just a few minor notes: 

      (1) Figures 2C and D should have error bars. 

      We agree.  We added error bars to these figures and other rasters as recommended.  

      (2) Figures 2G and H seem to be smoothed - how was this done? 

      We added these details.

      (3) It is unclear what the 'neural network machine learning classifier' mentioned in lines 193-199 adds if the data relevant to this analysis isn't presented. I would potentially include this. 

      We agree. This analysis was confusing and not relevant to our main points; consequently, we removed it.  

      Reviewer #2 (Recommendations For The Authors): 

      Major: 

      (1)  For Figure 2, the description of the main results in (C-F) in the main text is too brief and is not clear. 

      We have added to and clarified this text (Line 147):

      “Striatal neuronal populations are largely composed of MSNs expressing D2-dopamine or D1-dopamine receptors. We optogenetically tagged D2-MSNs and D1-MSNs by implanting optrodes in the dorsomedial striatum and conditionally expressing channelrhodopsin (ChR2; Fig S1) in 4 D2-Cre (2 female) and 5 D1-Cre transgenic mice (2 female). This approach expressed ChR2 in D2-MSNs or D1-MSNs, respectively (Fig 2A-B; Kim et al., 2017a). We identified D2-MSNs or D1-MSNs by their response to brief pulses of 473 nm light; neurons that fired within 5 milliseconds were considered optically tagged putative D2-MSNs (Fig S1B-C). We tagged 32 putative D2-MSNs and 41 putative D1-MSNs in a single recording session during interval timing. There were no consistent differences in overall firing rate between D2-MSNs and D1-MSNs (D2-MSNs: 3.4 (1.4 – 7.2) Hz; D1-MSNs 5.2 (3.1 – 8.6) Hz; F = 2.7, p = 0.11 accounting for variance between mice). Peri-event rasters and histograms from a tagged putative D2-MSN (Fig 2C) and from a tagged putative D1-MSN (Fig 2D) demonstrate prominent modulations for the first 6 seconds of the interval after trial start. Z-scores of average peri-event time histograms (PETHs) from 0 to 6 seconds after trial start for each putative D2-MSN are shown in Fig 2E and for each putative D1-MSN in Fig 2F. These PETHs revealed that for the 6-second interval immediately after trial start, many putative D2-MSN neurons appeared to ramp up while many putative D1-MSNs appeared to ramp down. For 32 putative D2-MSNs average PETH activity increased over the 6-second interval immediately after trial start, whereas for 41 putative D1-MSNs, average PETH activity decreased. These differences resulted in distinct activity early in the interval (0-1 seconds; F = 6.0, p = 0.02 accounting for variance between mice), but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice) between D2-MSNs and D1-MSNs. Examination of a longer interval of 10 seconds before to 18 seconds after trial start revealed the greatest separation in D2-MSN and D1-MSN dynamics during the 6-second interval after trial start (Fig S2). Strikingly, these data suggest that D2-MSNs and D1-MSNs might display opposite dynamics during interval timing.”
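      To make the z-scored PETHs in this passage concrete, a minimal Python sketch follows. It is a sketch under stated assumptions, not the authors’ MATLAB routines: the function names, the 0.5-second bin width, and the choice to z-score each neuron’s PETH against its own mean and standard deviation over the window are all hypothetical.

```python
import statistics


def peth(trials, t_start=0.0, t_stop=6.0, bin_width=0.5):
    """Average firing rate (Hz) per time bin across trials.

    trials: list of trials, each a list of spike times in seconds
            relative to trial start.
    """
    n_bins = round((t_stop - t_start) / bin_width)
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if t_start <= t < t_stop:
                idx = min(int((t - t_start) / bin_width), n_bins - 1)
                counts[idx] += 1
    # Convert summed counts to a mean rate per trial, in spikes/second.
    return [c / (len(trials) * bin_width) for c in counts]


def zscore(values):
    """Z-score a PETH against its own mean and (population) SD."""
    mu = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # flat PETH: no modulation to scale
    return [(v - mu) / sd for v in values]
```

      In such a sketch, a neuron whose z-scored PETH rises from negative to positive values across the window would be described as ramping up, and the reverse pattern as ramping down.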

      (2)  For Figure 3 

      (A)  Is the PC1 calculated from all MSNs of all mice (4 D2, 5 D1 mice)? 

      We clarified this (Line 182):

      “We analyzed PCA calculated from all D2-MSNs and D1-MSNs PETHs over the 6-second interval immediately after trial start.”

      And for pharmacology (Line 362): 

      “We noticed differences in MSN activity across the interval with D2 blockade and D1 blockade at the individual MSN level (Fig 6B-D) as well as at the population level (Fig 6E). We used PCA to quantify effects of D2 blockade or D1 blockade (Bruce et al., 2021; Emmons et al., 2017; Kim et al., 2017a). We constructed principal components (PC) from z-scored peri-event time histograms of firing rate from saline, D2 blockade, and D1 blockade sessions for all mice together.”
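      The PCA step described in these passages can be illustrated with a short NumPy sketch. It assumes a matrix with one z-scored PETH per neuron (rows) and one column per time bin; the function name and the use of a singular value decomposition are illustrative choices, not details given in the manuscript.

```python
import numpy as np


def pca_scores(peths, n_components=1):
    """PCA of PETHs via singular value decomposition.

    peths: array-like of shape (n_neurons, n_bins).
    Returns (scores, components, explained_variance_ratio), where
    scores has shape (n_neurons, n_components).
    """
    X = np.asarray(peths, dtype=float)
    Xc = X - X.mean(axis=0)  # center each time bin across neurons
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, Vt[:n_components], explained[:n_components]
```

      The sign of a principal component is arbitrary, so only the relative sign of PC1 scores between groups of neurons is meaningful. In a toy dataset made of neurons that ramp up and neurons that ramp down, the two groups receive PC1 scores of opposite sign.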

      (B)  The authors should perform PCA on single mouse data, and add the plot and error bar. 

      This is a great idea. We have now included this as a new Figure S3:   

      (C)  As mentioned before, both D2-or D1- MSNs can be divided into three groups, it is not appropriate to put them together as each MSN is not an independent variable, the authors should do the statistics based on the individual mouse, and do the parametric or non-parametric comparison, and plot N (number of mice) based error bars. 

      We have done exactly this using a linear mixed-effects model, as recommended by our statistics core. They have explicitly suggested that this is the best approach to these data (see letter). We have also included measures of statistical power and effect size (Line 704):  

      “All data and statistical approaches were reviewed by the Biostatistics, Epidemiology, and Research Design Core (BERD) at the Institute for Clinical and Translational Sciences (ICTS) at the University of Iowa. All code and data are made available at http://narayanan.lab.uiowa.edu/article/datasets. We used the median to measure central tendency and the interquartile range to measure spread. We used Wilcoxon nonparametric tests to compare behavior between experimental conditions and Cohen’s d to calculate effect size. Analyses of putative single-unit activity and basic physiological properties were carried out using custom routines for MATLAB.

      For all neuronal analyses, variability between animals was accounted for using generalized linear mixed-effects models and incorporating a random effect for each mouse into the model, which allows us to account for inherent between-mouse variability. We used fitglme in MATLAB and verified main effects using lmer in R. We accounted for variability between MSNs in pharmacological datasets in which we could match MSNs between saline, D2 blockade, and D1 blockade. P values < 0.05 were interpreted as significant.”
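The descriptive statistics named in the quoted methods (median, interquartile range, Cohen's d) can be sketched as follows. This is an illustration only: the helper names are hypothetical, the quartile convention ("inclusive") is an assumption, and Cohen's d is computed with a pooled standard deviation, which the response does not specify.

```python
import math
from statistics import mean, median, quantiles, variance


def median_iqr(values):
    """Return (median, (q1, q3)) using the 'inclusive' quartile convention."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical values, not data from the manuscript.
center, (q1, q3) = median_iqr([1, 2, 3, 4, 5])
effect = cohens_d([2, 4, 6], [1, 3, 5])
```

Other quartile conventions give slightly different interquartile ranges on small samples, so the convention should be reported alongside the values.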

      We have now included measures of ‘power’ (which we interpret to be statistical), effect size, and perform additional sensitivity analyses (Line 187): 

      “PC1 scores for D1-MSNs (Fig 3C; PC1 for D2-MSNs: -3.4 (-4.6 – 2.5); PC1 for D1-MSNs: 2.8 (-4.9 – -2.8); F=8.8, p = 0.004 accounting for variance between mice (Fig S3A); Cohen’s d = 0.7; power = 0.80; no reliable effect of sex (F=1.9, p=0.17) or switching direction (F=0.1, p=0.75)).”

      And Line 197:

      “GLM analysis also demonstrated that D2-MSNs had significantly different slopes (0.01 spikes/second (-0.10 – 0.10)), which were distinct from D1-MSNs (-0.20 (-0.45 – 0.06); Fig 3D; F=8.9, p = 0.004 accounting for variance between mice (Fig S3B); Cohen’s d = 0.8; power = 0.98).  We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      These are medium-to-large Cohen’s d results, and we have adequate statistical power. These results are not easily explained by chance. 

      We also added boxplots, which highlight the differences in distributions.
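To make the notion of a per-neuron "slope" concrete, here is a simplified sketch that fits an ordinary least-squares line to binned firing rate over the interval. It is a stand-in for illustration, not the trial-level GLM described in the response (which also accounts for nosepokes); the function name and the bin layout are assumptions.

```python
def ramp_slope(times, rates):
    """Least-squares slope of firing rate against time in the interval."""
    n = len(times)
    if n != len(rates) or n < 2:
        raise ValueError("need at least two paired samples")
    t_bar = sum(times) / n
    r_bar = sum(rates) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_bar) * (r - r_bar) for t, r in zip(times, rates))
    return sxy / sxx


# Hypothetical neuron whose rate falls by 0.2 spikes/s per second over 0-6 s.
bin_centers = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
rates = [5.0 - 0.2 * t for t in bin_centers]
slope = ramp_slope(bin_centers, rates)
```

A negative slope corresponds to a neuron that ramps down across the interval and a positive slope to one that ramps up.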

      (3) For results in Figure 5 and Figure S7, according to Figure 1 legend, lines 4 to 5, the response times were defined as the moment mice exit the first nose poke (on the left) to respond at the second nose poke; and according to the methods section (line 522), "switch" traversal time was defined as the duration between first nose poke exit and second nose poke entry. It seems that response time is the switch traversal time, they should be the same, but in Figures B and D, the response time showed a clear difference between the laser off and on groups, while in Figures S7 C, and G, there were no differences between laser off and on group for switch traversal time. Please reconcile these inconsistencies.

      We were not clear. We now clarify – switch responses are the moment when mice depart the first nosepoke, whereas traversal time is the time between departing the first nosepoke and arriving at the second nosepoke. We have reworked our figures to make this clear.

      And in the methods (Line 570):

      “Switch response time was defined as the moment animals departed the first nosepoke before arriving at the second nosepoke. Critically, switch responses are a time-based decision guided by temporal control of action because mice switch nosepokes only if nosepokes at the first location did not receive a reward after 6 seconds. That is, mice estimate if more than 6 seconds have elapsed without receiving a reward to decide to switch responses. Mice learn this task quickly (3-4 weeks), and error trials in which an animal nosepokes in the wrong order or does not nosepoke are relatively rare and discarded. Consequently, we focused on these switch response times as the key metric for temporal control of action. Traversal time was defined as the duration between first nosepoke exit and second nosepoke entry and is distinct from switch response time when animals departed the first nosepoke. Nosepoke duration was defined as the time between first nosepoke entry and exit for the switch response times only. Trials were self-initiated, but there was an intertrial interval with a geometric mean of 30 seconds between trials.”
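The three behavioral measures defined in the quoted methods can be separated explicitly in code. The sketch below is an illustration under assumptions: the `SwitchTrial` structure and field names are hypothetical, all times are in seconds from trial start, and the switch response is taken to be the last first-nosepoke exit that precedes the first entry at the second nosepoke.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SwitchTrial:
    # (entry, exit) times at the first nosepoke, seconds from trial start
    first_pokes: List[Tuple[float, float]]
    # entry times at the second nosepoke, seconds from trial start
    second_entries: List[float]


def switch_metrics(trial: SwitchTrial) -> Optional[Dict[str, float]]:
    """Return switch response time, traversal time, and nosepoke duration.

    Returns None when the trial contains no valid switch.
    """
    if not trial.first_pokes or not trial.second_entries:
        return None
    second_entry = min(trial.second_entries)
    # First-nosepoke visits that ended before the animal reached the second nosepoke.
    prior = [(en, ex) for en, ex in trial.first_pokes if ex <= second_entry]
    if not prior:
        return None
    entry, exit_ = max(prior, key=lambda poke: poke[1])
    return {
        "switch_response_time": exit_,             # moment of departure
        "traversal_time": second_entry - exit_,    # departure to arrival
        "nosepoke_duration": exit_ - entry,        # length of the final visit
    }


# Hypothetical trial: the animal leaves the first nosepoke at 9.0 s and
# arrives at the second nosepoke at 11.5 s.
trial = SwitchTrial(first_pokes=[(1.0, 1.5), (7.0, 9.0)], second_entries=[11.5, 17.0])
metrics = switch_metrics(trial)
```

Keeping the departure time and the departure-to-arrival duration as separate fields mirrors the distinction the response draws between switch response time and traversal time.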

      And in Figure S8, we have added graphics and clarified the legend.

      (4) The first nose poke and second nose poke are very close, why did it take so long to move from the first nose poke to the second nose poke, even though the mouse already made the decision to switch? Please see Figure S1A, it took less than 6s from the back nose poke to the first nose poke, but it took more than 6s (up to 12s) from the first nose poke to the second nose poke, what were the mice's behavior during this period? 

      This is a key detail. There is no temporal urgency as only the initial nosepoke after 18 seconds leads to reward. In other words, making a second nosepoke prior to 18 seconds is not rewarded and, in well-trained animals, is wasted effort. We have added these details to the methods (Line 124):

      “On the remaining 50% of trials, mice were rewarded for nosepoking at the ‘first’ nosepoke and then switching to the ‘second’ nosepoke; initial nosepokes at the second nosepoke after 18 seconds triggered reward when preceded by a first nosepoke. The first nosepokes occurred before switching responses and the second nosepokes occurred much later in the interval in anticipation of reward delivery at 18 seconds (Fig 1B-D). During the task, movement velocity peaked before 6 seconds as mice traveled to the front nosepoke (Fig 1E).”

      And in Figure 1, as described in detail above. 

      (5) How many trials did mice perform in one day? How many recordings/day for how many days were performed? 

      These are key details that we have now added to Table 1.

      We have added the number of recording sessions to the methods (Line 603): 

      “For optogenetic tagging, putative D1- and D2-MSNs were optically identified via 473-nm photostimulation. Units with mean post-stimulation spike latencies of ≤5 milliseconds and a stimulated-to-unstimulated waveform correlation ratio of >0.9 were classified as putative D2-MSNs or D1-MSNs (Ryan et al., 2018; Shin et al., 2018). Only one recording session was performed for each animal per day, and one recording session was included from each animal.”

      And Line 606: 

      “Only one recording session was performed for each animal per day, and one recording session was included from saline, D2 blockade, and D1 blockade sessions.”
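The tagging criteria quoted above (mean post-stimulation spike latency of at most 5 milliseconds and a waveform correlation above 0.9) amount to a two-part threshold test. The sketch below illustrates that test; the function names are hypothetical, and interpreting the "waveform correlation ratio" as a Pearson correlation between mean laser-evoked and spontaneous waveforms is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_putatively_tagged(mean_latency_ms, laser_waveform, spontaneous_waveform,
                         max_latency_ms=5.0, min_corr=0.9):
    """Apply both criteria: short light-evoked latency and matching waveform."""
    short_latency = mean_latency_ms <= max_latency_ms
    same_shape = pearson_r(laser_waveform, spontaneous_waveform) > min_corr
    return short_latency and same_shape


# Hypothetical mean waveforms (arbitrary units), not recorded data.
spontaneous = [0.0, -1.0, -3.0, -1.0, 0.5, 0.2, 0.0]
laser_evoked = [1.1 * v for v in spontaneous]
tagged = is_putatively_tagged(3.2, laser_evoked, spontaneous)
```

Requiring both criteria guards against counting units that fire shortly after the light pulse but whose evoked waveform differs from their spontaneous spikes.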

      (6) For results in Figure 5, the authors should analyze the speed for the laser on and off group, since the dorsomedial striatum was reported to be related to control of speed (Yttri, Eric A., and Joshua T. Dudman. "Opponent and bidirectional control of movement velocity in the basal ganglia." Nature 533.7603 (2016): 402-406.). 

      We have some initial DeepLabCut data and have included it in a new Figure 1E.

      B) DeepLabCut tracking of position during the interval timing revealed that mice moved quickly after trial start and then velocity was relatively constant throughout the trial

      We measure movement speed using nosepoke duration and traversal time, which can give some measure of movement velocity.

      In Yttri and Dudman, the mice are head-fixed and moving a joystick, whereas our mice are freely moving. However, we have now included the lack of motor control as a major limitation (Line 510): 

      “Finally, movement and motivation contribute to MSN dynamics (Robbe, 2023). Four lines of evidence argue that our findings cannot be directly explained by motor confounds: 1) D2-MSNs and D1-MSNs diverge between 0-6 seconds after trial start well before the first nosepoke (Fig S2), 2) our GLM accounted for nosepokes and nosepoke-related βs were similar between D2-MSNs and D1-MSNs, 3) optogenetic disruption of dorsomedial D2-MSNs and D1-MSNs did not change task-specific movements despite reliable changes in switch response time, and 4) ramping dynamics were quite distinct from movement dynamics. Furthermore, disrupting D2-MSNs and D1-MSNs did not change the number of rewards animals received, implying that these disruptions did not grossly affect motivation. Still, future work combining motion tracking with neuronal ensemble recording and optogenetics and including bisection tasks may further unravel timing vs. movement in MSN dynamics (Robbe, 2023).”

      (7)  Figure S3 (C, E, and F), statistics should be done based on N (number of mice), not on the number of recorded neurons.  

      We have removed this section, and all other statistics in the paper properly account for mouse-specific variance, as noted above.

      (8)  Figure S1 

      (A) Are these the results from all mice superposed together, or from one mouse on one given day? How many of the trials' data were superposed?

      We included these details in a new Figure 1.

      (B, C) How many trials were included? 

      (D) How many days did these data cover? 

      We have included a new Table 1 with these important details.

      We have noted that only 1 recording session / mouse was included in analysis (Line 606):

      “Only one recording session was performed for each animal per day, and one recording session was included from each animal.”

      And Line 614: 

      “Only one recording session was performed for each animal per day, and one recording session was included from saline, D2 blockade, and D1 blockade sessions.”

      (9) Figure S2 

      (A) Can the authors add coordinates of the brain according to the mouse brain atlas or, alternatively, show it using a coronal section? 

      Great idea – added to Figure S2 legend: 

      “Figure S1: A) Recording locations in the dorsomedial striatum (targeting AP +0.4, ML -1.4, DV -2.7). Electrode reconstructions for D2-Cre (red), D1-Cre (blue), and wild-type mice (green). Only the left striatum was implanted with electrodes in all animals.”

      We have also added it to Figure S5 legend: 

      “Figure S5: Fiber optic locations from A) an opsin-expressing mouse with mCherry-tagged halorhodopsin and bilateral fiber optics, and B) across 10 D2-Cre mice (red) and 6 D1-Cre mice (blue) with fiber optics (targeting AP +0.9, ML +/-1.3, DV –2.5).”

      (C) Why did the waveform of laser and no laser seem the same? 

      The optogenetically tagged spike waveforms are highly similar, indicating that optogenetically-triggered spikes are like other spikes. That is the main point – optogenetically stimulating the neuron does not change the waveform. We have added this detail to the legend of S1: 

      “Inset on bottom right – waveforms from laser trials (red) and trials without laser (blue).  Across 73 tagged neurons, the waveform correlation coefficient for laser trials vs. trials without laser was r = 0.97 (0.92-0.99). These data demonstrate that optogenetically triggered spikes are similar to non-optogenetically triggered spikes.”

      (10)  Figure S7, what was the laser power used in this experiment? Have the authors tried different laser powers? 

      We have now clarified the laser power on line 598: 

      “In animals injected with optogenetic viruses, optical inhibition was delivered via bilateral patch cables for the entire trial duration of 18 seconds via 589-nm laser light at 12 mW power on 50% of randomly assigned trials.”

      And for Figure S6 (was S7 previously): 

      We did not try other laser powers; our parameters were chosen a priori based on our past work.  

      (11)  In Figure S9, what method was used to sort the neurons? 

      We now clarify in the methods (Line 617): 

      “Electrophysiology. Single-unit recordings were made using a multi-electrode recording system (Open Ephys, Atlanta, GA). After the experiments, Plexon Offline Sorter (Plexon, Dallas, TX) was used to remove artifacts. Principal component analysis (PCA) and waveform shape were used for spike sorting. Single units were defined as those 1) having a consistent waveform shape, 2) being a separable cluster in PCA space, and 3) having a consistent refractory period of at least 2 milliseconds in interspike interval histograms.  The same MSNs were sorted across saline, D2 blockade, and D1 blockade sessions by loading all sessions simultaneously in Offline Sorter and sorted using the preceding criteria. MSNs had to have consistent firing in all sessions to be included. Sorting integrity across sessions was quantified by comparing waveform similarity via R² between sessions.”

      And in the results (Line 353):

      “We analyzed 99 MSNs in sessions with saline, D2 blockade, and D1 blockade. We matched MSNs across sessions based on waveform and interspike intervals; waveforms were highly similar across sessions (correlation coefficient between matched MSN waveforms: saline vs D2 blockade r = 1.00 (0.99 – 1.00), rank sum vs correlations in unmatched waveforms p = 3×10⁻⁴⁴; saline vs D1 blockade r = 1.00 (1.00 – 1.00), rank sum vs correlations in unmatched waveforms p = 4×10⁻⁵⁰). There were no consistent changes in MSN average firing rate with D2 blockade or D1 blockade (F = 1.1, p = 0.30 accounting for variance between MSNs; saline: 5.2 (3.3 – 8.6) Hz; D2 blockade 5.1 (2.7 – 8.0) Hz; F = 2.2, p = 0.14; D1 blockade 4.9 (2.4 – 7.8) Hz).”

      (C-F) statistics should be done based on the number of mice, not on the number of recorded neurons. 

      We agree. All experiments are now quantified using linear mixed effects models, which formally account for variance contributed across animals, as discussed at length earlier in the review and with statistical experts at the University of Iowa.
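For readers less familiar with this approach, the general form of a mixed-effects model with a random intercept per mouse is written out below. This is a sketch under assumptions: it shows a single fixed effect of cell type, and the symbols are introduced here for illustration. The response does not give the exact formula the authors passed to fitglme or lmer.

```latex
% y_{ij}: response (e.g., PC1 score or slope) for neuron i recorded in mouse j
% type_{ij}: indicator for cell type (e.g., 0 = D2-MSN, 1 = D1-MSN)
% u_j: random intercept shared by all neurons from mouse j
y_{ij} = \beta_0 + \beta_1\,\mathrm{type}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

In the Wilkinson-style notation accepted by both lmer and fitglme, a model of this shape would be written as y ~ type + (1 | mouse). Neurons from the same mouse share the term u_j, so they are not treated as independent observations when the fixed effect is tested.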

      (12) For results in Figure 6, did the authors do cell-type specific recording on D1 or D2 MSNs using optogenetic tagging? As the D1- or D2- MSNs account for ~50% of all MSNs, the inhibition of a considerable amount of neurons was not observed. The authors should discuss the relation between the results from optogenetic inhibition of D1- or D2- MSNs and pharmacological disruption of D1 or D2 dopamine receptors. 

      This is a great point. First, we did not combine cell-type specific recordings with tagging as it was difficult to get enough trials for analysis in a single session in the tagging experiments, and pharmacological interventions can further decrease performance.  However, we have made our results in Figure 6 much more focused.

      We have discussed the relationship between these data in the results (Line 380): 

      “This data-driven analysis shows that D2 and D1 blockade produced similar shifts in MSN population dynamics represented by PC1.  When combined with major contributions of D1/D2 MSNs to PC1 (Fig 3C) these findings show that pharmacologically disrupting D2 or D1 MSNs can disrupt ramping-related activity in the striatum.”

      And in the discussion (Line 417): 

      “Strikingly, optogenetic tagging showed that D2-MSNs and D1-MSNs had distinct dynamics during interval timing. MSN dynamics helped construct and constrain a four-parameter drift-diffusion model in which D2- and D1-MSN spiking accumulated temporal evidence. This model predicted that disrupting either D2-MSNs or D1-MSNs would increase response times. Accordingly, we found that optogenetically or pharmacologically disrupting striatal D2-MSNs or D1-MSNs increased response times without affecting task-specific movements. Disrupting D2-MSNs or D1-MSNs shifted MSN temporal dynamics and degraded MSN temporal encoding. These data, when combined with our model predictions, demonstrate that D2-MSNs and D1-MSNs contribute temporal evidence to controlling actions in time.”

      And: 

      “D2-MSNs and D1-MSNs play complementary roles in movement. For instance, stimulating D1-MSNs facilitates movement, whereas stimulating D2-MSNs impairs movement (Kravitz et al., 2010). Both populations have been shown to have complementary patterns of activity during movements (Tecuapetla et al., 2016), with MSNs firing at different phases of action initiation and selection. Further dissection of action selection programs reveals that opposing patterns of activation among D2-MSNs and D1-MSNs suppress and guide actions, respectively, in the dorsolateral striatum (Cruz et al., 2022). A particular advantage of interval timing is that it captures a cognitive behavior within a single dimension: time. When projected along the temporal dimension, it was surprising that D2-MSNs and D1-MSNs had opposing patterns of activity. Past pharmacological work from our group and others has shown that disrupting D2 or D1 MSNs slows timing (De Corte et al., 2019; Drew et al., 2007, 2003; Stutt et al., 2023), in line with pharmacological and optogenetic results in this manuscript. Computational modeling predicted that disrupting either D2-MSNs or D1-MSNs increased self-reported estimates of time, which was supported by both optogenetic and pharmacological experiments. Notably, these disruptions are distinct from increased timing variability reported with administrations of amphetamine, ventral tegmental area dopamine neuron lesions, and rodent models of neurodegenerative disease (Balci et al., 2008; Gür et al., 2020, 2019; Larson et al., 2022; Weber et al., 2023). Furthermore, our current data demonstrate that disrupting either D2-MSN or D1-MSN activity shifted MSN dynamics and degraded temporal encoding, supporting prior work (De Corte et al., 2019; Drew et al., 2007, 2003; Stutt et al., 2023).
Our recording experiments do not identify where a possible response threshold T is instantiated, but downstream basal ganglia structures may have a key role in setting response thresholds (Toda et al., 2017).”

      (13) For Figure 2, what is the error region for G and H? Is there a statistically significant difference between the start (e.g., 0-1 s) and the end (e.g., 5-6 s) time? 

      G and H are standard error, which we have now clarified.

      And on Line 166: 

      “These differences resulted in distinct activity early in the interval (0-1 seconds; F = 6.0, p = 0.02 accounting for variance between mice), but not late in the interval (5-6 seconds; F = 1.9, p = 0.17 accounting for variance between mice) between D2-MSNs and D1-MSNs.”

      Minor: 

      (1)  Figure 2 legend showed the wrong label "Peri-event raster C) from a D2-MSN (red) and E) from a D1-MSN (blue). It should be (D). 

      Fixed, thank you.  

      (2)  Figure 2. Missing legend for (E) and (F).  

      Fixed, thank you.  

      (3)  Line 423: mistyped "\" 

      Fixed, thank you.  

      Reviewer #3 (Recommendations For The Authors): 

      -  To clarify that complementary means opposing in this context, I suggest changing the title. 

      This is a helpful suggestion. We have changed it exactly as the reviewer suggested: 

      “Complementary opposing D2-MSNs and D1-MSNs dynamics during interval timing”

      -  I recommend adding a supplementary figure to demonstrate all the nose pokes in all trials in a given session. The current figures make it hard to assess the specifics of the behavior. For example, what happens if, in a long-interval trial, the mouse pokes in the second nose poke before 6 seconds? Is that behavior punished? Do they keep alternating between the nose poke or do they stick to one nose poke? 

      We agree. We think this is a main point, and we have now redesigned Figure 1 to describe these details: 

      And added these details to the methods (Line 548): 

      “Interval timing switch task. We used a mouse-optimized operant interval timing task described in detail previously (Balci et al., 2008; Bruce et al., 2021; Tosun et al., 2016; Weber et al., 2023). Briefly, mice were trained in sound-attenuating operant chambers, with two front nosepokes flanking either side of a food hopper on the front wall, and a third nosepoke located at the center of the back wall. The chamber was positioned below an 8-kHz, 72-dB speaker (Fig 1A; MedAssociates, St. Albans, VT). Mice were 85% food restricted and motivated with 20 mg sucrose pellets (BioServ, Flemington, NJ). Mice were initially trained to receive rewards during fixed ratio nosepoke response trials. Nosepoke entry and exit were captured by infrared beams. After shaping, mice were trained in the “switch” interval timing task. Mice self-initiated trials at the back nosepoke, after which tone and nosepoke lights were illuminated simultaneously. Cues were identical on all trial types and lasted the entire duration of the trial (6 or 18 seconds). On 50% of trials, mice were rewarded for a nosepoke after 6 seconds at the designated first ‘front’ nosepoke; these trials were not analyzed. On the remaining 50% of trials, mice were rewarded for nosepoking first at the ‘first’ nosepoke location and then switching to the ‘second’ nosepoke location; the reward was delivered for initial nosepokes at the second nosepoke location after 18 seconds when preceded by a nosepoke at the first nosepoke location.  Multiple nosepokes at each nosepoke were allowed. Early responses at the first or second nosepoke were not reinforced. Initial responses at the second nosepoke rather than the first nosepoke, alternating between nosepokes, and going back to the first nosepoke after the second nosepoke were rare after initial training. Error trials included trials where animals responded only at the first or second nosepoke and were also not reinforced.
We did not analyze error trials as they were often too few to analyze; these were analyzed at length in our prior work (Bruce et al., 2021).”
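The reward contingencies described in the quoted methods can be summarized as a small rule. The sketch below is a simplified illustration, not the authors' task-control code: the function name, the trial-type labels, the (time, port) event format, and the choice to treat a poke at exactly 6 or 18 seconds as rewarded are all assumptions.

```python
from typing import List, Optional, Tuple

SHORT_INTERVAL_S = 6.0   # short trials: reward at the first nosepoke
LONG_INTERVAL_S = 18.0   # long trials: reward at the second nosepoke


def reward_time(trial_type: str, pokes: List[Tuple[float, str]]) -> Optional[float]:
    """Return the time of the rewarded nosepoke, or None if no reward is earned.

    pokes holds (time_s, port) pairs with port equal to "first" or "second",
    timed from trial start. Early and repeated pokes are allowed but unrewarded.
    """
    seen_first = False
    for t, port in sorted(pokes):
        if port == "first":
            seen_first = True
            if trial_type == "short" and t >= SHORT_INTERVAL_S:
                return t
        elif port == "second":
            # Long trials require an earlier poke at the first port (a switch).
            if trial_type == "long" and seen_first and t >= LONG_INTERVAL_S:
                return t
    return None


# Hypothetical long trial: first-port pokes, a switch, an early unrewarded
# second-port poke at 12.0 s, and a rewarded poke at 18.4 s.
long_trial = [(2.0, "first"), (8.5, "first"), (12.0, "second"), (18.4, "second")]
rewarded_at = reward_time("long", long_trial)
```

Under this rule, a second-nosepoke response before 18 seconds is simply unrewarded rather than punished, consistent with the statement that early responses were not reinforced.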

      -  Figures 2E and 2F suggest that some D1 cells ramp up during the first 6 seconds, while others ramp down. The same is more or less true for D2s. I wonder if the analysis will lose its significance if the two outlier D1s are excluded from Figure 3D. 

      This is a great idea suggested by multiple reviewers. We repeated this analysis with outliers removed. We used a data-driven approach to remove outliers (Line 656): 

      “We performed additional sensitivity analysis excluding outliers outside of 95% confidence intervals and measuring firing rate from the start of the interval to the time of the switch response on a trial-by-trial level for each neuron.”

      And described these data in the results (Line 201): 

      “We found that D2-MSNs and D1-MSNs had a significantly different slope even when excluding outliers (4 outliers excluded outside of 95% confidence intervals; F=7.51, p=0.008 accounting for variance between mice) and when the interval was defined as the time between trial start and the switch response on a trial-by-trial basis for each neuron (F=4.3, p=0.04 accounting for variance between mice).”

      Finally, we removed the outliers the reviewers alluded to – two D1 MSNs – and found similar results (F=6.59, p=0.01 for main effect of D2 vs. D1 MSNs controlling for between-mouse variability). We elected to include the more data-driven approach based on 95% confidence intervals.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful manuscript reports mechanisms behind the increase in fecundity in response to sub-lethal doses of pesticides in the crop pest, the brown plant hopper. The authors hypothesize that the pesticide works by inducing the JH titer, which through the JH signaling pathway induces egg development. Evidence for this is, however, inadequate.

      We greatly appreciate your valuable comments and constructive suggestions for our work. The manuscript has been carefully edited and improved following your suggestions. We also provide more evidence to support our statements by conducting new experiments. First, we found that EB treatment of adult females can also stimulate egg-laying. Second, EB treatment in female adults increases the number of mature eggs in the ovary and ovarioles. Third, EB treatment in females enhances the expression of the kr-h1 gene in the whole body of BPH. Finally, EB treatment in female adults increases the JHIII titer, but has no impact on the 20E titer.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Gao et al. have demonstrated that the pesticide emamectin benzoate (EB) treatment of brown planthopper (BPH) leads to increased egg-laying in the insect, which is a common agricultural pest. The authors hypothesize that EB upregulates JH titer resulting in increased fecundity.

      Strengths:

      The finding that a class of pesticide increases the fecundity of brown planthopper is interesting.

      We greatly appreciate your positive comments on our work.

      Weaknesses:

      (1) EB is an allosteric modulator of GluCl. That means EB physically interacts with GluCl initiating a structural change in the channel protein. Yet the authors' central hypothesis here is about how EB can upregulate the mRNA of GluCl. I do not know whether there is any evidence that an allosteric modulator can function as a transcriptional activator for the same receptor protein. The basic premise of the paper sounds counterintuitive. This is a structural problem and should be addressed by the authors by giving sufficient evidence about such demonstrated mechanisms before.

      Thank you for your question. As the reviewer points out, EB physically interacts with its target protein GluCl and thus affects its downstream signaling pathway. In the manuscript, we reported that EB-treated brown planthoppers display increased expression of GluCl in the adult stage (Fig. 5A). In fact, many studies show that insecticide treatment can increase the expression of target genes in insects. For example, the relative expression level of the ryanodine receptor gene of the rice stem borer, Chilo suppressalis, was increased 10-fold after treatment with chlorantraniliprole, an insecticide that targets the ryanodine receptor (Peng et al., 2017). In addition, in Drosophila, starvation (and low insulin) elevates the transcription level of the sNPF and tachykinin receptors (Ko et al., 2015; Root et al., 2011). In brown planthoppers, reduction in mRNA and protein expression of a nicotinic acetylcholine receptor α8 subunit is associated with resistance to imidacloprid (Zhang et al., 2015). RNA interference knockdown of the α8 gene decreased the sensitivity of N. lugens to imidacloprid (Zhang et al., 2015). Hence, expression of receptor genes can be regulated by diverse factors, including insecticide treatment. In our case, we found that EB can upregulate its target gene GluCl. However, we did not claim that EB functions as a transcriptional activator for GluCl, and we still do not know why EB treatment changes the expression of GluCl in the brown planthopper. Considering that our experiments last several days, it might be an indirect (or secondary) effect caused by other factors that change the expression of the GluCl gene upon EB action on the channel. One possible reason is that the allosteric interaction of EB with GluCl makes the channel dysfunctional, and the cellular response is to upregulate the channel/receptor to compensate. We have inserted text on lines 738 - 757 to explain these possibilities.

      (2) I am surprised to see a 4th instar larval application or treatment with EB results in the upregulation of JH in the adult stages. Complicating the results further is the observation that a 4th instar EB application results in an immediate decrease in JH titer. There is a high possibility that this late JH titer increase is an indirect effect.

      Thank you for your question. Treatment with low or sublethal doses of insecticides might have a strong and complex impact on insects (Gandara et al., 2024; Gong et al., 2022; Li et al., 2023; Martelli et al., 2022). We kept 4th instar brown planthoppers feeding on EB for four days. After four days of treatment they develop to the 5th instar, which is the final nymphal stage of BPH. Since the brown planthopper is a hemimetabolous insect, we cannot rule out the possibility that an indirect effect of treatment with EB results in the upregulation of JH in the adult stages. In this newly revised manuscript, we investigated the impact of EB treatment in the adult stage. We found that female adults treated with EB also laid more eggs than controls (Figure 1-figure supplement 1A). The following experiments were performed in adults to address how EB treatment stimulates egg-laying in adult brown planthoppers.

      (1) We found that EB treatment in adults increases the number of mature eggs in the ovary (new Figure 2-figure supplement 1). We have added these results in lines 234 – 238 and 281-285.

      (2) We measured the JH titer after the female adults had been treated with EB. We found that EB can also increase the JH titer but has no impact on the 20E titer in the female adult (Figure 3-S3A and B). We have added these results in lines 351 – 356 and 281-285.

      (3) EB treatment in adults increases the gene expression of JHAMT and Kr-h1 (Figure 3-S3C and D). We have added these results in lines 378 – 379, lines 387-390 and lines 457-462.

      (3) The writing quality of the paper needs improvement. Particularly with respect to describing processes and abbreviations. In several instances the authors have not adequately described the processes they have introduced, thus confusing readers.

      Thank you for your suggestion. We have thoroughly revised the paper to improve clarity.

      (4) In the section 'EB promotes ovarian development' the authors have shown that EB treatment results in increased detention of eggs which contradicts their own results which show that EB promotes egg laying. Again, this is a serious contradiction that nullifies their hypothesis.

      Thank you for pointing this out. We revised Figure 2B to show the number of mature eggs in the ovary. The number of mature eggs in ovaries of females that fed on EB was higher than in control females. We also show that BPH fed with EB laid more eggs than controls. Thus, our results suggest that EB promotes ovary maturation (and egg production) and also increases egg laying (Figure 1 and Table S1). We have added these results in lines 234 – 238.

      (5) Furthermore, the results suggest that oogenesis is not affected by EB application. The authors should devote a section to discussing how they are observing increased egg numbers in EB-treated insects while not impacting Oogenesis.

      Thank you for your suggestions, and apologies for the lack of clarity in our initial explanation. First, we found that EB treatment led to an increase in the number of eggs laid by female brown planthoppers (Figure 1). Through dissection experiments, we observed that EB-treated females had more mature eggs in their ovaries (Figure 2A and B), indicating that the increased egg-laying was due to a larger production of mature eggs in the ovaries after EB treatment. This is now explained on lines 229-238.

      Additionally, since there is no systematic description of oogenesis in the brown planthopper, we were the first to observe the oogenesis process in this species using immunohistochemistry and laser confocal microscopy. Based on the developmental characteristics, we defined the different stages of oogenesis (Figure 2C, Figure 2-figure supplement 2). We did not observe any significant effect of EB treatment on the various stages of oogenesis, indicating that EB treatment does not impair normal egg development (Figure 2D). Instead, the increase in vitellogenin accelerates the production of mature eggs. This is now explained on lines 243-262.

      During the maturation process, eggs require uptake of vitellogenin, and an increase in vitellogenin (Vg) content can accelerate egg maturation, producing more mature eggs. Our molecular data suggest that EB treatment leads to an upregulation of vg expression. Based on these findings, we conclude that the increase in egg-laying caused by EB treatment is due to the upregulation of vg (Figure 3I), which raises vitellogenin content, promoting the uptake of vitellogenin by maturing eggs and resulting in the production of more mature eggs. We have revised the text on lines 389-395 to clarify this point.

      (6) Met is the receptor of JH and to my understanding, remains mostly constant in terms of its mRNA or protein levels throughout various developmental periods in many different insects. Therefore, the presence of JH becomes the major driving factor for physiological events and not the presence of the receptor Met. Here the authors have demonstrated an increase in Met mRNA as a result of EB treatment. Their central hypothesis is that EB increases JH titer to result in enhanced fecundity. JH action will not result in the activation of Met. Although not contradictory to the hypothesis, the increase in mRNA content of Met is contrary to the findings of the JH field thus far.

      Thank you for your comment. Our results showed that EB treatment mildly increases (about 2-fold) expression of the Met gene in brown planthoppers (Figure 3G). Our data also indicated that Met and FAMeT expression levels were less strongly influenced by EB than those of kr-h1 and vg (Figure 3H and I). We agree that JH action will not result in an increase of Met. However, we cannot rule out the possibility that other factors (indirect effects) induced by EB treatment increase the mRNA expression level of Met. One recent paper reported that downregulation of the transcription factor CncC increases met expression in beetles (see Figure 6A in this reference) (Jiang et al., 2023). Many studies have reported that insecticide treatment activates the CncC signaling pathway, which regulates detoxification gene expression (Amezian et al., 2023; Fu et al., 2024; Hu et al., 2021). Hence, it is possible that EB influences the CncC pathway, which then induces met expression. This EB effect on met upregulation may be similar to the upregulation of GluCl and some other secondary effects. We have discussed this on lines 725-738.

      (7) As pointed out before, it is hard to rationalize how a 4th instar exposure to EB can result in the upregulation of key genes involved in JH synthesis at the adult stage. The authors must consider providing a plausible explanation and discussion in this regard.

      Thank you for your comments. Although we exposed the BPH to EB at the 4th instar, we let the insects feed on the EB-treated rice plants for four days. After that, the insects develop into the 5th instar, the final nymphal stage of the brown planthopper. Since brown planthoppers do not have a pupal stage, the EB taken up by the insects might persist for a longer time, even into the adult stage. In addition, we found that EB treatment increases the weight of adult females (Figure 1-figure supplement 3E and F), which indicates that EB might increase food intake in BPHs, which in turn might produce more insulin peptide. Insulin might increase JH synthesis at the adult stage. In our revised study we also investigated the effect of EB on adult BPHs. We found that, similar to the nymphal stage, EB treatment of adult BPHs also increases egg laying. Furthermore, the JH titer was increased after treatment of adult BPHs with EB, and the GluCl and kr-h1 genes were also up-regulated after EB treatment at the adult stage. We have discussed this on lines 739-746.

      (8) I have strong reservations against such an irrational hypothesis that Met (the receptor for JH) and JH-Met target gene Kr-h1 regulate JH titer (Line 311, Fig 3 supplemental 2D). This would be the first report of such an event on the JH field and therefore must be analysed in depth. I strongly suggest the authors remove such claims from the manuscript without substantiating it.

      Thank you for your suggestions and comments. We have changed our claims in this revised MS. We found that EB treatment can enhance Kr-h1 expression. We have no evidence to support that JH can induce met expression. We have rewritten the manuscript to avoid confusion (see text on lines 725-735).

      (9) Kr-h1 is JH/Met target gene. The authors demonstrate that silencing of Kr-h1 results in inhibition of FAMeT, which is a gene involved in JH synthesis. A feedback loop in JH synthesis is unreported. It is the view of this reviewer that the authors must go ahead with a mechanistic detail of Kr-h1 mediated JH upregulation before this can be concluded. Mere qPCR experiments are not sufficient to substantiate a claim that is completely contrary to the current understanding of the JH signalling pathway.

      Thank you for your suggestions and comments. We agree that qPCR experiments alone are not enough to support this kind of claim. More evidence is needed to support it. We have revised the MS to avoid confusion (see text on lines 725-735).

      (10) The authors have performed knockdowns of JHAMT, Met, and Kr-h1 to demonstrate the effect of these factors on fecundity in BPH. Additionally, they have performed rescue experiments with EB application on these knockdown insects (Figure 3K-M). This, I believe, is a very flawed experiment. The authors demonstrate EB works through JHAMT in upregulating JH titer. In the absence of JHAMT, EB application is not expected to rescue the phenotype. But the authors have reported a complete rescue here. In the absence of Met, the receptor of JH, either EB or JH is not expected to rescue the phenotype. But a complete rescue has been reported. These two experimental results contradict their own hypothesis.

      Thank you for your comments. We think that this rescue is possible because knockdown of the genes is incomplete when using dsRNA injection (and residual gene expression allows for EB action). It is not a total knockout, and these genes still have a low level of expression in the dsRNA-injected insects. Since EB can upregulate the expression of JHAMT, Met, and Kr-h1, it is reasonable that EB treatment can counteract the down-regulation of these three genes and fully restore fecundity. We have clarified this on lines 411-413.

      (11) A significant section of the paper deals with how EB upregulates JH titer. JH is a hormone synthesized in the Corpora Allata. Yet the authors have chosen to use the whole body for all of their experiment. Changes in the whole body for mRNA of those enzymes involved in JH synthesis may not reflect the situation in Corpora Allata. Although working with Corpora Allata is challenging, discarding the abdomen and thorax region and working with the head and neck region of the insect is easily doable. Results from such sampling are always more convincing when it comes to JH synthesis studies.

      Thank you for your suggestions. The head is very difficult to separate from the thorax region in brown planthoppers, as you can see in Author response image 1. We are now trying to answer how EB regulates JH synthesis using Drosophila as a model.

      Author response image 1.

      The brown planthopper

      (12) The phenomenon reported was specific to BPH and not found in other insects. This limits the implications of the study.

      Thank you for your comments. The brown planthopper is a serious insect pest of rice in Asia. Our findings can guide the use of this insecticide in the field. In addition, our findings indicate that EB, which targets GluCl, can alter the JH titer, adding new insight into how a neuronal system influences the JH signaling pathway. We will further investigate how EB influences JH in the future and will use Drosophila as a model to study the molecular mechanisms.

      (13) Overall, the molecular experiments are very poorly designed and can at best be termed superficial. There are several contradictions within the paper and no discussion or explanation has been provided for that.

      Thank you for your comments. We have revised the paper according to your suggestions and added further explanation of our results in the discussion parts and hope the conclusions are better supported in the new version. We have discussed this on lines 725-746 and 778-799.

      Reviewer #2 (Public Review):

      The brown plant hopper (BPH) is a notorious crop pest and pesticides are the most widespread means of controlling its population. This manuscript shows that in response to sublethal doses of the pesticide (EB), BPH females show enhanced fecundity. This is in keeping with field reports of population resurgence post-pesticide treatment. The authors work out the mechanism behind this increase in fecundity. They show that in response to EB exposure, the expression of its target receptor, GluCl, increases. This, they show, results in an increase in the expression of genes that regulate the synthesis of juvenile hormone (JH) and JH itself, which, in turn, results in enhanced egg-production and egg-laying. Interestingly, these effects of EB exposure are species-specific, as the authors report that other species of plant hoppers either don't show enhanced fecundity or show reduced fecundity. As the authors point out, it is unclear how an increase in GluCl levels could result in increased JH regulatory genes.

      We greatly appreciate your valuable comments and constructive suggestions on our work. In future work, we will try to determine how EB interacts with its molecular target GluCl and then increases JH regulatory genes, using Drosophila as a model.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall, the molecular experiments are very poorly designed and can at best be termed superficial. There are several contradictions within the paper and no discussion or explanation has been provided for that.

      The authors should consider a thorough revision.

      Thank you for your comments. We have thoroughly revised the paper according to your suggestions and added further experiments and explanations of our results in the discussion parts.

      Reviewer #2 (Recommendations For The Authors):

      It would help the reader to have more schematics along with the figures. The final figure is helpful, but knowing the JH pathway, and where it acts would help with the interpretations as one reads the manuscript and the figures. The pathways represented in 4N or 5J are helpful but could be improved upon for better presentation.

      It would be nice to have some discussion on how the authors think EB exposure results in an increase in GluCl expression, and how that in turn affects the expression of so many genes.

      Thank you for your comments. We have thoroughly revised the paper according to your suggestions and added further experiments and explanations of how we think EB exposure results in an increase in JH titer and other genes in the discussion parts. We have added the text on lines 753-761.

      References

      Amezian, D., Fricaux, T., de Sousa, G., Maiwald, F., Huditz, H.-I., Nauen, R., Le Goff, G., 2023. Investigating the role of the ROS/CncC signaling pathway in the response to xenobiotics in Spodoptera frugiperda using Sf9 cells. Pesticide Biochemistry and Physiology 195, 105563.

      Fu, B., Liang, J., Hu, J., Du, T., Tan, Q., He, C., Wei, X., Gong, P., Yang, J., Liu, S., Huang, M., Gui, L., Liu, K., Zhou, X., Nauen, R., Bass, C., Yang, X., Zhang, Y., 2024. GPCR–MAPK signaling pathways underpin fitness trade-offs in whitefly. Proceedings of the National Academy of Sciences 121, e2402407121.

      Gandara, L., Jacoby, R., Laurent, F., Spatuzzi, M., Vlachopoulos, N., Borst, N.O., Ekmen, G., Potel, C.M., Garrido-Rodriguez, M., Böhmert, A.L., Misunou, N., Bartmanski, B.J., Li, X.C., Kutra, D., Hériché, J.-K., Tischer, C., Zimmermann-Kogadeeva, M., Ingham, V.A., Savitski, M.M., Masson, J.-B., Zimmermann, M., Crocker, J., 2024. Pervasive sublethal effects of agrochemicals on insects at environmentally relevant concentrations. Science 386, 446-453.

      Gong, Y., Cheng, S., Desneux, N., Gao, X., Xiu, X., Wang, F., Hou, M., 2022. Transgenerational hormesis effects of nitenpyram on fitness and insecticide tolerance/resistance of Nilaparvata lugens. Journal of Pest Science.

      Hu, B., Huang, H., Hu, S., Ren, M., Wei, Q., Tian, X., Esmail Abdalla Elzaki, M., Bass, C., Su, J., Reddy Palli, S., 2021. Changes in both trans- and cis-regulatory elements mediate insecticide resistance in a lepidopteron pest, Spodoptera exigua. PLOS Genetics 17, e1009403.

      Jiang, H., Meng, X., Zhang, N., Ge, H., Wei, J., Qian, K., Zheng, Y., Park, Y., Reddy Palli, S., Wang, J., 2023. The pleiotropic AMPK–CncC signaling pathway regulates the trade-off between detoxification and reproduction. Proceedings of the National Academy of Sciences 120, e2214038120.

      Ko, K.I., Root, C.M., Lindsay, S.A., Zaninovich, O.A., Shepherd, A.K., Wasserman, S.A., Kim, S.M., Wang, J.W., 2015. Starvation promotes concerted modulation of appetitive olfactory behavior via parallel neuromodulatory circuits. eLife 4, e08298.

      Li, Z., Wang, Y., Qin, Q., Chen, L., Dang, X., Ma, Z., Zhou, Z., 2023. Imidacloprid disrupts larval molting regulation and nutrient energy metabolism, causing developmental delay in honey bee Apis mellifera. eLife

      Martelli, F., Hernandes, N.H., Zuo, Z., Wang, J., Wong, C.-O., Karagas, N.E., Roessner, U., Rupasinghe, T., Robin, C., Venkatachalam, K., Perry, T., Batterham, P., Bellen, H.J., 2022. Low doses of the organic insecticide spinosad trigger lysosomal defects, elevated ROS, lipid dysregulation, and neurodegeneration in flies. eLife 11, e73812.

      Peng, Y.C., Sheng, C.W., Casida, J.E., Zhao, C.Q., Han, Z.J., 2017. Ryanodine receptor genes of the rice stem borer, Chilo suppressalis: Molecular cloning, alternative splicing and expression profiling. Pestic. Biochem. Physiol. 135, 69-77.

      Root, Cory M., Ko, Kang I., Jafari, A., Wang, Jing W., 2011. Presynaptic facilitation by neuropeptide signaling mediates odor-driven food search. Cell 145, 133-144.

      Zhang, Y., Wang, X., Yang, B., Hu, Y., Huang, L., Bass, C., Liu, Z., 2015. Reduction in mRNA and protein expression of a nicotinic acetylcholine receptor α8 subunit is associated with resistance to imidacloprid in the brown planthopper, Nilaparvata lugens. Journal of Neurochemistry 135, 686-694.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors bring together implanted radiofrequency coils, high-field MRI imaging, awake animal imaging, and sensory stimulation methods in a technological demonstration. The results are very detailed descriptions of the sensory systems under investigation.

      Strengths:

      - The maps are qualitatively excellent for rodent whole-brain imaging.

      - The design of the holder and the coil is pretty clever.

      Weaknesses:

      - Some unexpected regions appear on the whole brain maps, and the discussion of these regions is succinct.

      - The authors do not make the work and effort to train the animals and average the data from several hundred trials apparent enough. This is important for any reader who would like to consider implementing this technology.

      - The data is not available. This does not let the readers make their own assessment of the results.

      Thank you for the comments on this manuscript. We have provided a more detailed discussion of the unexpected regions (page 18 – line 491-494) and the training procedures (page 7-9 – line 172-236). We also uploaded the datasets to OpenNeuro: Whisker (https://doi.org/10.18112/openneuro.ds005496.v1.0.1), Visual (https://doi.org/10.18112/openneuro.ds005497.v1.0.0), and to Zenodo: SNR Line Profile Data & Data Processing Scripts (https://zenodo.org/doi/10.5281/zenodo.13821455).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Hike et al. entitled 'High-resolution awake mouse fMRI at 14 Tesla' describes the implementation of awake mouse BOLD-fMRI at high field. This work is timely as the field of mouse fMRI is working toward collecting high-quality data from awake animals. Imaging awake subjects offers opportunities to study brain function that are otherwise not possible under the more common anesthetized conditions. Not to mention the confounding effects that anesthesia has on neurovascular coupling. What has made progress in this area slow (relative to other imaging approaches like optical imaging) is the environment within the MRI scanner (high acoustic noise) - as well as the intolerance of head and body motion. This work adds to a relatively small, but quickly growing literature on awake mouse fMRI. The findings in the study include testing of an implanted head-coil (for MRI data reception). Two designs are described and the SNR of these units at 9.4T and 14T are reported. Further, responses to visual as well as whisker stimulation recorded in acclimated awake mice are shown. The most interesting finding, and most novel, is the observation that mice seem to learn to anticipate the presentation of the stimulus - as demonstrated by activations evident ~6 seconds prior to the presentation of the stimulus when stimuli are delivered at regular intervals (but not when stimuli are presented at random intervals). These kinds of studies are very challenging to do. The surgical preparation and length of time invested into training animals are grueling. I also see this work as a step in the right direction and evidence of the foundations for lots of interesting future work. However, I also found a few shortcomings listed below.

      Weaknesses:

      (1) The surface coil, although offering a great SNR boost at the surface, ultimately comes at a cost of lower SNR in deeper more removed brain regions in comparison to commercially available Bruker coils (at room temperature). This should be quantified. A rough comparison in SNR is drawn between the implanted coils and the Bruker Cryoprobe - this should be a quantitative comparison (if possible) - including any differences in SNR in deeper brain structures. There are drawbacks to the Cryoprobe, which can be discussed, but a more thorough comparison between the implanted coils, and other existing options should be provided (the Cryoprobe has been used previously in awake mouse experiments (Sensory evoked fMRI paradigms in awake mice - Chen, Physiological effects of a habituation procedure for functional MRI in awake mice using a cryogenic radiofrequency probe – Yoshida, PREVIOUS REFERENCE)). Further, the details of how to build the implanted coils should be provided (shared) - this should include a parts list as well as detailed instructions on how to build the units. Also, how expensive are they? And can they be reused?

      Thank you for the comment. We did not use a Bruker Cryoprobe for this work but rather a Bruker 4-array surface coil. We are unable to compare to a cryoprobe since we do not have access to one for our system. A comparison to previously published data using different scanners could be possible but would require the sequences to have identical parameters to avoid introducing an uncontrolled variable. We are planning to recruit different laboratories to test the implanted RF coils with their existing cryoprobes in a future study.

      We have included an updated figure comparing SNR at different depths across the Bruker 4-array coil and the implanted RF coils. As shown in Supplementary Figure 7B, there is significant SNR enhancement up to 4 mm cortical depth for both the single loop and the figure 8 implanted RF coils in comparison to the Bruker 4-array coil.

      Author response image 1.

      Comparison between implanted and commercial coils. A shows representative coils in the single loop (left) and figure 8 styles (right). Supplementary Table 1 provides a parts list and cost for making these coils, and Supplementary Figure 1 provides a circuit diagram for assembly. B presents the SNR line profile values as a function of distance from the pia mater for each coil tested at 9.4T: commercial phased array surface coil (4 Array), implanted single loop, and implanted figure 8. SNR values were calculated by dividing the signal by the standard deviation of the noise. C-E show a representative FLASH image with the line profile of SNR measurements from each of the coils used to create the graph seen in B. Clear visual improvement in SNR can be seen in C-E. C – Commercial phased array. D – Single loop at 9.4T. E – Figure 8 at 9.4T. (N = 6 for the 4 array, N = 5 for the single loop, N = 5 for the figure 8)
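      The SNR definition in this caption (signal divided by the standard deviation of the noise) can be illustrated with a minimal sketch. The function name, the choice of a signal-free background region, and the synthetic image below are assumptions made for illustration only; they are not the processing scripts deposited on Zenodo, and no correction for the noise statistics of magnitude images is attempted.

```python
import numpy as np

def snr_line_profile(image, line_rows, line_cols, noise_region):
    """Return SNR along a line: signal at each point / SD of a noise-only region.

    image        : 2D array of image intensities
    line_rows/cols : integer index arrays tracing the line (e.g. from the surface inward)
    noise_region : index (tuple of slices) selecting a signal-free background patch
    """
    image = np.asarray(image, dtype=float)
    noise_sd = np.std(image[noise_region], ddof=1)  # sample SD of background noise
    signal = image[line_rows, line_cols]            # intensities along the line
    return signal / noise_sd

# Hypothetical synthetic image: unit-variance noise plus a signal that
# decays with depth along one column, mimicking a surface-coil profile.
rng = np.random.default_rng(0)
image = rng.normal(0.0, 1.0, size=(64, 64))
depth = np.arange(32)
image[16:48, 32] += 100.0 * np.exp(-depth / 10.0)

profile = snr_line_profile(
    image,
    line_rows=np.arange(16, 48),
    line_cols=np.full(32, 32),
    noise_region=(slice(0, 10), slice(0, 10)),
)
```

      In this toy case the profile is highest at the first point of the line and falls off with depth, which is the qualitative shape one would expect when comparing surface coils along a cortical depth axis.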

      Additionally, we have added a supplementary figure (Supplementary Figure 1) with a circuit diagram, in an effort to disseminate the prototype design of the coils to other laboratories. We have included a detailed parts list with the cost of constructing the coils configured for our scanner (Supplementary Table 1). These specifics, though, would need to be adjusted to the precise field strength, bore size, and animal the coil is being built for. As for reusability, the copper wire is cemented to the animal skull, so the implantable coil should be considered a consumable for the awake mouse experiments, though the PCB parts can be retrieved.

      (2) In the introduction, the authors state that "Awake mouse fMRI has been well investigated". I disagree with this statement and others in the manuscript that give the reader the impression that awake experiments are not a challenging and unresolved approach to fMRI experiments in mice (or rodents). Although there are multiple labs (maybe 15 worldwide) that have conducted awake mouse experiments (with varying degrees of success/thoroughness), we are far from a standardized approach. This is a strength of the current work and should be highlighted as such. I encourage the authors to read the recent systematic review that was published on this topic in Cerebral Cortex by Mandino et al. There are several elements in there that should influence the tone of this piece including awake mouse implementations with the Bruker Cryoprobe, prevalence of surgical preparations, and evaluations of stress.

      Thank you for the comment. We agree with the reviewer that the current stage of awake mouse fMRI studies remains to be improved, and we have revised the Introduction to highlight the state of the art of awake mouse fMRI (Page 4 – line 81-88).

      (3) The authors also comment on implanted coils reducing animal stress - I don't know where this comment is coming from, as this has not been reported in the literature (to my knowledge) and the authors don't appear to have evaluated stress in their mice. 

      Since questions 3 and 4 are both closely related to the acclimation procedures, we answer them together.

      (4) Following on the above point, measures of motion, stress, and more details on the acclimation procedure that was implemented in this study should be included.

      We thank the reviewer for raising the animal training issue.

      During animal training, we measured both pupil dynamics and eye motion features in the training sessions; the detailed procedure is described in Methods (page 7-9 – line 172-236).

      The training procedure is carried out over a total of 5 weeks with four phases of training: i. Holding animal in hands, ii. Head-fixation and pupillometry, iii. Head-fixation and pupillometry with mock-MRI acoustic exposure, iv. Head-fixation and pupillometry with Echo-Planar-Imaging (EPI) in the MR scanner.

      Author response table 1.

      As shown in Supp Fig 2B, the spectral power of pupil dynamics (<0.02 Hz) and eye movements gradually increased as a function of training time for head-fixed mice exposed to the mock MRI acoustic environment during phase 3. In phase 4, when head-fixed mice were put into the scanner for the first time, both eye movements and pupil dynamics were initially reduced during scanning but recovered to an acclimated state on Day 2, similar to the level on Day 8 of phase 3. These behavioral outputs provide an alternative way to monitor the stress levels of the mice.

      Author response image 2.

      The eye movements (A) and power spectra of pupil dynamics (<0.02 Hz) (B) change during different training phases.
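      As an illustration of what "spectral power of pupil dynamics below 0.02 Hz" can mean in practice, the following is a minimal sketch that integrates a one-sided periodogram over the ultra-slow band. The response does not state how the spectra were computed, so the periodogram estimator, the function name, the 10 Hz sampling rate, and the synthetic traces are all assumptions for illustration.

```python
import numpy as np

def low_freq_power(trace, fs, f_max=0.02):
    """Power of `trace` in the band 0 < f < f_max (Hz), from a one-sided periodogram.

    trace : 1D array, e.g. pupil diameter over time (hypothetical input)
    fs    : sampling rate in Hz
    """
    trace = np.asarray(trace, dtype=float)
    trace = trace - trace.mean()                  # remove the DC offset
    n = trace.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)        # frequency axis in Hz
    psd = 2.0 * np.abs(np.fft.rfft(trace)) ** 2 / (fs * n)  # one-sided PSD
    band = (freqs > 0) & (freqs < f_max)          # ultra-slow band only
    df = fs / n                                   # frequency resolution
    return psd[band].sum() * df                   # integrate PSD over the band

# Hypothetical traces: 1000 s sampled at 10 Hz.
fs = 10.0
t = np.arange(10000) / fs
slow = np.sin(2 * np.pi * 0.01 * t)   # ultra-slow fluctuation at 0.01 Hz
fast = np.sin(2 * np.pi * 0.5 * t)    # faster component at 0.5 Hz

power_with_slow = low_freq_power(slow + fast, fs)
power_fast_only = low_freq_power(fast, fs)
```

      With these synthetic inputs, only the trace containing the 0.01 Hz component carries appreciable power in the band, so a band-limited measure of this kind separates ultra-slow pupil fluctuations from faster ones. A long recording is needed for adequate frequency resolution at such low frequencies.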

      It should be noted that stress may be related to an increased frequency of eye blinking or twitching movements in human subjects (1–3), whereas the eyeblink of head-fixed mice has been used for behavioral conditioning to investigate motor learning in normally behaving mice (4–6). Importantly, head-fixed mouse studies have shown that eye movements are significantly reduced compared to free-moving mice (7). The increased eye movement during the acclimation process would therefore indicate an alleviated stress level of the head-fixed mice in our case. Meanwhile, stress-related pupillary dilation could dominate the pupil dynamics in the early phase of training (8). We observed a gradually increasing pupil dynamic power spectrum at the ultra-slow frequency during phase 3, suggesting that stress-related pupil dilation was alleviated and that pupil dynamics driven by other factors, including arousal, locomotion, and startles, recovered toward those of normally behaving mice. Despite the extensive training procedure of the present work in comparison to existing awake mouse fMRI studies (training strategies for awake mouse fMRI have been reviewed by Mandino et al., showing the overall training duration of existing studies (9)), stress remains a confounding factor for functional brain mapping in head-fixed mice. In particular, a recent study (10) shows that the corticosterone concentration in blood samples of head-fixed mice is significantly reduced on Day 25 following training but remains higher than in control mice. In the discussion section, we have discussed the potential issues of stress-related confounding factors for awake mouse fMRI studies (Page 16 – lines 436-458).

      (1) A. Marcos-Ramiro, D. Pizarro-Perez, M. Marron-Romera, D. Gatica-Perez, Automatic blinking detection towards stress discovery. ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction 307–310 (2014). https://doi.org/10.1145/2663204.2663239/SUPPL_FILE/ICMI1520.MP4.

      (2) M. Haak, S. Bos, S. Panic, L. Rothkrantz, DETECTING STRESS USING EYE BLINKS AND BRAIN ACTIVITY FROM EEG SIGNALS. Lance 21, 76 (2009).

      (3) E. Del Carretto Di Ponti E Sessam, Exploring the impact of Stress and Cognitive Workload on Eye Movements: A Preliminary Study. (2023).

      (4) S. A. Heiney, M. P. Wohl, S. N. Chettih, L. I. Ruffolo, J. F. Medina, Cerebellar-dependent expression of motor learning during eyeblink conditioning in head-fixed mice. J Neurosci 34, 14845–14853 (2014).

      (5) S. N. Chettih, S. D. Mcdougle, L. I. Ruffolo, J. F. Medina, Adaptive timing of motor output in the mouse: The role of movement oscillations in eyelid conditioning. Front Integr Neurosci 5, 12996 (2011).

      (6) J. J. Siegel, et al., Trace Eyeblink Conditioning in Mice Is Dependent upon the Dorsal Medial Prefrontal Cortex, Cerebellum, and Amygdala: Behavioral Characterization and Functional Circuitry. eNeuro 2, 51–65 (2015).

      (7) A. F. Meyer, J. O’Keefe, J. Poort, Two Distinct Types of Eye-Head Coupling in Freely Moving Mice. Current Biology 30, 2116-2130.e6 (2020).

      (8) H. Zeng, Y. Jiang, S. Beer-Hammer, X. Yu, Awake Mouse fMRI and Pupillary Recordings in the Ultra-High Magnetic Field. Front Neurosci 16, 886709 (2022).

      (9) F. Mandino, S. Vujic, J. Grandjean, E. M. R. Lake, Where do we stand on fMRI in awake mice? Cereb Cortex 34 (2024).

      (10) K. Juczewski, J. A. Koussa, A. J. Kesner, J. O. Lee, D. M. Lovinger, Stress and behavioral correlates in the head-fixed method: stress measurements, habituation dynamics, locomotion, and motor-skill learning in mice. Scientific Reports 2020 10:1 10, 1–19 (2020).

      (5) It wasn't clear to me at what times the loop versus "Figure 8" coil was being used, nor how many mice (or how much data) were included in each experiment/plot. There is also no mention of biological sex.

      Thank you for the comment. We have clarified the sex and number of animals. The figure 8 coil was only used during development to show the improvement of the coil design for cortical measurements. The detailed information is described in Methods (Page 6 – line 127-129 & Page 10 – line 269-270). Additionally, animal numbers have been included in the figure captions.

      (6) Building on the points above, the manuscript overall lacks experimental detail (especially since the format has the results prior to the methods).

      Thank you for the comment. We have modified the manuscript to increase the experimental detail and moved the methods section before the results.

      (7) An observation is made in the manuscript that there is an appreciable amount of negative BOLD signal. The authors speculate that this may come from astrocyte-mediated BOLD during brain state changes (and cite anesthetized rat and non-human primate experiments). This is very strange to me. First, the negative BOLD signal is not plotted (please do this), further, there are studies in awake mice that measure astrocyte activation eliciting positive BOLD responses (see Takata et al. in Glia, 2017).

      We thank the reviewer for raising the issue of the negative BOLD fMRI observation. We added a subplot of the negative BOLD signal changes in the revised Figure 4. These negative BOLD signals across cortical areas could be coupled with brain state changes upon air-puff-induced startle responses. Our future studies will focus on elucidating the brain-wide activity changes of awake mice with fMRI. We also provide a detailed discussion of the potential mechanism underlying the negative BOLD fMRI signals. First, as reported in the paper suggested by the reviewer, astrocytic Ca2+ transients coincide with positive BOLD responses in the activated cortical areas, which aligns with the neurovascular coupling (NVC) mechanism. However, there is emerging evidence that astrocytic Ca2+ transients are coupled with both positive and negative BOLD responses in anesthetized rats (11) and awake mice (12). An intriguing observation is that cortex-wide negative BOLD signals coupled with spontaneous astrocytic Ca2+ transients could co-exist with the positive BOLD signal detected at the activated cortex. Studies have shown that astrocytes are involved in regulating brain state changes (13), in particular during locomotion (14) and startle responses (15). These brain state-dependent global negative BOLD responses are also related to the arousal changes of both non-human primates (16) and human subjects (17). The established awake mouse fMRI platform with ultra-high spatial resolution will enable brain-wide activity mapping of the functional nuclei contributing to the brain state changes of head-fixed awake mice in future studies. (Page 17-18 – Line 478-490)

      (11) M. Wang, Y. He, T. J. Sejnowski, X. Yu, Brain-state dependent astrocytic Ca2+ signals are coupled to both positive and negative BOLD-fMRI signals. Proc Natl Acad Sci U S A 115, E1647–E1656 (2018).

      (12) C. Tong, Y. Zou, Y. Xia, W. Li, Z. Liang, Astrocytic calcium signal bidirectionally regulated BOLD-fMRI signals in awake mice in Proc. Intl. Soc. Mag. Reson. Med. 32, (2024).

      (13) K. E. Poskanzer, R. Yuste, Astrocytes regulate cortical state switching in vivo. Proc Natl Acad Sci U S A 113, E2675–E2684 (2016).

      (14) M. Paukert, et al., Norepinephrine controls astroglial responsiveness to local circuit activity. Neuron 82, 1263–1270 (2014).

      (15) R. Srinivasan, et al., Ca2+ signaling in astrocytes from IP3R2−/− mice in brain slices and during startle responses in vivo. Nat Neurosci 18, 708 (2015).

      (16) C. Chang, et al., Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113, 4518–4523 (2016).

      (17) B. Setzer, et al., A temporal sequence of thalamic activity unfolds at transitions in behavioral arousal state. Nat Commun 13 (2022).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this work. The maps shown are among the best-quality maps out there. Here are suggestions to the authors.

      (1) Both the ACA and VRA are rather unexpected. The authors explain these briefly as being part of the associative cortical areas. Both the ACA and VRA are not canonical associative areas (or at least not to us). This warrants a stronger discussion.

      To verify both ACA and VRA as associative areas, we provide the connectivity map projections from the Allen Brain Atlas (shown below). These projections are derived from a Cre-dependent AAV tracing of axonal projections. We have included an explanation of this in the introduction.

      Author response image 3.

      Representative images are shown indicating connections between the barrel cortex and retrosplenial area from an injection in the barrel cortex (left panel), as well as the connection between the visual cortex and cingulate area from an injection in the visual cortex (right panel). Images are of connectivity map projections from the Allen Brain Atlas derived from a Cre-dependent AAV tracing of axonal projections.

      (2) This is a lot of work. But looking at the figures, this is not obvious. We read in the caption that several hundred trials were used. It would be good to also specify how many mice. It would be clearer to represent this info in the figure as well to support the fact that this is not a trivial acquisition.

      We thank the reviewer for raising the issue of effort. We have edited the figure to include this information and included the numbers in the text as well.

      (3) The training protocol is seemingly extensive, but this is only visible by following another reference. Including a description in this work would help the reader make sense of the effort that went into this work.

      We thank the reviewer for raising the issue of the training protocol. We have discussed the training method used for this study more thoroughly (page 7-9 – line 172-236).

      (4) I really would love to see that dataset made freely available - this should be the norm.

      The datasets have been uploaded to OpenNeuro and Zenodo:

      OpenNeuro, Whisker: https://doi.org/10.18112/openneuro.ds005496.v1.0.1

      OpenNeuro, Visual: https://doi.org/10.18112/openneuro.ds005497.v1.0.0

      Zenodo, SNR Line Profile Data & Data Processing Scripts: https://zenodo.org/doi/10.5281/zenodo.13821455

      (page 21 – line 573-579)

      Reviewer #2 (Recommendations For The Authors):

      (1) I'm a little confused about the stimulation paradigm and the effect of it causing an effective 2-second TR (which is on the long side) - please elaborate (a figure might be helpful). The paradigm for visual stimulation also seems elaborate, can you please explain the logic and how it was developed?

      Thank you for raising these questions about the stimulation paradigm. The stimulation paradigm is independent of, and does not interfere with, the setup of the effective 2-second TR. The 2-second TR results from the use of 2-segment EPI, with a TR of 1 second per segment. The 2-segment acquisition enables an echo spacing of 0.52 ms with an effective image bandwidth of 3858 Hz, which reduces image distortion. The stimulation paradigm was defined by an “8 s on, 32 s off” epoch so as to elicit a strong BOLD response, and it could be used for any reasonable TR duration.

      We have included a figure outlining the stimulation paradigm (Supp Fig. 3)
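
      The timing arithmetic in this reply can be sketched in a few lines. This is only an illustration under stated assumptions: the function and variable names are hypothetical, and the sketch assumes that one image volume is acquired per effective TR and that each epoch starts on a volume boundary. Only the numbers (2 segments, 1 s per segment, 8 s on, 32 s off) come from the response above.

```python
# Illustrative timing sketch; names are hypothetical, values are from the response.
N_SEGMENTS = 2        # 2-segment EPI
SEGMENT_TR_S = 1.0    # TR of each segment, in seconds
STIM_ON_S = 8.0       # stimulation "on" period per epoch, in seconds
STIM_OFF_S = 32.0     # stimulation "off" period per epoch, in seconds


def effective_tr(n_segments: int, segment_tr_s: float) -> float:
    """Time needed to acquire one full volume from all segments."""
    return n_segments * segment_tr_s


def volumes_per_epoch(on_s: float, off_s: float, tr_s: float) -> tuple:
    """Return (volumes during stimulation, volumes per whole epoch).

    Assumes one volume per effective TR (an assumption of this sketch).
    """
    return on_s / tr_s, (on_s + off_s) / tr_s


tr = effective_tr(N_SEGMENTS, SEGMENT_TR_S)
on_vols, epoch_vols = volumes_per_epoch(STIM_ON_S, STIM_OFF_S, tr)
print(f"effective TR = {tr} s; {on_vols} of {epoch_vols} volumes per epoch are 'on'")
```

      Because the epoch is defined in seconds rather than in volumes, the same two functions can be evaluated for any other TR, which is the sense in which the paradigm is independent of the acquisition.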

      (2) I had difficulties viewing the movies (on my MAC).

      Thank you for this note. We have re-uploaded the videos in .mov format.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors engineer the endogenous left boundary of the Drosophila eve TAD, replacing the endogenous Nhomie boundary by either a neutral DNA, a wildtype Nhomie boundary, an inverted Nhomie boundary, or a second copy of the Homie boundary. They perform Micro-C on young embryos and conclude that endogenous Nhomie and Homie boundaries flanking eve pair with head-to-tail directionality to form a chromosomal stem loop. Abrogating the Nhomie boundary leads to ectopic activation of genes in the former neighboring TAD by eve embryonic stripe enhancers. Replacing Nhomie by an inverted version or by Homie (which pairs with itself head-to-head) transformed the stem loop into a circle loop. An important finding was that stem and circle loops differentially impact endogenous gene regulation both within the eve TAD and in the TADs bracketing eve. Intriguingly, an eve TAD with a circle loop configuration leads to ectopic activation of flanking genes by eve enhancers - indicating compromised regulatory boundary activity despite the presence of an eve TAD with intact left and right boundaries.

      Strengths:

      Overall, the results obtained are of high-quality and are meticulously discussed. This work advances our fundamental understanding of how 3D genome topologies affect enhancer-promoter communication.

      Weaknesses:

      Though convincingly demonstrated at eve, the generalizability of TAD formation by directional boundary pairing remains unclear, though the authors propose this mechanism could underlie the formation of all TADs in Drosophila and possibly even in mammals. Strong and ample evidence has been obtained to date that cohesin-mediated chromosomal loop extrusion explains the formation of a large fraction of TADs in mammals.

      (1.1) The difficulty with almost all of the studies on mammalian TADs, cohesin and CTCF roadblocks is that the sequencing depth is not sufficient, and large bin sizes (>1 kb) are needed to visualize chromosome architecture. The resulting contact profiles show TAD neighborhoods, not actual TADs.

      The problem with these studies is illustrated by comparing the contact profiles of mammalian MicroC data sets at different bin sizes in Author response image 1. In this figure, the darkness of the “pixels” in panels E, F, G and H was enhanced by reducing brightness in Photoshop.

      Author response image 1.

      Mammalian MicroC profiles at different bin sizes

      Panels A and C show “TADs” using bin sizes typical of most mammalian studies (see Krietenstein et al. (Krietenstein et al. 2020)). At this level of resolution, TADs, the “trees” that are the building blocks of chromosomes, are not visible. Instead, what is seen are TAD neighborhoods or “forests”. Each neighborhood consists of several dozen individual TADs. The large bins in these panels also artificially accentuate TAD:TAD interactions, generating a series of “stripes” and “dots” that correspond to TADs bumping into each other and sequences getting crosslinked. For example, in panel A there is a prominent stripe on the edge of a “TAD” (blue arrow). In panel C, this stripe resolves into a series of dots arranged as parallel, but interrupted “stripes” (green and blue arrows). At the next level of resolution, it can be seen that the stripe marked by the blue arrow and magenta asterisk is generated by contacts between the left boundary of the TAD indicated by the magenta bar and sequences in a TAD (blue bar) ~180 kb away. While dots and stripes are prominent features in contact profiles visualized with larger bin sizes (A and C), the actual TADs that are observed with a bin size of 200 bp (examples are underlined by black bars in panel G) are not bordered by stripes, nor are they topped by obvious dots. The one possible exception is the dot that appears at the top of the volcano triangle underlined with magenta.

      The chromosome 1 DNA segment from the MicroC data of Hsieh et al. (Hsieh et al. 2020) shows a putative volcano triangle with a plume (indicated by a V in Author response image 1 panels D, F and H). Sequences in the V TAD don’t crosslink with their immediate neighbors, and this gives a “plume” above the volcano triangle, as indicated by the light blue asterisk in panels D, F and H. Interestingly, the V TAD does contact two distant TADs, U on the left and W on the right. The U TAD is ~550 kb from V, and the region of contact is indicated by the black arrow. The W TAD is ~585 kb from V, and the region of contact is indicated by the magenta arrow. While the plume still seems to be visible with a bin size of 400 bp (light blue asterisk), it is hard to discern when the bin size is 200 bp, as there are not enough reads.
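
      The effect of bin size on what can be resolved can be illustrated with a toy example. The sketch below is not the analysis behind Author response image 1. It assumes a made-up contact map of four TADs, each two fine bins wide, with arbitrary counts (5 within a TAD, 1 between TADs), and a hypothetical helper, coarsen, that sums fine bins into larger ones.

```python
# Toy illustration of binning; the map, counts, and helper names are invented.

def toy_map(n_tads, tad_bins, within=5, between=1):
    """Square contact map: `within` counts inside a TAD, `between` elsewhere."""
    n = n_tads * tad_bins
    return [
        [within if i // tad_bins == j // tad_bins else between for j in range(n)]
        for i in range(n)
    ]


def coarsen(matrix, factor):
    """Sum a square contact map into bins that are `factor` times larger."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of factor")
    m = n // factor
    out = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            out[i // factor][j // factor] += matrix[i][j]
    return out


fine = toy_map(n_tads=4, tad_bins=2)   # 8 x 8 map, four TADs are distinguishable
coarse = coarsen(fine, factor=4)       # 2 x 2 map, each bin now holds two TADs

for row in fine:
    print(row)
print()
for row in coarse:
    print(row)
```

      In the fine map the boundary between the first and second TAD is visible as a drop from 5 to 1. In the coarse map both TADs fall inside a single bin, so that boundary cannot be seen and only a two-TAD "neighborhood" remains. Each coarse bin also pools more counts, which is why larger bins are used when sequencing depth is limiting.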

      The evidence demonstrating that cohesin is required for TAD formation/maintenance is based on low resolution Hi-C data, and the effects that are observed are on TAD neighborhoods (forests) and not TADs (trees). In fact, there is published evidence that cohesin is not required in mammals for TAD formation/maintenance. In an experiment from Goel et al. 2023, the authors depleted the cohesin component Rad21 and then visualized the effects on TAD organization using the high resolution region capture MicroC (RCMC) protocol. The MicroC contact map in this figure visualizes a ~250 kb DNA segment around the Ppm1g locus at 250 bp resolution. On the right side of the diagonal is the untreated control, while the left side shows the MicroC profile of the same region after Rad21 depletion. The authors indicated that there was a 97% depletion of Rad21 in their experiment. However, as is evident from a comparison of the experimental and control profiles, loss of Rad21 has no apparent effect on the TAD organization of this mammalian DNA segment.

      Several other features are worth noting.  First, unlike the MicroC experiments shown in Author response image 1, there are dots at the apex of the TADs in this chromosomal segment.  In the MicroC protocol, fixed chromatin is digested to mononucleosomes by extensive MNase digestion.  The resulting DNA fragments are then ligated, and dinucleosome-length fragments are isolated and sequenced. 

      DNA sequences that are nucleosome free in chromatin (which would be promoters, enhancers, silencers and boundary elements) are typically digested to oligonucleotides in this procedure and won’t be recovered. This means that the dots shown here must correspond to mononucleosome-length elements that are MNase resistant. This is also true for the dots in the MicroC contact profiles of the Drosophila Abd-B regulatory domain (see Fig. 2B in the paper). Second, the TADs are connected to each other by 45° stripes (see blue and green arrowheads). While it is not clear from this experiment whether the stripes are generated by an active mechanism (enzyme) or by some “passive” mechanism (e.g., sliding), the stripes in this chromosomal segment are not generated by cohesin, as they are unperturbed by Rad21 depletion. Third, there are no volcano triangles with plumes in this chromosomal DNA segment. Instead, the contact patterns (purple and green asterisks) between neighboring TADs closely resemble those seen for the Abd-B regulatory domains (compare Goel et al. 2023 with Fig. 2B in the paper). This similarity suggests that the TADs in and around Ppm1g may be circle-loops, not stem-loops. As volcano triangles with plumes also seem to be rare in the MicroC data sets of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020) (with the caveat that these data sets are low resolution: see Author response image 1), it is possible that much of the mammalian genome is assembled into circle-loop TADs, a topology that can’t be generated by the cohesin loop extrusion (bolo tie clip)/CTCF roadblock model.

      While Rad21 depletion has no apparent effect on TADs, it does appear to impact TAD neighborhoods.  This is in a supplemental figure in Goel et al. (Goel et al. 2023).  In this figure, TADs in the Ppm1g region of chromosome 5 are visualized with bin sizes of 5 kb and 1 kb.  A 1.2 Mb DNA segment is shown for the 5 kb bin size, while an 800 kb DNA segment is shown for the 1 kb bin size.  As can be seen from comparing the MicroC profiles in Author response image 2 with that in Goel et al. 2023, individual TADs are not visible.  Instead, the individual TADs are binned into large TAD “neighborhoods” that consist of several dozen or more TADs.

      Unlike the individual TADs shown in Goel et al. 2023, the TAD neighborhoods in Author response image 2 are sensitive to Rad21 depletion.  The effects of Rad21 depletion can be seen by comparing the relative pixel density inside the blue lines before (above the diagonal) and after (below the diagonal) auxin-induced Rad21 degradation.  The reduction in pixel density is greatest for more distant TAD:TAD contacts (farthest from the diagonal).  By contrast, the TADs themselves are unaffected (Goel et al. 2023), as are contacts between individual TADs and their immediate neighbors.  In addition, contacts between partially overlapping TAD neighborhoods are also lost.  At this point it isn’t clear why contacts between distant TADs in the same neighborhood are lost when Rad21 is depleted; however, a plausible speculation is that it is related to the functioning of cohesin in holding newly replicated DNAs together until mitosis and whatever other role it might have in chromosome condensation.

      Author response image 2.

      Ppm1g full locus chr5

      Moreover, given the unique specificity with which Nhomie and Homie are known to pair (and exhibit "homing" activity), it is conceivable that formation of the eve TAD by boundary pairing represents a phenomenon observed at exceptional loci rather than a universal rule of TAD formation. Indeed, characteristic Micro-C features of the eve TAD are only observed at a restricted number of loci in the fly genome…..

      (1.2) The available evidence does not support the claim that nhomie and homie are “exceptional.”  To begin with, nhomie and homie rely on precisely the same set of factors that have been implicated in the functioning of other boundaries in the fly genome.  For example, homie requires (among other factors) the generic boundary protein Su(Hw) for insulation and long-distance interactions (Fujioka et al. 2024).  (This is also true of nhomie: unpublished data.)  The Su(Hw) protein (like other fly polydactyl zinc finger proteins) can engage in distant interactions.  This was first shown by Sigrist and Pirrotta (Sigrist and Pirrotta 1997), who found that the su(Hw) element from the gypsy transposon can mediate long-distance regulatory interactions (PRE dependent silencing) between transgenes inserted at different sites on homologous chromosomes (trans interactions) and at sites on different chromosomes.

      The ability to mediate long-distance interactions is not unique to the su(Hw) element, or homie and nhomie. Muller et al. (Muller et al. 1999) found that the Mcp boundary from the Drosophila BX-C is also able to engage in long-distance regulatory interactions—both PRE-dependent silencing of mini-white and enhancer activation of mini-white and yellow. The functioning of the Mcp boundary depends upon two other generic insulator proteins, Pita and the fly CTCF homolog (Kyrchanova et al. 2017). Like Su(Hw), both are polydactyl zinc finger proteins, and they resemble the mammalian CTCF protein in that their N-terminal domain mediates multimerization (Bonchuk et al. 2020; Zolotarev et al. 2016). Figure 6 from Muller et al. 1999 shows PRE-dependent “pairing sensitive silencing” interactions between transgenes carrying a mini-white reporter, the Mcp and scs’ (Beaf dependent) (Hart et al. 1997) boundary elements, and a PRE closely linked to Mcp. In this experiment, flies homozygous for different transgene inserts were mated and the eye color was examined in their trans-heterozygous progeny. As indicated in the figure, the strongest trans-silencing interactions were observed for inserts on the same chromosomal arm; however, transgenes inserted on the left arm of chromosome 3 can interact across the centromere with transgenes inserted on the right arm of chromosome 3.

      Figure 5C (left) from Muller et al. 1999 shows a trans-silencing interaction between w#11.102 at 84D and w#11.16 approximately 5.8 Mb away, at 87D. Figure 5C (right) shows a trans-silencing interaction across the centromere between w#14.29 on the left arm of chromosome 3 at 78F and w#11.102 on the right arm of chromosome 3 at 84D. The eye color phenotype of mini-white-containing transgenes is usually additive: homozygous inserts have twice as dark an eye color as the corresponding hemizygous inserts. Likewise, in flies trans-heterozygous for mini-white transgenes inserted at different sites, the eye color is equivalent to the sum of the two transgenes. This is not true when mini-white transgenes are silenced by PREs. In the combination shown in panel A, the trans-heterozygous fly has a lighter eye color than either of the parents. In the combination in panel B, the trans-heterozygous fly is slightly lighter than either parent.

      As evident from the diagram in Figure 6 from Muller et al. 1999, all of the transgenes inserted on the 3rd chromosome that were tested were able to participate in long-distance (>Mbs) regulatory interactions. On the other hand, not all possible pairwise interactions are observed. This would suggest that potential interactions depend upon the large-scale (Mb) 3D folding of the 3rd chromosome.

      When the scs boundary (Zw5 dependent) (Gaszner et al. 1999) was added to the transgene to give sMws’, it further enhanced the ability of distant transgenes to find each other and pair. All eight of the sMws’ inserts that were tested were able to interact with at least one other sMws’ insert on a different chromosome and silence mini-white. Vazquez et al. (Vazquez et al. 2006) subsequently tagged the sMws’ transgene with LacO sequences (ps0Mws’) and visualized pairing interactions in imaginal discs. Trans-heterozygous combinations on the same chromosome were found paired in 94-99% of the disc nuclei, while a trans-heterozygous combination on different chromosomes was found paired in 96% of the nuclei (Table 3 from Vazquez et al. 2006). Vazquez et al. also examined a combination of four transgenes inserted on the same chromosome (two at the same insertion site, and two at different insertion sites). In this case, all four transgenes were clustered together in 94% of the nuclei (Table 3 from Vazquez et al. 2006). Their studies also suggest that the distant transgenes remain paired for at least several hours. A similar experiment was done by Li et al. (Li et al. 2011), except that the transgene contained only a single boundary, Mcp or Fab-7. While pairing was still observed in trans-heterozygotes, the frequency was reduced without scs and scs’.

      It is worth pointing out that there is no plausible mechanism in which cohesin could extrude a loop through hundreds of intervening TADs, across the centromere (ff#13.101↔w#11.102: Figure 6 from Muller et al. 1999; w#14.29↔w#11.02: Figures 5 and 6 from Muller et al. 1999) and come to a halt when it “encounters” Mcp-containing transgenes on different homologs. The same is true for Mcp-dependent pairing interactions in cis (Fig. 7 in Muller et al. (Muller et al. 1999)) or Mcp-dependent pairing interactions between transgenes inserted on different chromosomes (Fig. 8 in Muller et al. (Muller et al. 1999); Line 8 in Table 3 from Vazquez et al. 2006).

      These are not the only boundaries that can engage in long-distance pairing.  Mohana et al. (Mohana et al. 2023) identified nearly 60 meta-loops, many of which appear to be formed by the pairing of TAD boundary elements.  Two examples (at 200 bp resolution from 12-16 hr embryos) are shown in Author response image 3.

      Author response image 3.

      Metaloops on the 2nd and 3rd chromosomes: circle-loops and multiple stem-loops

      One of these meta-loops (panel A) is generated by the pairing of two TAD boundaries on the 2nd chromosome. The first boundary (blue: indicated by blue arrow) is located at ~2,006,500 bp between a small TAD containing the Nplp4 and CG15353 genes and a larger TAD containing 3 genes, CG33543, Obp22a and Npc2a. Nplp4 encodes a neuropeptide. The functions of CG15354 and CG33543 are unknown. Obp22a encodes an odorant binding protein, while Npc2a encodes the Niemann-Pick type C-2a protein, which is involved in sterol homeostasis. The other boundary (purple: indicated by purple arrow) is located between two TADs 2.8 Mb away at 4,794,250 bp. The upstream TAD contains the fipi gene (CG15630), which has neuronal functions in male courtship, while the downstream TAD contains CG3294, which is thought to be a spliceosome component, and schlaff (slf), which encodes a chitin binding protein. As illustrated in the accompanying diagram, the blue boundary pairs with the purple boundary in a head-to-head orientation, generating a ~2.8 Mb loop with a circle-loop topology. As a result of this pairing, the multi-gene (CG33543, Obp22a and Npc2a) TAD upstream of the blue boundary interacts with the CG15630 TAD upstream of the purple boundary. Conversely, the small Nplp4:CG15353 TAD downstream of the blue boundary interacts with the CG3294:slf TAD downstream of the purple boundary. Even if one imagined that the cohesin bolo tie clip was somehow able to extrude 2.8 Mb of chromatin and then know to stop when it encountered the blue and purple boundaries, it would’ve generated a stem-loop, not a circle-loop.

      The second meta-loop (panel B) is more complicated as it is generated by pairing interactions between four boundary elements. The blue boundary (blue arrow) located at ~4,801,800 bp (3L) separates a large TAD containing the RhoGEF64C gene from a small TAD containing CG7509, which encodes a predicted subunit of an extracellular carboxypeptidase. As can be seen in the MicroC contact profile and the accompanying diagram, the blue boundary pairs with the purple boundary (purple arrow), which is located at ~7,013,500 bp (3L) just upstream of the 2nd internal promoter (indicated by black arrowhead) of the Mp (Multiplexin) gene. This pairing interaction is head-to-tail and generates a large stem-loop that spans ~2.2 Mb. The stem-loop brings sequences upstream of the blue boundary and downstream of the purple boundary into contact (the strings below a bolo tie clip), just as was observed in the boundary bypass experiments of Muravyova et al. (Muravyova et al. 2001) and Kyrchanova et al. (Kyrchanova et al. 2008). The physical interactions result in a box of contacts (right top) between sequences in the large RhoGEF64C TAD and sequences in a large TAD that contains an internal Mp promoter. The second pairing interaction is between the brown boundary (brown arrow) and the green boundary (green arrow). The brown boundary is located at ~4,805,600 bp (3L) and separates the TAD containing CG7590 from a large TAD containing CG1808 (predicted to encode an oxidoreductase) and the Dhc64C (Dynein heavy chain 64C) gene. The green boundary is located at ~6,995,500 bp (3L), and it separates a TAD containing CG32388 and the biniou (bin) transcription factor from a TAD that contains the most distal promoter of the Mp (Multiplexin) gene (blue arrowhead). As indicated in the diagram, the brown and green boundaries pair with each other head-to-tail, and this generates a small internal loop (and the final configuration would resemble a bolo tie with two tie clips).  
This small internal loop brings the CG7590 TAD into contact with the TAD that extends from the distal Mp promoter to the 2nd internal Mp promoter.  The resulting contact profile is a rectangular box with diagonal endpoints corresponding to the paired blue:purple and brown:green boundaries.  The pairing of the brown:green boundaries also brings the TADs immediately downstream of the brown boundary and upstream of the green boundary into contact with each other, and this gives a rectangular box of interactions between the Dhc64C TAD, and sequences in the bin/CG3238 TAD.  This box is located on the lower left side of the contact map.
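
      The contact patterns described for these meta-loops can be summarized as a small lookup. This is a bookkeeping sketch only, not a model of chromatin: the dictionary and function names are hypothetical, "A" denotes the boundary with the smaller coordinate and "B" its partner, and the approximate boundary positions are those quoted in the two paragraphs above.

```python
# Bookkeeping sketch; names are hypothetical, coordinates are from the text above.

# Which TADs flanking the paired boundaries A and B are brought into contact.
FLANK_CONTACTS = {
    "circle-loop": [
        ("upstream of A", "upstream of B"),
        ("downstream of A", "downstream of B"),
    ],
    "stem-loop": [
        ("upstream of A", "downstream of B"),   # the "strings" outside the loop
        ("downstream of A", "upstream of B"),   # the two ends inside the loop
    ],
}

# (label, position of A in bp, position of B in bp, topology)
META_LOOPS = [
    ("2nd chromosome, blue:purple", 2_006_500, 4_794_250, "circle-loop"),
    ("3L, blue:purple", 4_801_800, 7_013_500, "stem-loop"),
    ("3L, brown:green", 4_805_600, 6_995_500, "stem-loop"),
]


def loop_span_bp(pos_a: int, pos_b: int) -> int:
    """Length of chromatin between two paired boundaries."""
    return abs(pos_b - pos_a)


for label, pos_a, pos_b, topology in META_LOOPS:
    span_mb = loop_span_bp(pos_a, pos_b) / 1e6
    print(f"{label}: {topology}, ~{span_mb:.1f} Mb")
    for left, right in FLANK_CONTACTS[topology]:
        print(f"    TAD {left} contacts TAD {right}")
```

      The computed spans (about 2.8 Mb and 2.2 Mb) agree with the loop sizes stated in the text. The brown:green loop is slightly shorter than the blue:purple loop on 3L because it sits inside it.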

      Since the bin and Mp meta-loops in Author response image 3B are stem-loops, they could have been generated by “sequential” cohesin loop extrusion events. Besides the fact that cohesin extrusion of 2 Mb of chromatin and breaking through multiple intervening TAD boundaries challenges the imagination, there is no mechanism in the cohesin loop extrusion/CTCF roadblock model to explain why cohesin complex 1 would come to a halt at the purple boundary on one side and the blue boundary on the other, while cohesin complex 2 would instead stop when it hits the brown and green boundaries. This highlights another problem with the cohesin loop extrusion/CTCF roadblock model, namely that the roadblocks are functionally autonomous: they have an intrinsic ability to block cohesin that is entirely independent of the intrinsic ability of other roadblocks in the neighborhood. As a result, there is no mechanism for generating specificity in loop formation. By contrast, boundary pairing interactions are by definition non-autonomous and depend on the ability of individual boundaries to pair with other boundaries: specificity is built into the model. The mechanism for pairing, and accordingly the basis for partner preferences/specificity, are reasonably well understood. Probably the most common mechanism in flies is based on shared binding sites for architectural proteins that can form dimers or multimers (Bonchuk et al. 2021; Fedotova et al. 2017). Flies have a large family of polydactyl zinc finger DNA binding proteins, and as noted above, many of these form dimers or multimers and also function as TAD boundary proteins. This pairing principle was first discovered by Kyrchanova et al. (Kyrchanova et al. 2008). This paper also showed that orientation-dependent pairing interactions are a common feature of endogenous fly boundaries. Another mechanism for pairing is specific protein:protein interactions between different DNA binding factors (Blanton et al. 2003).  
Yet a third mechanism would be proteins that bridge different DNA binding proteins together. The boundaries that use these different mechanisms (BX-C boundaries, scs, scs’) depend upon the same sorts of proteins that are used by homie and nhomie. Likewise, this same set of factors reappears in one combination or another in most other TAD boundaries. As for the orientation of pairing interactions, this is most likely determined by the order of binding sites for chromosome architectural proteins in the partner boundaries.

      …and many TADs lack focal 3D interactions between their boundaries.

      (1.3) The idea that flies differ from mammals in that they “lack” focal 3D interactions is simply mistaken. One of the problems with drawing this distinction is that almost all of the “focal 3D interactions” seen in mammalian Hi-C experiments are a consequence of binning large DNA segments in low resolution restriction enzyme-dependent experiments. This is even true in the two “high” resolution MicroC experiments that have been published (Hsieh et al. 2020; Krietenstein et al. 2020). As illustrated above in Author response image 1, most of the “focal 3D interactions” (the dots at the apex of TAD triangles) seen with large bin sizes (1 kb and greater) disappear when the bin size is 200 bp and TADs rather than TAD neighborhoods are being visualized.

      As described in point #1.1, in the MicroC protocol, fixed chromatin is first digested to mononucleosomes by extensive MNase digestion, processed/biotinylated, and ligated to give dinucleosome-length fragments, which are then sequenced. Regions of chromatin that are nucleosome free (promoters, enhancers, silencers, boundary elements) will typically be reduced to oligonucleotides in this procedure and will not be recovered when dinucleosome-length fragments are sequenced. The loss of sequences from typical paired boundary elements is illustrated by the Lar meta-loop shown in Author response image 4 (at 200 bp resolution). Panels A and B show the contact profiles generated when the blue boundary (which separates two TADs that span the Lar (Leukocyte-antigen-related-like) transcription unit) interacts with the purple boundary (which separates two TADs in a gene poor region ~620 kb away). The blue and purple boundaries pair with each other head-to-head, and this pairing orientation generates yet another circle-loop. In the circle-loop topology, sequences in the TADs upstream of both boundaries come into contact with each other, and this gives the small dark rectangular box to the upper left of the paired boundaries (Author response image 4A). (Note that this small box corresponds to the two small TADs upstream of the blue and purple boundaries, respectively. See panel B.) Sequences in the TADs downstream of the two boundaries also come into contact with each other, and this gives the large box to the lower right of the paired boundaries. While this meta-loop is clearly generated by pairing interactions between the blue and purple boundaries, the interacting sequences are degraded in the MicroC protocol, and sequences corresponding to the blue and purple boundaries aren’t recovered. This can be seen in panel B (red arrow and red arrowheads).  
When a different Hi-C procedure is used (dHS-C) that captures nucleosome-free regions of chromatin that are physically linked to each other (Author response image 4C & D), the sequences in the interacting blue and purple boundaries are recovered and generate a prominent “dot” at their physical intersection (blue arrow in panel D).

      Author response image 4.

      Lar metaloop. Panels A & B: MicroC. Panels C & D: dHS-C

      While sequences corresponding to the blue and purple boundaries are lost in the MicroC procedure, there is at least one class of elements that engage in physical pairing interactions whose sequences are (comparatively) resistant to MNase digestion. This class of elements includes many PREs ((Kyrchanova et al. 2018); unpublished data), the boundary bypass elements in the Abd-B region of BX-C (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018), and “tethering” elements (Batut et al. 2022; Li et al. 2023). In all of the cases tested, these elements are bound in nuclear extracts by a large (>1000 kD) GAGA factor-containing multiprotein complex called LBC. LBC also binds to the hsp70 and eve promoters (unpublished data). Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that the LBC protects a ~120-180 bp DNA segment from MNase digestion. It is likely that this is the reason why LBC-bound sequences can be recovered in MicroC experiments as dots when they are physically linked to each other. One such example (based on the ChIP signatures of the paired elements) is indicated by the green arrow in panels B and D of Author response image 4. Note that there are no dots corresponding to these two LBC elements within either of the TADs immediately downstream of the blue and purple boundaries. Instead, the sequences corresponding to the two LBC elements are only recovered when the two elements pair with each other over a distance of ~620 kb. The fact that these two elements pair with each other is consistent with other findings which indicate that, like classical boundaries, LBC elements exhibit partner preferences. In fact, LBC elements can sometimes function as TAD boundaries. For example, the Fab-7 boundary has two LBC elements, and full Fab-7 boundary function can be reconstituted with just these two elements (Kyrchanova et al. 2018).

      Reviewer #2 (Public Review):

      "Chromatin Structure II: Stem-loops and circle-loops" by Ke*, Fujioka*, Schedl, and Jaynes reports a set of experiments and subsequent analyses focusing on the role of Drosophila boundary elements in shaping 3D genome structure and regulating gene expression. The authors primarily focus on the region of the fly genome containing the even skipped (eve) gene; eve is expressed in a canonical spatial pattern in fly embryos and its locus is flanked by the well-characterized neighbor of homie (nhomie) and homie boundary elements. The main focus of investigation is the orientation dependence of these boundary elements, which had been observed previously using reporter assays. In this study, the authors use CRISPR/Cas9 editing followed by recombination-mediated cassette exchange to create a series of recombinant fly lines in which the nhomie boundary element is either replaced with exogenous sequence from phage 𝝀, an inversion of nhomie, or a copy of homie that has the same orientation as the endogenous homie sequence. The nhomie sequence is also regenerated in its native orientation to control for effects introduced by the transgenesis process.

      The authors then perform high-resolution Micro-C to analyze 3D structure and couple this with fluorescent and colorimetric RNA in situ hybridization experiments to measure the expression of eve and nearby genes during different stages of fly development. The major findings of these experiments are that total loss of boundary sequence (replacement with 𝝀 DNA) results in major 3D structure changes and the most prominent observed gene changes, while inversion of the nhomie boundary or replacement with homie resulted in more modest effects in terms of 3D structure and gene expression changes and a distinct pattern of gene expression change from the 𝝀 DNA replacement. As the samples in which the nhomie boundary is inverted or replaced with homie have similar Micro-C profiles at the eve locus and show similar patterns of a spurious gene activation relative to the control, the observed effects appear to be driven by the relative orientation of the nhomie and homie boundary elements to one another.

      Collectively, the findings reported in the manuscript are of broad interest to the 3D genome field. Although extensive work has gone into characterizing the patterns of 3D genome organization in a whole host of species, the underlying mechanisms that structure genomes and their functional consequences are still poorly understood. The perhaps best understood system, mechanistically, is the coordinated action of CTCF with the cohesin complex, which in vertebrates appears to shape 3D contact maps through a loop extrusion-pausing mechanism that relies on orientation-dependent sequence elements found at the boundaries of interacting chromatin loops.

      (2.1) The notion that the mammalian genome is shaped in 3D by the coordinated action of cohesin and CTCF has achieved the status of dogma in the field of chromosome structure in vertebrates.  However, as we have pointed out in #1.1, the evidence supporting this dogma is far from convincing.  To begin with, it is based on low resolution Hi-C experiments that rely on large bin sizes to visualize so-called “TADs.”  In fact, the notion that cohesin/CTCF are responsible on their own for shaping the mammalian 3D genome appears to be a result of mistaking a series of forests for the actual trees that populate each of the forests.

      As illustrated in Author response image 1 above, the “TADs” that are visualized in these low resolution data sets are not TADs at all, but rather TAD neighborhoods consisting of several dozen or more individual TADs.  Moreover, the “interesting” features that are evident at low resolution (>1 kb)—the dots and stripes—largely disappear at resolutions appropriate for visualizing individual TADs (~200 bp).
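      The effect of bin size described above can be illustrated with a minimal toy sketch in Python. This is not the analysis pipeline used for any of the figures; the functions `toy_map` and `coarsen` and the contact values are hypothetical and chosen only for illustration. Four equally sized domains drawn at fine resolution collapse into two apparent domains when the same matrix is summed into bins eight times larger.

```python
def toy_map(domain_sizes, inside=10, outside=1):
    """Build a block-diagonal toy contact map (hypothetical values):
    strong contacts within each domain, weak contacts everywhere else."""
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    return [[inside if a == b else outside for b in labels] for a in labels]


def coarsen(contacts, factor):
    """Sum a square contact matrix into bins `factor` times larger."""
    n = len(contacts)
    m = n // factor  # any ragged remainder at the edge is dropped
    out = [[0] * m for _ in range(m)]
    for i in range(m * factor):
        for j in range(m * factor):
            out[i // factor][j // factor] += contacts[i][j]
    return out


if __name__ == "__main__":
    fine = toy_map([4, 4, 4, 4])   # four domains at fine resolution
    print(coarsen(fine, 8))        # [[352, 64], [64, 352]]
```

      In the coarse matrix each diagonal bin contains two complete domains, so the boundary between them is no longer visible. This is the sense in which a large-bin “TAD” can correspond to a neighborhood of several TADs.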

      Above, we presented data from one of the key experiments in Goel et al. (Goel et al. 2023).  In this experiment, the authors used RCMC to generate high resolution (~250 bp) MicroC contact maps before and after Rad21 depletion.  Contrary to dogma, Rad21 depletion has absolutely no effect on TADs in a ~250 kb DNA segment, and these TADs look very much like the TADs we observe in the Drosophila genome, in particular in the Abd-B region of BX-C that is thought to be assembled into a series of circle-loops (see Fig. 2B).

      While Goel et al. (Goel et al. 2023) observed no effect of Rad21 depletion on TADs, they found that loss of Rad21 disturbs long-distance (but not short-distance) contacts in large TAD neighborhoods when their RCMC data set is visualized using bin sizes of 5 kb and 1 kb.  This is shown in Author response image 2.  The significance of this finding is, however, uncertain.  It could mean that the 3D organization of large TAD neighborhoods has a special requirement for cohesin activity.  On the other hand, since cohesin functions to hold sister chromosomes together after replication until they separate during mitosis (and might also participate in mitotic condensation), it is also possible that the loss of long-range contacts in large TAD neighborhoods when Rad21 is depleted is simply a reflection of this particular activity.  Further studies will be required to address these possibilities.

      As for CTCF: a careful inspection of the ChIP data in Goel et al. 2023 indicates that CTCF is not found at each and every TAD boundary.  In fact, the notion that CTCF is the be-all and end-all of TAD boundaries in mammals is truly hard to fathom.  For one, the demands for specificity in TAD formation (and in regulatory interactions) are likely much greater than those in flies, and specificity can’t be generated by a single DNA binding protein.  For another, several dozen chromosomal architectural proteins have already been identified in flies.  This means that (unlike what is thought to be true in mammals) it is possible to use a combinatorial mechanism to generate specificity in, for example, the long distance interactions in RFig 6 and 7.  As noted in #2.1 above, many of the known chromosomal architectural proteins in flies are polydactyl zinc finger proteins (just like CTCF).  There are some 200 different polydactyl zinc finger proteins in flies, and the function of only a handful of these is known at present.  However, it seems likely that a reasonable fraction of this class of DNA binding proteins will ultimately turn out to have an architectural function of some type (Bonchuk et al. 2021; Fedotova et al. 2017).  The number of different polydactyl zinc finger protein genes in mammals is nearly 3 times that of flies.  Is it really possible that, of these, only CTCF is involved in shaping the 3D structure of the mammalian genome?

      Despite having a CTCF paralog and cohesin, the Drosophila genome does not appear to be structured by loop extrusion-pausing. The identification of orientation-dependent elements with pronounced structural effects on genome folding thus may shed light on alternative mechanisms used to regulate genome structure, which in turn may yield insights into the significance of particular folding patterns.

      (2.2) Here we would like to draw the reviewer’s and reader’s attention to Author response image 3, which shows that orientation-dependent pairing interactions have a significant impact on physical interactions between different sequences.  We would also refer the reader to two other publications.  One of these is Kyrchanova et al. (Kyrchanova et al. 2008), which was the first to demonstrate that orientation of pairing interactions matters.  The second is Fujioka et al. (Fujioka et al. 2016), which describes experiments indicating that nhomie and homie pair with each other head-to-tail and with themselves head-to-head.

      On the whole, this study is comprehensive and represents a useful contribution to the 3D genome field. The transgenic lines and Micro-C datasets generated in the course of the work will be valuable resources for the research community. Moreover, the manuscript, while dense in places, is generally clearly written and comprehensive in its description of the work. However, I have a number of comments and critiques of the manuscript, mainly centering on the framing of the experiments and presentation of the Micro-C results and on the manner in which the data are analyzed and reported. They are as follows:

      Major Points:

      (1) The authors motivate much of the introduction and results with hypothetical "stem loop" and "circle loop" models of chromosome conformation, which they argue are reflected in the Micro-C data and help to explain the observed ISH patterns. While such structures may possibly form, the support for these specific models vs. the many alternatives is not in any way justified. For instance, no consideration is given to important biophysical properties such as persistence length, packing/scaling, and conformational entropy. As the biophysical properties of chromatin are a very trafficked topic both in terms of experimentation and computational modeling and generally considered in the analysis of chromosome conformation data, the study would be strengthened by acknowledgement of this body of work and more direct integration of its findings.

      (2.3) The reviewer is not correct in claiming that “stem-loops” and “circle-loops” are “hypothetical.”  There is ample evidence that both types of loops are present in eukaryotic genomes, and that loop conformation has significant readouts in terms of not only the physical properties of TADs but also their functional properties.  Here we would draw the reviewer’s attention to Author response image 3 and Author response image 4 for examples of loops formed by the orientation-dependent pairing of yet other TAD boundary elements.  As evident from the MicroC data in these figures, circle-loops and stem-loops have readily distinguishable contact patterns.  The experiments in Fujioka et al. (Fujioka et al. 2016) demonstrate that homie and nhomie pair with each other head-to-tail, while they pair with themselves head-to-head.  The accompanying paper (Bing et al. 2024) also provides evidence that loop topology is reflected both in the pattern of activation of reporters and in the MicroC contact profiles.  We would also mention again Kyrchanova et al. (Kyrchanova et al. 2008), who were the first to report orientation-dependent pairing of endogenous fly boundaries.

      At this juncture it would be premature to try to incorporate computational modeling of chromosome conformation in our studies.  The reason is that the experimental foundations that would be essential for building accurate models are lacking.  As should be evident from RFigs. 1-3 above, studies on mammalian chromosomes are simply not of high enough resolution to draw firm conclusions about chromosome conformation: in most studies only the forests are visible.  While the situation is better in flies, there are still too many unknowns.  As just one example, it would be important to know the orientation of the boundary pairing interactions that generate each TAD.  While it is possible to infer loop topology from how TADs interact with their neighbors (a plume versus clouds), a conclusive identification of stem- and circle-loops will require a method to unambiguously determine whether a TAD boundary pairs with its neighbor head-to-head or head-to-tail.

      (2) Similar to Point 1, while there is a fair amount of discussion of how the observed results are or are not consistent with loop extrusion, there is no discussion of the biophysical forces that are thought to underlie compartmentalization such as block-polymer co-segregation and their potential influence. I found this absence surprising, as it is generally accepted that A/B compartmentalization essentially can explain the contact maps observed in Drosophila and other non-vertebrate eukaryotes (Rowley, ..., Corces 2017; PMID 28826674). The manuscript would be strengthened by consideration of this phenomenon.

      (2.4) Compartments in mammals have typically been identified and characterized using low-resolution data sets, and these studies have relied on visualizing compartments using quite large bin sizes (>>1 kb).  Our experiments have nothing to do with the large-scale compartments seen in these Hi-C experiments.  Instead, we are studying the properties of individual TADs: how TADs are formed, the relationship between TAD topology and boundary:boundary pairing, and the impact of TAD topology on interactions between TADs in the immediate neighborhood.  There is no evidence to date that these large compartments or “block polymer co-segregation” a) have any impact on the properties of individual boundary elements, b) have a role in determining which boundary elements actually come together to form a given TAD, c) impact the orientation of the interactions between boundaries that generate the TAD or d) determine how TADs tend to interact with their immediate neighbors.

      In more recent publications (c.f., Harris et al. 2023) compartments have shrunk in size and instead of being units of several hundred kb, the median length of the “compartmental” unit in mammalian cells is about 12 kb. This is not too much different from the size of fly TADs.  However, the available evidence does not support the idea that block polymer co-segregation/co-repulsion drive the TAD:TAD interactions seen in MicroC experiments.  For example, according to this “micro-compartment” model, the specific patterns of interaction between TADs in the CG3294 meta-loop in Author response image 3 would be driven by block polymer co-segregation and co-repulsion. In this model, the TAD upstream of the blue boundary (which contains CG33543, the odorant binding protein gene Obp22a and the Npc2a gene which encodes a protein involved in sterol homeostasis) would share the same chromatin state/biophysical properties as the TAD upstream of the purple boundary, which has the fipi gene. While it is true that CG33543, Obp22a and also the fipi gene are not expressed in embryos, Npc2a is expressed at high levels during embryogenesis, yet it is part of the TAD that interacts with the fipi TAD.  The TAD downstream of the blue boundary contains CG15353 and Nplp4 and it interacts with the TAD downstream of the purple boundary which contains CG3294 and slf.  CG15353 and Nplp4 are not expressed in the embryo and as such should share a compartment with a TAD that is also silent. However, slf is expressed at a high level in 12-16 hr embryos, while CG3294 is expressed at a low level.  In neither case would one conclude that the TADs upstream and downstream of the blue and purple boundaries, respectively, interact because of shared chromatin/biophysical states that drive block polymer co-segregation/co-repulsion.

      One might also consider several gedanken experiments involving the long-range interactions that generate the CG3294 meta-loop in Author response image 3.  According to the micro-compartment model the patchwork pattern of crosslinking evident in the CG3294 meta-loop arises because the interacting TADs share the same biochemical/biophysical properties, and this drives block polymer co-segregation and co-repulsion.  If this model is correct, then this patchwork pattern of TAD:TAD interactions would remain unchanged if we were to delete the blue or the purple boundary.  However, given what we know about how boundaries can find and pair with distant boundaries (c.f., Figure 6 from Muller et al. 1999 and the discussion in #1.2), the result of these gedanken experiments seems clear: the patchwork pattern shown in Author response image 3A will disappear.  What would happen if we inverted the blue or the purple boundary? Would the TAD containing CG33543, Obp22a and Npc2a still interact with fipi as would be expected from the compartment model?  Or would the pattern of interactions flip so that the CG33543, Obp22a and Npc2a TAD interacts with the TAD containing CG3294 and slf?  Again we can anticipate the results based on previous studies: the interacting TADs will switch when the CG3294 meta-loop is converted into a stem-loop.  If this happened, the only explanation possible in the compartment model is that the chromatin states change when the boundary is inverted so that the TAD upstream of the blue boundary now shares the same chromatin state as the TAD downstream of the purple boundary, while the TAD downstream of the blue boundary shares the same state as the TAD upstream of the purple boundary.  However, there is no evidence that boundary orientation per se can induce a complete switch in “chromatin states” as would be required in the compartment model.

      While we have not done these experimental manipulations with the CG3294 meta-loop, an equivalent experiment was done in Bing et al. (Bing et al. 2024).  However, instead of deleting a boundary element, we inserted a homie boundary element together with two reporters (gfp and LacZ) 142 kb away from the eve TAD.  The result of this gedanken “reverse boundary deletion” experiment is shown in Author response image 5.  Panel A shows the MicroC contact profile in the region spanning the transgene insertion site and the eve TAD in wild type (read “deletion”) NC14 embryos.  Panel B shows the MicroC contact profile from 12-16 hr embryos carrying the homie dual reporter transgene inserted at -142 kb.  Prior to the “deletion”, the homie element in the transgene pairs with nhomie and homie in the eve TAD and this generates a “mini-metaloop.”  In this particular insert, the homie boundary in the transgene (red arrow) is “pointing” in the opposite orientation from the homie boundary in the eve TAD (red arrow).  In this orientation, the pairing of the transgene homie with eve nhomie/homie brings the LacZ reporter into contact with sequences in the eve TAD.  Since a mini-metaloop is formed by homie → nhomie/homie pairing, sequences in TADs upstream and downstream of the transgene insert interact with sequences in TADs close to the eve TAD (Author response image 5B).  Taken together these interactions correspond to the interaction patchwork that is typically seen in “compartments” (see boxed region and inset).  If this patchwork is driven as per the model, by block polymer co-segregation and co-repulsion, then it should still be present when the transgene is deleted.  However, panel A shows that the interactions linking the transgene and the sequences in TADs next to the transgene to eve and TADs next to eve disappear when the homie boundary (plus transgene) is “deleted” in wild type flies.

      Author response image 5.

      Boundary deletion and compartments

      A second experiment would be to invert the homie boundary so that instead of pointing away from eve it points towards eve.  Again, if the compartmental patchwork is driven by block polymer co-segregation and co-repulsion, inverting the homie boundary in the transgene should have no effect on the compartmental contact profile.  Inspection of Fig. 7 in Bing et al. (Bing et al. 2024) will show that this prediction doesn’t hold either.  When homie is inverted, sequences in the eve TAD interact with the gfp reporter not the LacZ reporter.  In addition, there are corresponding changes in how sequences in TADs to either side of eve interact with sequences to either side of the transgene insert.  

      Yet another “test” of compartments generated by block polymer co-segregation/co-repulsion is provided by the plume above the eve volcano triangle.  According to the compartment model, sequences in TADs flanking the eve locus form the plume above the eve volcano triangle because their chromatin shares properties that drive block polymer co-segregation.  These same properties result in repulsive interactions with chromatin in the eve TAD, and this would explain why the eve TAD doesn’t crosslink with its neighbors.  If the distinctive chromatin properties of eve and the neighboring TADs drive block polymer co-segregation and co-repulsion, then inverting the nhomie boundary or introducing homie in the forward orientation should have absolutely no effect on the physical interactions between chromatin in the eve TAD and chromatin in the neighboring TADs.  However, Figures 4 and 6 in this paper indicate that boundary pairing orientation, not block polymer co-segregation/co-repulsion, is responsible for forming the plume above the eve TAD. Other findings also appear to be inconsistent with the compartment model. (A) The plume topping the eve volcano triangle is present in NC14 embryos when eve is broadly expressed (and potentially active throughout the embryo).  It is also present in 12-16 hr embryos when eve is only expressed in a very small subset of cells and is subject to PcG silencing everywhere else in the embryo.  (B) According to the compartment model the precise patchwork pattern of physical interactions should depend upon the transcriptional program/chromatin state that is characteristic of a particular developmental stage or cell type.  As cell fate decisions are just being made during NC14 one might expect that most nuclei will share similar chromatin states throughout much of the genome.  This would not be true for 12-16 hr embryos.  At this stage the compartmental patchwork would be generated by a complex mixture of interactions in cells that have quite different transcriptional programs and chromatin states.  In this case, the patchwork pattern would be expected to become fuzzy as a given chromosomal segment would be in compartment A in one group of cells and in compartment B in another.  Unlike 12-16 hr embryos, larval wing discs would be much more homogeneous and likely give a distinct and relatively well resolved compartmental pattern. We’ve examined the compartment patchwork of the same chromosomal segments in NC14 embryos, 12-16 hr embryos and larval wing disc cells.  While there are some differences (e.g., changes in some of the BX-C TADs in the wing disc sample) the compartmental patchwork patterns are surprisingly similar in all three cases. Nor is there any “fuzziness” in the compartmental patterns evident in 12-16 hr embryos, despite the fact that there are many different cell types at this stage of development.  (C) TAD interactions with their neighbors and compartmental patchworks are substantially suppressed in salivary gland polytene chromosomes.  This would suggest that features of chromosome structure might be the driving force behind many of the “compartmental” interactions as opposed to distinct biochemical/biophysical properties of small chromosomal segments that drive polymer co-segregation/co-repulsion.

      (3) The contact maps presented in the study represent many cells and distinct cell types. It is clear from single-cell Hi-C and multiplexed FISH experiments that chromosome conformation is highly variable even within populations of the same cell, let alone between cell types, with structures such as TADs being entirely absent at the single cell level and only appearing upon pseudobulking. It is difficult to square these observations with the models of relatively static structures depicted here. The authors should provide commentary on this point.

      (2.5) As should be evident from Author response image 1, single-cell Hi-C experiments would not provide useful information about the physical organization of individual TADs, TAD boundaries or how individual TADs interact with their immediate neighbors.  In addition, since they capture only a very small fraction of the possible contacts within and between TADs, we suspect that these single-cell studies aren’t likely to be useful for making solid conclusions about TAD neighborhoods like those shown in Author response image 1 panels A, B, C and D, or Author response image 2.  While it might be possible to discern relatively stable contacts between pairs of insulators in single cells with the right experimental protocol, the stabilities/dynamics of these interactions may be better judged by the length of time that physical interactions are seen to persist in live imaging studies such as Chen et al. (2018), Vazquez et al. (2006) and Li et al. (2011).

      The in situ FISH data we’ve seen also seem problematic in that probe hybridization results in a significant decondensation of chromatin.  For two probe sets complementary to adjacent ~1.2 kb DNA sequences, the measured center-to-center distance that we’ve seen was ~110 nm.  This is about 1/3rd the length that is expected for a 1.2 kb naked DNA fragment, and about 1.7 times larger than that expected for a beads-on-a-string nucleosome array (~60 nm).  However, chromatin is thought to be compacted into a 30 nm fiber, which is estimated to reduce the length of DNA by at least another ~6 fold.  If this estimate is correct, FISH hybridization would appear to result in a ~10 fold decompaction of chromatin.  A decompaction of this magnitude would necessarily be followed by a significant distortion in the actual conformation of chromatin loops.
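      The arithmetic behind these estimates can be retraced with a short Python sketch. It is a back-of-the-envelope check only: the rise of 0.34 nm per base pair for B-form DNA is a standard textbook value that we assume here, the function name `compaction_estimates` is hypothetical, and the remaining inputs are the approximate figures quoted above.

```python
BP_RISE_NM = 0.34  # assumed rise per base pair of B-form DNA, in nm


def compaction_estimates(probe_bp=1200, measured_nm=110.0,
                         beads_nm=60.0, fiber_fold=6.0):
    """Compare a measured FISH distance with simple length expectations.

    probe_bp    : probe spacing in base pairs (~1.2 kb in the text)
    measured_nm : measured center-to-center distance (~110 nm)
    beads_nm    : expected length of a beads-on-a-string array (~60 nm)
    fiber_fold  : further length reduction attributed to the 30 nm fiber
    """
    naked_nm = probe_bp * BP_RISE_NM   # fully extended naked DNA
    fiber_nm = beads_nm / fiber_fold   # expected length as a 30 nm fiber
    return {
        "naked_nm": naked_nm,
        "measured_over_naked": measured_nm / naked_nm,
        "measured_over_beads": measured_nm / beads_nm,
        "fiber_nm": fiber_nm,
        "decompaction_fold": measured_nm / fiber_nm,
    }


if __name__ == "__main__":
    for name, value in compaction_estimates().items():
        print(f"{name}: {value:.2f}")
```

      With these inputs the naked DNA length is about 408 nm, so the measured distance is a little under one third of it, the ratio to the beads-on-a-string estimate is close to 1.8, and the implied decompaction relative to a 30 nm fiber is about 11-fold, in line with the rounded values given above.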

      (4) The analysis of the Micro-C data appears to be largely qualitative. Key information about the number of reads sequenced, reads mapped, and data quality are not presented. No quantitative framework for identifying features such as the "plumes" is described. The study and its findings would be strengthened by a more rigorous analysis of these rich datasets, including the use of systematic thresholds for calling patterns of organization in the data.

      Additional information on the number of reads and data quality has been included in the methods section.

      (5) Related to Point 4, the lack of quantitative details about the Micro-C data makes it difficult to evaluate if the changes observed are due to biological or technical factors. It is essential that the authors provide quantitative means of controlling for factors like sampling depth, normalization, and data quality between the samples.

      In our view the changes in the MicroC contact patterns for the eve locus and its neighbors when the nhomie boundary is manipulated are not only clear cut and unambiguous but are also readily evident in the Figs that are presented in the manuscript.  If the reviewer believes that there aren’t significant differences between the MicroC contact patterns for the four different nhomie replacements, it seems certain that they would also remain unconvinced by a quantitative analysis.

      The reviewer also suggests that biological and/or technical differences between the four samples could account for the observed changes in the MicroC patterns for the eve TAD and its neighbors.  If this were the case, then similar changes in MicroC patterns should be observed elsewhere in the genome.  Since much of the genome is analyzed in these MicroC experiments there is an abundance of internal controls for each experimental manipulation of the nhomie boundary.  For two of the nhomie replacements, nhomie reverse and homie forward, the plume above the eve volcano triangle is replaced by clouds surrounding the eve volcano triangle.  If these changes in the eve MicroC contact patterns are due to significant technical (or biological) factors, we should observe precisely the same sorts of changes in TADs elsewhere in the genome that are volcano triangles with plumes.  Author response image 6 shows the MicroC contact pattern for several genes in the Antennapedia complex.  The deformed gene is included in a TAD which, like eve, is a volcano triangle topped by a plume.  A comparison of the deformed MicroC contact patterns for nhomie forward (panel B) with the MicroC patterns for nhomie reverse (panel C) and homie forward (panel D) indicates that while there are clearly technical differences between the samples, these differences do not result in the conversion of the deformed plume into clouds as is observed for the eve TAD.  The MicroC patterns elsewhere in the Antennapedia complex are also very similar in all four samples.  Likewise, comparisons of regions elsewhere in the fly genome indicate that the basic contact patterns are similar in all four samples.  So while there are technical differences which are reflected in the relative pixel density in the TAD triangles and the LDC domains, these differences do not result in converting plumes into clouds nor do they alter the basic patterns of TAD triangles and LDC domains.

      As for biological differences, the embryos in each sample are at roughly the same developmental stage and were collected and processed using the same procedures. Thus, the biological factors that could reasonably be expected to impact the organization of specific TADs (e.g., cell type specific differences) are not going to impact the patterns we see in our experiments.

      Author response image 6.

      (6) The ISH effects reported are modest, especially in the case of the HCR. The details provided for how the imaging data were acquired and analyzed are minimal, which makes evaluating them challenging. It would strengthen the study to provide much more detail about the acquisition and analysis and to include depiction of intermediates in the analysis process, e.g., showing the segmentation of stripes.

      The imaging analysis presented in Fig. 5 is just standard confocal microscopy.  Individual embryos were visualized and scored.  An embryo in which stripes could be readily detected was scored as ‘positive’ while an embryo in which stripes couldn’t be detected was scored as ‘negative.’

      Recommendations for the authors:

      Editor comments:

      It was noted that the Jaynes lab previously published extensive genetic evidence to support the stem loop and circle loop models of Homie-Nhomie interactions (Fujioka 2016 Plos Genetics) that were more convincing than the Micro-C data presented here in proof of their prior model. Maybe the authors could more clearly summarize their prior genetic results to further try to convince the reader about the validity of their model.

      Reviewer #1 (Recommendations For The Authors):

      Below, I list specific comments to further improve the manuscript for publication. Most importantly, I recommend the authors tone down their proposal that boundary pairing is a universal TAD forming mechanism.

      (1) The title is cryptic.

      (2) The second sentence in the abstract is an overstatement: "In flies, TADs are formed by physical interactions between neighboring boundaries". Hi-C and Micro-C studies have not provided evidence that most TADs in Drosophila show focal interactions between their bracketing boundaries. The authors rely too strongly on prior studies that used artificial reporter transgenes to show that multimerized insulator protein binding sites or some endogenous fly boundaries can mediate boundary bypass, as evidence that endogenous boundaries pair.

      Please see responses #1.1 and #1.3 and figures Author response image 1 and Author response image 3.  Note that using dHS-C, most TADs that we’ve looked at so far are topped by a “dot” at their apex.

      (3) Line 64: the references do not cite the stated "studies dating back to the '90's'".

      The papers cited for that sentence are reviews which discussed the earlier findings.  The relevant publications are cited at the appropriate places in the same paragraph.  

      (4) Line 93: "On the other hand, while boundaries have partner preferences, they are also promiscuous in their ability to establish functional interactions with other boundaries." It was unclear what is meant here.

      Boundaries that a) share binding sites for proteins that multimerize, b) have binding sites for proteins that interact with each other, or c) have binding sites for proteins that can be bridged by a third protein can potentially pair with each other.  However, while these mechanisms enable promiscuous pairing interactions, they will also generate partner preferences (through a greater number of a, b and/or c).

      (5) It could be interesting to discuss the fact that it remains unclear whether Nhomie and Homie pair in cis or in trans, given that homologous chromosomes are paired in Drosophila.

      The studies in Fujioka et al. (Fujioka et al. 2016) show that nhomie and homie can pair both in cis and in trans.  Given the results described in #1.2, we imagine that they are paired in both cis and trans in our experiments.

      (6) Line 321: Could the authors further explain why they think that "the nhomie reverse circle-loop also differs from the nhomie deletion (λ DNA) in that there is not such an obvious preference for which eve enhancers activate expression"?

      The likely explanation is that the topology/folding of the altered TADs impacts the probability of interactions between the various eve enhancers and the promoters of the flanking genes.  

      (7) The manuscript would benefit from shortening the long Discussion by avoiding repeating points described previously in the Results.

      (8) Line 495: "If, as seems likely, a significant fraction of the TADs genome-wide are circle loops, this would effectively exclude cohesin-based loop extrusion as a general mechanism for TAD formation in flies". The evidence provided in this manuscript appears insufficient to discard ample evidence from multiple laboratories that TADs form by compartmentalization or loop extrusion. Multiple laboratories have, for example, demonstrated that cohesin depletion disrupts a large fraction of mammalian TADs. 

      Points made here and in #9 have been responded to in #1.1, #2.1 and #2.4 above. We would suggest that the evidence for loop extrusion falls short of compelling (as it is based on the analysis of TAD neighborhoods, not TADs; that is, forests, not trees) and, given the results reported in Goel et al. (in particular Fig. 4 and Sup Fig. 8), is clearly suspect. This is not to mention the fact that cohesin loop-extrusion can’t generate circle-loop TADs, yet circle-loops clearly exist. Likewise, as discussed in #2.4, it is not clear to us that shared chromatin states, polymer co-segregation and co-repulsion account for the compartmental patchwork patterns of TAD:TAD interactions. The results from the experimental manipulations in this paper and the accompanying paper, together with studies by others (e.g., Kyrchanova et al. 2008; Mohana et al. 2023), would also seem to be at odds with the model for compartments as currently formulated.

      The unique properties of Nhomie and Homie, namely the remarkable specificity with which they physically pair over large distances (Fujioka et al. 2016) may rather suggest that boundary pairing is a phenomenon restricted to special loci. Moreover, it has not yet been demonstrated that Nhomie or Homie are also able to pair with the TAD boundaries on their left or right, respectively.

      Points made here were discussed in detail in #1.2. As described there, it is not the case that nhomie and homie are “unique” or “special.” Other fly boundaries can do the same things. As for whether nhomie and homie pair with their neighbors: we haven’t done transgene experiments (e.g., testing by transvection or boundary bypass). Likewise, in MicroC experiments there are no obvious dots at the apex of the neighboring TADs that would correspond to nhomie pairing with the neighboring boundary to the left and homie pairing with the neighboring boundary to the right. However, this is to be expected. As we discussed in #1.3 above, only MNase resistant elements will generate dots in standard MicroC experiments. On the other hand, when boundary:boundary interactions are analyzed by dHS-C (c.f., Author response image 4), there are dots at the apex of both neighboring TADs. This would be direct evidence that nhomie pairs with the neighboring boundary to the left and homie pairs with the neighboring boundary to the right.

      (9) The comment in point 8 also applies to the concluding 2 sentences (lines 519-524) of the Discussion.

      See response to 8 above. Otherwise, the concluding sentences are completely accurate. Validation of the cohesin loop extrusion/CTCF roadblock model will require demonstrating a) that all TADs are either stem-loops or unanchored loops and b) that TAD endpoints are always marked by CTCF.

      The likely presence of circle-loops and the evidence for TAD boundaries that don’t have CTCF (c.f., Goel et al. 2023) already suggest that this model can’t (either fully or at all) account for TAD formation in mammals.

      (10) Figs. 3 and 6: It would be helpful to add the WT screenshot in the same figure, for direct comparison.

      It is easy enough to scroll between Figs, especially since nhomie forward looks just like WT.

      (11) Fig. 6: It would be helpful to show a cartoon view of a circle loop to the right of the Micro-C screenshot, as was done in Fig. 3.

      Good idea.   Added to the Fig.

      (12) Fig. 5: It would be helpful to standardize the labelling of the different genotypes throughout the figures and panels ("inverted" versus "reverse" versus an arrow indicating the direction).

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      Minor Points:

      (1) The Micro-C data does not appear to be deposited in an appropriate repository. It would be beneficial to the community to make these data available in this way.

      This has been done.

      (2) Readers not familiar with Drosophila development would benefit from a gentle introduction to the stages analyzed and some brief discussion on how the phenomenon of somatic homolog pairing might influence the study, if at all.

      We included a rough description of the stages that were analyzed for both the in situs and MicroC. We thought that an actual description of what is going on at each of the stages wasn’t necessary, as the process of development is not a focus of this manuscript. In other studies, we’ve found that there are only minor differences in MicroC patterns between the blastoderm stage and stage 12-16 embryos. While these minor differences are clearly interesting, we didn’t discuss them in the text. In all of the experiments, chromosomes are likely to be paired. In NC14 embryos (the stage for visualizing eve stripes and the MicroC contact profiles in Fig. 2), replication of euchromatic sequences is thought to be quite rapid. While homolog pairing is incomplete at this stage, sister chromosomes are paired. In stage 12-16 embryos, homologs will be paired and, if the cells are arrested in G2, then sister chromosomes will also be paired. So in all of the experiments, chromosomes (sisters and/or homologs) are paired. However, since we don’t have examples of unpaired chromosomes, our experiments don’t provide any info on how chromosome pairing might impact MicroC/expression patterns.

      (3) "P > 0.01" appears several times. I believe the authors mean to report "P < 0.01".

      Fixed.  

      References for Response

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bing X, Ke W, Fujioka M, Kurbidaeva A, Levitt S, Levine M, Schedl P, Jaynes JB. 2024. Chromosome structure i: Loop extrusion or boundary:Boundary pairing? eLife.

      Blanton J, Gaszner M, Schedl P. 2003. Protein:Protein interactions and the pairing of boundary elements in vivo. Genes Dev. 17(5):664-675.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk A, Kamalyan S, Mariasina S, Boyko K, Popov V, Maksimenko O, Georgiev P. 2020. N-terminal domain of the architectural protein ctcf has similar structural organization and ability to self-association in bilaterian organisms. Sci Rep. 10(1):2677.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Fujioka M, Ke W, Schedl P, Jaynes JB. 2024. The homie insulator has sub-elements with different insulating and long-range pairing properties. bioRxiv. 2024.02.01.578481.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis‐regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Gaszner M, Vazquez J, Schedl P. 1999. The zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction. Genes Dev. 13(16):2098-2107.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Harris HL, Gu H, Olshansky M, Wang A, Farabella I, Eliaz Y, Kalluchi A, Krishna A, Jacobs M, Cauer G et al. 2023. Chromatin alternates between a and b compartments at kilobase scale for subgenic organization. Nat Commun. 14(1):3303.

      Hart CM, Zhao K, Laemmli UK. 1997. The scs' boundary element: Characterization of boundary element-associated factors. Mol Cell Biol. 17(2):999-1009.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Chetverina D, Maksimenko O, Kullyev A, Georgiev P. 2008. Orientation-dependent interaction between drosophila insulators is a property of this class of regulatory elements. Nucleic Acids Res. 36(22):7019-7028.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab-7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences.201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Kyrchanova O, Zolotarev N, Mogila V, Maksimenko O, Schedl P, Georgiev P. 2017. Architectural protein pita cooperates with dctcf in organization of functional boundaries in bithorax complex. Development. 144(14):2663-2672.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Muravyova E, Golovnin A, Gracheva E, Parshikov A, Belenkaya T, Pirrotta V, Georgiev P. 2001. Loss of insulator activity by paired su(hw) chromatin insulators. Science. 291(5503):495-498.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of d. Melanogaster. Cell. 23(2):401-409.

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Zolotarev N, Fedotova A, Kyrchanova O, Bonchuk A, Penin AA, Lando AS, Eliseeva IA, Kulakovskiy IV, Maksimenko O, Georgiev P. 2016. Architectural proteins pita, zw5, and zipic contain homodimerization domain and support specific long-range interactions in drosophila. Nucleic Acids Res. 44(15):7228-7241.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence.

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to the food or shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors’ calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food associated cue and 23% responded to the shock associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS responsive units is less than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question: whether the central amygdala signals salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and authors sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valence stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences in the order of Pavlovian paradigms (fear - shock vs. shock - fear), which is an interesting result, and convincingly presented given their counterbalanced experimental design.
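      As a rough illustration of the kind of circular-shift classification referred to above, the following Python sketch labels a single neuron as excited, inhibited or non-responsive. It is a generic sketch and not the authors' variant of the method: the function name `classify_response`, the two-sided test, the window length, the number of shifts and the threshold `alpha` are all assumptions made for illustration.

```python
import random

def classify_response(trace, event_frames, window, n_shifts=1000, alpha=0.01, seed=0):
    """Label one neuron using a circular-shift null (illustrative sketch only)."""
    rng = random.Random(seed)
    n = len(trace)

    def mean_response(offset):
        # Mean activity in the `window` frames after each event, with the
        # event times circularly shifted by `offset` frames.
        total = 0.0
        for e in event_frames:
            for k in range(window):
                total += trace[(e + offset + k) % n]
        return total / (len(event_frames) * window)

    observed = mean_response(0)
    # Null distribution: the same statistic after random circular shifts,
    # which preserves the trace's autocorrelation but breaks event alignment.
    null = [mean_response(rng.randrange(1, n)) for _ in range(n_shifts)]
    null_mean = sum(null) / n_shifts
    extreme = sum(abs(x - null_mean) >= abs(observed - null_mean) for x in null)
    p_value = (extreme + 1) / (n_shifts + 1)

    if p_value >= alpha:
        return "non-responsive", p_value
    return ("excited" if observed > null_mean else "inhibited"), p_value
```

      Shifting the event times rather than shuffling individual frames keeps the slow temporal structure of a calcium trace intact, which is the usual motivation for this family of tests.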

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors’ interpretation of their data, and take issue with the way they used the terms "salience" and "valence" (and would encourage them to check out Namburi et al., NPP, 2016) regarding the operational definitions of salience and valence which differ from my reading of the literature. To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and more balanced discussion on this topic would be nice to see, and I felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli, as stated in Namburi et al., NPP, 2016, where we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus, and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022) would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006, Zhu et al., 2018, doi: 10.1126/science.aat0481, and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five head entries and last five head entries and the first five shocks and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analyses of valence and salience encoding to the different CSs are presented in Figure 5G, Figure 5 - Supplement 1 C-D, and Figure 5 - Supplement 2 N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 - Supplement 2J-O and Figure 5 - Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 - Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells that respond to only one stimulus are difficult to define in terms of valence encoding, as opposed to being specific for the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Many of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023, for instance, shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to what was conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 - Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food or shock responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 - Supplement 3).

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they are dependent upon the consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about the differences in kinetics. In our experimental design we counterbalanced the animals that received fear conditioning first then food conditioning, or food conditioning then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we restricted our analysis to the last day of reward conditioning and the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.
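
      A circular-shift test of the kind referred to above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the function names, the post-event window statistic, and the number of shifts are all hypothetical choices.

```python
import random

def event_response(trace, event_frames, window):
    """Mean signal in the `window` frames that follow each event."""
    values = []
    for frame in event_frames:
        values.extend(trace[frame:frame + window])
    return sum(values) / len(values)

def circular_shift_pvalue(trace, event_frames, window=10, n_shifts=1000, seed=0):
    """Two-sided p-value for the event-locked response of one cell.

    The null distribution is built by rotating the trace by a random
    offset. Rotation keeps the autocorrelation of the signal intact
    but breaks its alignment to the events.
    """
    rng = random.Random(seed)
    n = len(trace)
    baseline = sum(trace) / n
    observed = abs(event_response(trace, event_frames, window) - baseline)
    exceed = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)              # never the identity shift
        shifted = trace[k:] + trace[:k]      # circular rotation of a list
        null = abs(event_response(shifted, event_frames, window) - baseline)
        if null >= observed:
            exceed += 1
    # Add-one correction so the p-value is never exactly zero.
    return (exceed + 1) / (n_shifts + 1)
```

      A cell would then be labelled responsive when this p-value falls below a chosen threshold. The threshold, and any correction for testing many cells, are further assumptions that this sketch leaves open.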

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to reviewer 2, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five head entries and last head entries and the first five shocks and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used more than 20 trials per session. However, we routinely use 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary

      The authors investigated the antigenic diversity of recent (2009- 2017) A/H3N2 influenza neuraminidases (NAs), the second major antigenic protein after haemagglutinin. They used 27 viruses and 43 ferret sera and performed NA inhibition. This work was supported by a subset of mouse sera. Clustering analysis determined 4 antigenic clusters, mostly in concordance with the genetic groupings. Association analysis was used to estimate important amino acid positions, which were shown to be more likely close to the catalytic site. Antigenic distances were calculated and a random forest model was used to determine potential important sites.

      This has the potential to be a very interesting piece of work. At present, there are inconsistencies in the methods, results and presentation that limit its impact. In particular, there are weaknesses in some of the computational work.

      Strengths

      (1) The data cover recent NA evolution and a substantial number (43) of ferret (and mouse) sera were generated and titrated against 27 viruses. This is laborious experimental work and is the largest publicly available neuraminidase inhibition dataset that I am aware of. As such, it will prove a useful resource for the influenza community.

      (2) A variety of computational methods were used to analyse the data, which give a rounded picture of the antigenic and genetic relationships and link between sequence, structure and phenotype.

      Weaknesses

      (1) Inconsistency in experimental methods

      Two ferret sera were boosted with H1N2, while recombinant NA protein was used for the others. This, and the underlying reason, are clearly explained in the manuscript. The authors note that boosting with live virus did not increase titres. Nevertheless, these results are included in the analysis when it would be better to exclude them (Figure 2 shows much lower titres to their own group than other sera).

      As an exercise, we excluded the sera from the H1N2-boosted ferrets, and no major impact on the antigenic grouping was observed (see Author response image 1a). Another way to control for differences in immunogenicity is to normalize the NAI values with the homologous ELISA titers for each antigen. Clustering based on these ELISA-normalized NAI titers reveals the same 4 distinct antigenic groups but with one change: Kan17 is shifted from group 1 to group 2 (Author response image 1b). Note that a homologous ELISA titer is not available for A/West-Virginia/17/2012 and thus this serum sample is not included in Author response image 1b.

      Author response image 1.

      Antigenic and phylogenetic relatedness of N2 NAs. Phylogenetic tree based on the N2 NA head domain amino acid sequences and heat map representing the average normalized neuraminidase inhibition titer per H6N2 [log2 (max NAI/NAI)] determined in ferret sera after the boost (listed vertically). The red-to-blue scale indicates high-to-low NAI observed in ELLA against the H6N2 reassortants (listed at the bottom). UPGMA clustering of the H6N2 inhibition profiles is shown on top of the heat map and colored according to the phylogenetic groups. (a) Based on the ferret sera with exclusion of the sera that were obtained following prime-boost by infection with H1N2 (A/Estonia/91625/2015 and A/Stockholm/15/2014). (b) Based on serum NAI titers that were normalized by the homologous ELISA titer.
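
      The per-antigen normalization named in the legend, log2 (max NAI/NAI), can be written out as a short sketch. The nested-dictionary layout and the function name are assumptions made for illustration. The titers in the usage note are invented and are not taken from the study.

```python
import math

def normalize_per_antigen(titers):
    """Return log2(max NAI / NAI), with the maximum taken per antigen.

    titers: dict mapping serum name -> dict mapping antigen name -> NAI titer.
    A value of 0 marks the serum with the highest titer against that
    antigen; larger values mean weaker inhibition relative to that serum.
    """
    antigens = {a for row in titers.values() for a in row}
    col_max = {
        a: max(row[a] for row in titers.values() if a in row)
        for a in antigens
    }
    return {
        serum: {a: math.log2(col_max[a] / v) for a, v in row.items()}
        for serum, row in titers.items()
    }
```

      For example, with two hypothetical sera `{"s1": {"A": 640, "B": 80}, "s2": {"A": 160, "B": 320}}`, serum s1 gets 0 for antigen A and 2 for antigen B, i.e. a four-fold lower titer than the best serum against B.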

      (2) Inconsistency in experimental results

      Clustering of the NA inhibition results identifies three viruses which do not cluster with their phylogenetic group. Again, this is clearly pointed out in the paper. Further investigation of this inconsistency is required to determine whether this has a genetic basis or is an experimental issue. It is difficult to trust the remaining data while this issue is unresolved.

      We understand the concern of the reviewer. It is important to keep in mind that discrete grouping of antigens allows major antigenic drifts to be visualized. However, within closely related groups the cross-reactivity of antisera is more likely distributed along a spectrum. When we constructed an antigenic map based on the antigenic cartography algorithm (as described by Smith D. et al, 2004), Kansas17, Wis15, and Ala15 are positioned more closely to antigenic group 1 than the majority of other antigens that were classified as group 2 (Author response image 2a). Similar results were obtained when individual ferret sera from the biological duplicates were used (Author response image 2b). This antigenic cartography map is now added as Figure 2 – figure supplement 3 to the revised manuscript.

      Author response image 2.

      The antigenic cartography was constructed using averaged data from pairs of ferrets (a). A similar analysis was performed on individual ferret sera (b).

      (3) Inconsistency in group labelling

      A/Hatay/4990/2016 & A/New Caledonia/23/2016 are in phylogenetic group 1 in Figure 2 and phylogenetic group 1 in Figure 5 - figure supplement 1 panel a.

      Our apologies: there was indeed a mistake in labeling of Figure 5. A new antigenic cartography was constructed and included in the revised manuscript. As a result Figure 5 - figure supplement has now become redundant and was removed from the manuscript.

      A/Kansas/14/2017 is selected as a representative of antigenic group 2, when in Figure 2 it is labelled as AC1 (although Figure 2 - supplement 4 which the text is referring to shows data for A/Singapore/Infimh-16-0019/2016 as the representative of AC2). A/Kansas/14/2017 is coloured and labelled as AC2 in Figure 2 - supplement 5.

      Thank you for pointing out this inconsistency. Kan17 clustered antigenically in group 1 based on the NAI values that were normalized relative to the serum with the maximal NAI value against the H6N2 virus that was tested. When using NAI titers that are normalized with the homologous ELISA titer, Kan17 is positioned in group 2. Likewise, antigenic cartography mapping positions Kan17 in group 2. Therefore, we conclude that A/Kansas/14/2017 NA is a representative of group 2.

      The colouring is changed for Figure 3a at the bottom. A/Heilongjiang-Xiangyang/1134/2011 is coloured the same as AC4 viruses when it is AC1 in Figure 2. This lack of consistency makes the figures misleading.

      We apologize for this mistake. The coloring in Figure 3a has been corrected.

      (4) Data not presented, without explanation

      The paper states that 44 sera and 27 H6N2 viruses were used (line 158). However, the results for the Kansas/14/2017 sera do not appear to be presented in any of the figures (e.g. Figure 2 phylogenetic tree, Figure 5 - figure supplement 1). It is not obvious why these data were not presented. The exclusion of this serum could affect the results as often the homologous titre is the highest and several heatmaps show the fold down from the highest titre.

      Serum against A/Kansas/14/2017 was not prepared. For that reason, it is not included in the analysis. We agree that such homologous serum ideally should have been included and, in the NAI assay, would have resulted in a high if not the highest titer. However, we noticed that homologous sera did not always have the highest titers, especially in panels like ours where some antigens are closely related. The highest titer obtained against Kan17 H6N2 was from A/Bris/16 sera: 1/104, a titer that is in the range of other, homologous titers observed in the panel (Table S3). The Bris16 and Kan17 NAs have five amino acid differences. In summary, inclusion of Kan17 homologous sera would likely not impact the analysis and interpretation of the results because there are multiple highly cross-inhibiting heterologous serum samples against Kan17.

      (5) The cMDS plot does not have sufficient quality assurance

      A cMDS plot is shown in Figure 5 - figure supplement 1, generated using classical MDS. The following support for the appropriateness of this visualisation is not given.

      a. Goodness of fit of the cMDS projection, including per point and per titre.

      b. Testing of the appropriate number of dimensions (the two sera from phylogenetic group 3 are clustered with phylogenetic group 2; additional dimensions might separate these groups).

      c. A measure of uncertainty in positioning, e.g. bootstrapping.

      d. A sensitivity analysis of the assumption about titres below the level of detection (i.e. that <20 = 10).

      Without this information, it is difficult to judge if the projection is reliable.

      We agree with these comments. We have removed Figure 5 – figure supplement 1, and added new figure 2 – figure supplement 3 (antigenic cartography) instead.

      (6) Choice of antigenic distance measure

      The measure of antigenic distance used here is the average difference between titres for two sera. This is dependent on which viruses have been included in the analysis and will be biased by the unbalanced number of viruses in the different clusters (12, 8, 2, 5).

      To verify the impact of the number of antigens on our analysis, the matrix of differences was generated with only 4 H6N2s representing at least one phylogenetic group (Per09, Sin16, Hel823 and Ind11) (Author response image 3a). This matrix is very similar to the one calculated based on all 27 antigens (Author response image 3b). The obtained matrix (Author response image 3a) was used in random forest to model antigenic distances and the result of prediction was plotted against real differences calculated based on the full data. The correlation coefficient (R2) of predicted vs observed values dropped from 0.81 to 0.71, suggesting that the number of antigens tested does not drastically affect the antigenic differences calculated based on serum values (Author response image 3e). Importantly, amino acid substitutions potentially associated with increased antigenic distances are similarly identified (Author response image 3c, d and f).

      Author response image 3.

      The matrix of differences was calculated using only 4 H6N2 antigens (a) or the full panel (b). The matrices from (c) 4 or (d) 27 antigens were used in random forest modeling to estimate the impact of amino acid changes, respectively. The rf modeling data generated from only 4 H6N2s were plotted and correlated with values calculated from the full panel of 27 H6N2s (e). The multi-way importance plot indicates in red that 7 out of the 10 most important substitutions were identified by the analysis using only 4 H6N2s (f).

      Interestingly, when the matrix of differences is calculated using data from only 4 H6N2s that do not include at least one representative of antigenic groups 1 and 2, the correlation between the predicted values and the values obtained from the full panel is dramatically affected (R2 drops from 0.81 to 0.5 and 0.57). It is important to note that most of the sera were also raised against antigens from phylogenetic groups 1 and 2. As a consequence, poorer prediction for those antigens would more drastically impact the correlation. No drastic drop was observed when representative H6N2s from group 3 or 4 were excluded from the data (from 0.81 to 0.75 and 0.73, Author response image 4 c and d).

      Author response image 4.

      Random forest analysis was repeated using only 4 antigens, but excluding representatives of one of the phylogenetic groups (a) no group 1, (b) no group 2, (c) no group 3, and (d) no group 4.

      We also used Euclidean distances as a measure of differences (Author response image 5). The predictive values obtained in rf have a slightly reduced R2 compared to the values obtained using average of differences.

      In conclusion, neither the unbalanced number of antigens used per group nor the choice of distance metric seems to affect our analysis per se.

      Author response image 5.

      Antigenic distances were calculated using serum-to-serum Euclidean distances. Those antigenic distances were used in rf for estimation of antigenic distance and importance of each amino acid substitution.
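
      The two serum-to-serum distance measures compared in this response can be sketched side by side. The function names and the dictionary layout are hypothetical, and reading "average difference" as the mean absolute difference over shared antigens is an assumption.

```python
import math

def mean_abs_difference(profile_a, profile_b):
    """Average absolute difference between two sera over shared antigens.

    Each profile maps antigen name -> normalized NAI value.
    """
    shared = [k for k in profile_a if k in profile_b]
    return sum(abs(profile_a[k] - profile_b[k]) for k in shared) / len(shared)

def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two sera over shared antigens."""
    shared = [k for k in profile_a if k in profile_b]
    return math.sqrt(sum((profile_a[k] - profile_b[k]) ** 2 for k in shared))
```

      One design difference is visible from the formulas alone. The mean absolute difference is averaged over antigens, whereas the Euclidean distance grows as more antigens are added and weights large single-antigen differences more heavily.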

      (7) Association analysis does not account for correlations

      For each H6N2 virus and position, significance was calculated by comparing the titres between sera that did or did not have a change at that position. This does not take into account the correlations between positions. For haemagglutinin, it can be impossible to determine the true antigenic effects of such correlated substitutions with mutagenesis studies.

      Most of the potential correlated effects cannot be addressed with the panel of N2s, except for combinations of substitutions that are included in the panel, such as 245/247 with or without 468. Only mutagenesis studies would shed light on the epistatic effects. However, it is important to keep in mind that individual substitutions in such a study likely do not reflect the natural evolution of N2 (cf. the importance of the NA charge balance; Wang et al., 2021, doi: 10.7554/eLife.72516).

      (8) Random forest method

      25 features are used to classify 43 sera, which seems high (p/3 is typical for classification). By only considering mismatches, rather than the specific amino acid changes, some signals may be lost (for example, at a given position, one amino acid change might be neutral while another has a large antigenic effect). Features may be highly, or perfectly correlated, which will give them a lower reported importance and skew the results.

      The number of features was optimized in the range from 5 to 80, with 25 being optimal (best R-value in predicted vs observed antigenic distances). Those features refer to the number of amino acid substitutions used in each tree. The number of trees was also optimized in the range of 100 to 2000.

      In the random forest, the matrix of differences is built from the positions that differ between pairs of NAs, without considering the type of substitution. Indeed, substitutions with distinct effects may skew the results by lowering the reported importance.

      We have highlighted such potential bias in our discussion:

      “Also, our modelling does not consider that substitution by other amino acids can have a distinct impact on the antigenic distance. As a consequence, predictions based on the model could underestimate or overestimate the importance of a particular amino acid residue substitution in some cases.”
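
      The position-only encoding discussed above, and the limitation quoted from the discussion, can be illustrated with a small sketch. The function name, the 1-based position convention, and the toy sequences are assumptions. They are not taken from the manuscript.

```python
def mismatch_features(seq_a, seq_b, positions):
    """Binary features for one pair of aligned NA sequences.

    Returns 1 where the two sequences differ at a listed position and 0
    where they match. Positions are 1-based. The identity of the
    substituted amino acid is deliberately ignored.
    """
    return [int(seq_a[p - 1] != seq_b[p - 1]) for p in positions]

# Two different substitutions at position 2 give the same feature vector,
# so a model trained on these features cannot tell them apart.
reference = "MKT"
variant_one = "MET"   # K -> E at position 2
variant_two = "MNT"   # K -> N at position 2
features_one = mismatch_features(reference, variant_one, [1, 2, 3])
features_two = mismatch_features(reference, variant_two, [1, 2, 3])
```

      Both `features_one` and `features_two` are `[0, 1, 0]`, which is the situation in which the importance of a position could be under- or overestimated.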

      Reviewer #2 (Public Review):

      Summary:

      The authors characterized the antigenicity of the N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 using ferret and mouse immune sera. Four antigenic groups were identified, which correlated with their respective phylogenetic/genetic groups. Among the 102 amino acids that differed among the 44 selected N2 proteins, the authors identified residues that differentiate the antigenicity of the four groups and constructed a machine-learning model that provides antigenic distance estimation. Three recent A(H3N2) vaccine strains were tested in the model but there was no experimental data to confirm the model prediction results.

      Strengths:

      This study used N2 protein of 44 selected A(H3N2) influenza A viruses isolated from 2009-2017 and generated corresponding panels of ferret and mouse sera to react with the selected strains. The amount of experimental data for N2 antigenicity characterization is large enough for model building.

      Weaknesses:

      The main weakness is that the strategy of selecting 44 A(H3N2) viruses from 2009-2017 was not explained. It is not clear if they represent the overall genetic diversity of human A(H3N2) viruses circulating during this time. A comprehensive N2 phylogenetic tree of human A(H3N2) viruses from 2009-2017, with the selected 44 strains labeled in the tree, would be helpful to assess the representativeness of the strains included in the study.

      The selection of antigens was performed using the method described by Bien and Tibshirani 2011 (doi: 10.1198/jasa.2011.tm10183). This method calculates MinMax distances to identify a central representative among distinct clusters.
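
      The idea of a minimax representative can be sketched for a single cluster. This shows only the prototype step, assuming the clusters are already given. The cited method also builds the clusters themselves, which is not shown here. The function names, the use of Hamming distance, and the toy sequences are illustrative assumptions.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def minimax_prototype(cluster, dist):
    """Return the member whose largest distance to any member is smallest.

    Ties are broken by order of appearance in `cluster`.
    """
    def radius(candidate):
        return max(dist(candidate, other) for other in cluster)
    return min(cluster, key=radius)

# Toy cluster of aligned sequences (invented for illustration).
prototype = minimax_prototype(["AAAA", "AAAT", "AATT"], hamming)
```

      Here "AAAT" is selected because it is at most one substitution away from every other member, whereas each of the other two is two substitutions away from some member.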

      To facilitate visualization in a phylogenetic tree, only 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). Those 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. Readout strains cover the major branches of the tree. The tree was built with PhyML 3.0 using the JTT substitution model and default parameters (Guindon S. et al, Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al, Mol. Biol. Evol 33(6):1635-38, 2016).

      Author response image 6.

      The second weakness is the use of double-immune ferret sera (post-infection plus immunization with recombinant NA protein) or mouse sera (immunized twice with recombinant NA protein) to characterize the antigenicity of the selected A(H3N2) viruses. Conventionally, NA antigenicity is characterized using ferret sera after a single infection. Repeated influenza exposure in ferrets has been shown to enhance antibody binding affinity and may affect the cross-reactivity to heterologous strains (PMID: 29672713). The increased cross-reactivity is supported by the NAI titers shown in Table S3, as many of the double immune ferret sera showed the highest reactivity not against its own homologous virus but to heterologous strains. Although the authors used the post-infection ferret sera to characterize 5 viruses (Figure 2, Figure Supplement 4), the patterns did not correlate well. If the authors repeat the NA antigenic analysis using the post-infection ferret sera with lower cross-reactivity, will the authors be able to identify more antigenic groups instead of 4 groups?

      This is a very valuable remark. In their paper, Kosikova et al. (CID 2018) report that repeated infection of ferrets with antigenically slightly different H3N2 viruses results in a broader anti-HA response, compared to a prime infection of an influenza naïve ferret, which results in a narrower anti-HA response. In our ferret immunizations the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus that was used for the priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2 infected and subsequently NA protein boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Author response image 7). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      Author response image 7.

      Correlation obtained on NAI data from ferrets at day 14 after infection vs data from day 42 after boost.

      Another weakness is that the authors used the newly constructed model to predict the antigenic distance of three recent A(H3N2) viruses but there is no experimental data to validate their prediction (eg. if these viruses are indeed antigenically deviating from group 2 strains as concluded by the authors).

      Indeed, there are no experimental data for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. Determining experimental values for these three strains would require the generation of new reassortant viruses (H1N2s) and recombinant protein, and the immunization of new ferrets. The ferret sera would have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The major point of the modeling was to evaluate whether it is possible to predict the antigenic behavior based on amino acid substitutions.

      As an exercise, we ran the model again, this time excluding the Swe17 and HK17 antigens from the data set. The sequences of Swe17 or HK17 were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 8 and show a robust predictive outcome with R2 values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      Author response image 8.

      Antigenic distances from Swe17 and HK17 calculated using the random forest algorithm that was constructed without experimental data from Swe17 and HK17. The predicted distances were plotted side by side to the experimental distances in (a) and correlations are shown in (b).
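
      The hold-out check described above can be outlined in a few lines. This is a sketch under assumptions: the tuple layout, the function names, and the reading of R2 as the squared Pearson correlation between observed and predicted distances are hypothetical, and the model-fitting step itself is omitted.

```python
def split_by_strain(pairs, held_out):
    """Split (strain_a, strain_b, distance) records for a hold-out test.

    Records that involve a held-out strain form the test set. All other
    records form the training set.
    """
    train = [p for p in pairs if p[0] not in held_out and p[1] not in held_out]
    test = [p for p in pairs if p[0] in held_out or p[1] in held_out]
    return train, test

def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)
```

      A model would be fitted on `train` only, used to predict the distances in `test`, and `r_squared` would then be computed separately for each held-out strain.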

      Reviewer #3 (Public Review):

      Summary:

      This paper by Portela Catani et al examines the antigenic relationships (measured using monotypic ferret and mouse sera) across a panel of N2 genes from the past 14 years, along with the underlying sequence differences and phylogenetic relationships. This is a highly significant topic given the recent increased appreciation of the importance of NA as a vaccine target, and the relative lack of information about NA antigenic evolution compared with what is known about HA. Thus, these data will be of interest to those studying the antigenic evolution of influenza viruses. The methods used are generally quite sound, though there are a few addressable concerns that limit the confidence with which conclusions can be drawn from the data/analyses.

      Strengths:

      • The significance of the work, and the (general) soundness of the methods.

      • Explicit comparison of results obtained with mouse and ferret sera.

      Weaknesses:

      • Approach for assessing the influence of individual polymorphisms on antigenicity does not account for the potential effects of epistasis.

      Indeed, possible epistatic effects of individual polymorphisms were not assessed, which is a limitation imposed by the nature of the panel of N2s selected in the study. We now emphasize this in the discussion as follows:

      “Also, our modelling does not consider that substitution by different amino acids can have distinct impact on antigenic distance. As a consequence, predictions based on the model could underestimate the importance of a particular amino acid residue substitution in some cases.”

      • Machine learning analyses were neither experimentally validated nor shown to be better than simple, phylogenetic-based inference.

      This is a valid remark and indeed we have found a clear correlation between NAI cross reactivity and phylogenetic relatedness. However, besides achieving good prediction of the experimental data (as shown in Figure 5 and in Figure R7), machine learning analysis has the potential to rank or indicate major antigenic divergences based on available sequences before they have consolidated as a new clade. ML can also support the selection and design of more broadly reactive antigens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major corrections

      No major corrections, beyond the issues I touched on in the public review, for which I give a little more detail below:

      Point 2. If there's not a putative genetic basis for the unexpected clustering seen in the NAI, then reiterating a small subset of the data would show the reliability of the experimental methods and substantiate this unexpected finding.

      We thank the reviewer for this pertinent point and suggestion. We have modified our analysis by reiterating individual ferret data normalized with the homologous ELISA titers. This reiteration is shown in figure R1b. In this case both Kan17 and Wis15 are switched to antigenic group 2. The profile of sera inhibition against those 2 strains that shift from antigenic cluster 1 to 2 is clearly intermediate between the profiles observed in those 2 groups. Considering that antigenic evolution occurs gradually, it is not unexpected that those intermediate profiles would swing from one side to another when pushed to forced discrimination. Antigenic cartography mapping, as in Smith et al. (2004), also indicated that those H6N2s are located closer to G1 than overall antigens from G2. The raw data distribution (max and min EC50) also does not indicate potential bias in the analysis.

      Point 5. If you want to use antigenic cartography (Smith et al 2004), there is the R CRAN package (https://CRAN.R-project.org/package=Racmacs) which can handle threshold titres (like <20) and has functions for the diagnostic tools I describe, in order to quality assure the resulting plot. It does use a different antigenic distance metric than the paper currently uses, so you might not want to take that route.

      Thank you for this suggestion. We have performed antigenic cartography using the methodology described by Smith et al made accessible by Sam Wilks. The outcome of this analysis has been added to the manuscript as Figure 2 – Figure supplement 3.

      Point 6. More robust measures of antigenic distance take into account the homologous titre, homologous and heterologous titres (Archetti & Horsfall, 1950) or use the highest observed titre for a serum (Smith et al 2004). A limitation of the first two is that the antigenic distance can only be calculated when you have the homologous titre, which will limit you as you only have this for 26/43 sera. They may give similar results to your average antigenic distance, in which case your analysis still stands. Calculating antigenic distance using the homologous or maximum titre only gives the antigenic distance between the antigen and the serum. If you want the distance between all the sera, then further analysis is required (making an antigenic map and outputting the serum-serum distances, see the point above).

      We thank the reviewer for these suggestions. A complete set of 43 H6N2 viruses that matches all 43 sera would have been ideal. This would require the generation of 17 additional H6N2 viruses and their testing in ELLA, a significant amount of work in terms of time and resources. Instead, we have generated an antigenic map of the 27 antigens and homologous sera (cfr. our response to point 5 above). Despite the different methods, the outcome showing 4 major antigenic groups is consistent.
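
      For reference, the two titre-based distance measures named in the reviewer's point 6 can be sketched as follows. This is a minimal illustration with hypothetical titres; it is not code from the manuscript, and it does not handle thresholded titres such as <20.

```python
from math import log2, sqrt


def archetti_horsfall(t11, t12, t21, t22):
    """Archetti & Horsfall ratio r = sqrt(r1 * r2).

    t_ab is the titre of serum a against antigen b, so t11 and t22 are the
    homologous titres. r1 = t12 / t11 and r2 = t21 / t22. A value of 1 means
    the two antigens are indistinguishable; smaller values mean more distant.
    """
    return sqrt((t12 / t11) * (t21 / t22))


def table_distance(serum_titres, antigen):
    """Smith et al. (2004) style distance between one serum and one antigen.

    Defined as log2(highest titre observed for the serum) minus
    log2(titre against the antigen), so no homologous titre is required.
    """
    return log2(max(serum_titres.values())) - log2(serum_titres[antigen])


# Hypothetical titres for two sera and two antigens.
ratio = archetti_horsfall(t11=640, t12=160, t21=80, t22=320)
distance = table_distance({"X": 640, "Y": 160}, "Y")
```

      With these toy values, a fourfold drop from the highest titre corresponds to two log2 units, and the Archetti & Horsfall ratio can only be computed because both homologous titres are available.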

      Minor corrections

      Table S1

      A/New_Castle/67/2016 should be A/Newcastle/67/2016

      A/Gambia/2012 is not the full virus name

      Corrected.

      Table S3 has multiple values of exactly 10.0. I think these should be <20 as they are below the threshold of detection for the assay.

      All the values lower than 20 in Table S3 were replaced by “< 20”.

      Line 376: A/Sidney/5/1997 should be A/Sydney/5/1997

      Corrected.

      Line 338: "25 randomly sampled data" is a bit vague, "25 randomly sampled features" would be better

      Corrected.

      Include RMSE of the random forest model.

      RMSE=19.6 RMSE/mean = 0.207 is now mentioned in the manuscript.
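
      As a reading aid, the two quantities can be computed as sketched below. The toy values are hypothetical, and normalizing by the mean of the observed values is an assumption, since the response only states "RMSE/mean".

```python
from math import sqrt


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def normalized_rmse(observed, predicted):
    """RMSE divided by the mean of the observed values (assumed normalization)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


# Hypothetical toy values, not data from the study.
observed = [100.0, 80.0, 120.0]
predicted = [90.0, 100.0, 110.0]
error = rmse(observed, predicted)
relative_error = normalized_rmse(observed, predicted)

# The two reported numbers imply the mean used for normalization.
implied_mean = 19.6 / 0.207
```

      Under that assumption, the reported values imply a mean of roughly 94.7 in the units of the modeled response.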

      Figure 5 - supplement 1: These plots are difficult to interpret as the aspect ratio is not 1:1, and panels a & b are difficult to compare as they have not been aligned (using a Procrustes analysis). It would be neater if they were labelled with short names.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Line 562: 98 variable residues, where it is 102 elsewhere in the text.

      There are 4 mutations near the end of the NA stalk domain, which are not resolved in the N2 structure. Therefore, amino acid distances to these residues cannot be calculated.

      No data availability statement. Some of the raw data is available in Table S3 and there is no link to the code.

      The data and code used for generation of the random forest (rf) modelling were uploaded to Github and made available. The following statement has been added to the manuscript: “The data and code used for the generation of the rf model is available at https://github.com/SaelensLAB/RF..”

      Reviewer #2 (Recommendations For The Authors):

      (1) More than 42,000 NA sequences are available for the mentioned period on GISAID, it is therefore important to understand the selection criteria for the 44 strains and if these strains represent the overall genetic diversity of N2 of human A(H3N2) viruses. To demonstrate the representativeness of the 44 selected strains, please construct a representative N2 phylogenetic tree for human A(H3N2) viruses circulated in 2009-2017 and label the 44 selected strains on the tree.

      The selection of antigens was performed using the method described by Bien and Tibshirani 2011 (doi: 10.1198/jasa.2011.tm10183). This method uses MinMax distances to identify a central representative among distinct clusters.
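
      The idea of a minimax representative can be sketched as follows: within a cluster, the prototype is the member whose largest distance to any other member is smallest. This is a minimal illustration only. The sequences are hypothetical, and the use of Hamming distance is an assumption, since the distance measure is not specified here.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimax_prototype(cluster, distance):
    """Return the member whose maximum distance to the other members is smallest."""
    def radius(candidate):
        return max(distance(candidate, other) for other in cluster)

    return min(cluster, key=radius)


# Hypothetical cluster of three short aligned sequences.
cluster = ["AKTE", "AKTD", "GKTD"]
prototype = minimax_prototype(cluster, hamming)
```

      Here the middle sequence is selected because it differs from each of the others at only one position, whereas the other two differ from each other at two positions.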

      To facilitate visualization of the tree, only 180 representative N2 proteins from 2009-2017 were randomly selected (20 strains per year, unlabelled). Those 180 representatives and the 44 readout panel strains (labelled) are shown in the phylogenetic tree below. Readout strains cover the major branches of the tree. The tree was built using PhyML 3.0 with the JTT substitution model and default parameters (Guindon S. et al, Systematic Biology 59(3):307-21, 2010) and visualized using ETE3 (Huerta-Cepas J. et al, Mol. Biol. Evol 33(6):1635-38, 2016).

      Author response image 9.

      (2) Double immune ferret sera may increase antibody binding affinity and cross-reactivity against heterologous strains. Using single-infection ferret sera may yield different antigenic grouping results (eg. may identify more antigenic groups). Can the authors repeat the NA antigenic grouping using single-infection ferret sera? Although data from a subset of 5 strains was presented (Figure 2, Figure Supplement 4), the information was not sufficient to support if the use of single-infection or double immune ferret sera will yield similar antigenic grouping results.

      In our ferret immunizations the boost was performed with recombinant, enzymatically active NA that was homologous to the NA of the H1N2 virus that was used for the priming by infection. We determined the NAI responses in sera from ferrets after H1N2 infection against 5 different H6N2 viruses (Figure 2 – figure supplement 5). Compared to NAI responses in sera from H1N2 infected and subsequently NA protein boosted ferrets, the NAI titers obtained after a single infection were considerably lower. Although the normalized NAI titers of day 14 and day 42 sera correlated well, we cannot exclude a degree of broadening of the NAI response in the NA protein boost sera (Figure R6). On the other hand, repeated influenza antigen exposure is the reality for the majority of people.

      (3) NA antigenicity data are presented in heat maps and the authors often describe the heat map patterns as matching without further explanation. In lines 234-235, the heat map of mouse sera (Figure 2. Figure supplement 5) was described to match the results of ferret sera (Figure 2), but this tends to be subjective. A correlation analysis of 7 selected antigens showed a positive correlation, what about the other 37 antigens?

      The interpretation of heatmaps is indeed very subjective; for this reason the correlation of the 7 selected antigens was also provided. The other 37 antigens were not tested. Considering the results using post boost sera, a simulation using random forest modeling indicates that the data from one antigen of each antigenic group are sufficient to achieve a reliable predictive output (R²=0.71) (Figure R3 of this rebuttal).

      (4) Can the authors explain in more detail how data in Figure 4a was generated? According to the authors, residues close to the catalytic pocket are more likely to impact NAI. Can the authors explain how they define if a residue is close to the catalytic pocket?

      The correlation of distances of amino acid residues with significance values is explained as follows. Consider 7 distinct elements that are distributed horizontally as shown by the squares in the figure below (Author response image 10a). The elements highlighted in yellow have a numerical property (in the case of N2 neuraminidase this was the significance value obtained in the association study). Taking P1 as reference we can calculate the distance (red arrows) between P1 and P2, P4 and P7. Those distances can then be correlated to the intrinsic values of P2, P4 and P7, which enables the calculation of the correlation coefficient Tau. This same process is repeated for each position (or each amino acid); as a consequence, every position will have a correlation coefficient calculated (Author response image 8b). This correlation coefficient can be represented as a heat map at the surface of N2.

      Author response image 10.

      The 2D scheme represents the strategy used to calculate the correlation (i.e. the Tau values) between distances and p-values. Tau values can then be presented in a heat map.
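
      The scheme above can be sketched as follows. This is a minimal illustration under stated assumptions: "Tau" is taken to be Kendall's tau (computed here as tau-a), positions are placed on a hypothetical one-dimensional axis rather than in the N2 structure, the p-values are invented, and a reference position that itself carries a p-value is excluded from its own comparison.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


def tau_per_position(coords, p_values):
    """For each reference position, correlate distances to the scored
    positions with the p-values of those scored positions."""
    result = {}
    for ref, x_ref in coords.items():
        scored = [pos for pos in p_values if pos != ref]
        distances = [abs(coords[pos] - x_ref) for pos in scored]
        result[ref] = kendall_tau(distances, [p_values[pos] for pos in scored])
    return result


# Hypothetical layout: seven positions on a line, three of them with p-values.
coords = {f"P{i}": float(i) for i in range(1, 8)}
p_values = {"P2": 0.001, "P4": 0.02, "P7": 0.5}
tau = tau_per_position(coords, p_values)
```

      In this toy layout, P1 gets a tau of 1 because the scored positions closest to it have the smallest p-values, while P7 gets a tau of -1 because the opposite holds.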

      (5) Can the authors provide experimental data using the three recent A(H3N2) viruses as antigens and perform NAI assays to confirm whether they are antigenically deviating from group 2 viruses?

      The generation of data to determine experimental values for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021 would require the generation of new reassortant viruses (H1N2s), recombinant protein and immunization of new ferrets. The ferret sera would have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The major point of the modeling was to evaluate if it is possible to predict the antigenic behavior based on amino acid substitutions.

      As an exercise we have run the model again but this time excluding the Swe17 and HK17 antigens from the data set. Sequences of Swe17 or HK17 were then used to predict antigenic distances. The modeled versus experimental data are plotted in Author response image 7 and show a robust predictive outcome with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively.

      (6) According to Ge et al. 2022 (PMID: 35387078), N2 NA's before 2014 (2007-2013) showed a 329-N-glycosylation and E344, and they were subsequently replaced by H3N2 viruses with E344K and 329 non-glycosylation changing the NI reactivity in ferret antisera towards later strains. Were these residues also predicted to be important to N2 antigenicity from your machine-learning method?

      Three of the N2 NAs used in our panel, A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in another 3 NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of those mutations is among the lowest predicted in our modeling. However, the differences in NAI reported by Ge et al. are low (not even twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact on NAI. We have added the following to the discussion in our revised manuscript:

      “It has been reported that an N-glycosylation site at position 329 combined with E344 in NA from human H3N2 viruses from 2007 to 2013 was gradually lost in later H3N2 viruses (Ge et al., 2022). This loss of an N-glycosylation site at position 329 combined with an E344K substitution was associated with a change in NAI reactivity in ferret sera. Three N2 NAs in our panel, derived from A/Victoria/361/2011, A/Hong_Kong/3089/2017, and A/Tennessee/18/2017, lack this N-glycosylation motif. The E344K substitution is present in three other NAs, derived from A/Nagano/2153/2017, A/Minnesota/11/2010, and A/Indiana/08/2011. The importance of those mutations is among the lowest ones predicted by our modeling. However, the differences in NAI reported by Ge et al. are very modest (lower than twofold). The experimental variability in our study potentially limits the identification of substitutions with a subtle impact NAI.”

      Reviewer #3 (Recommendations For The Authors):

      Specific suggestions:

      Line 132: Did the authors confirm the absence of compensatory mutations due to a heterologous H6 background that could potentially confound downstream NAI results?

      All NA genes of the rescued H6N2 viruses were fully sequenced and found to be identical to the expected NA sequences, with the only exception being A/Tasmania/1018/2015, where a mixed population of wt and M467I was found. This substitution is located at the surface and at the top of the NA head domain, and thus could potentially impact NA antigenicity. However, the A/Tasmania/1018/2015 H6N2 had a similar inhibition profile to other H6N2s in phylogenetic and antigenic group 1. This indicates that, at least in this mixed population, antigenicity was not drastically affected by the M467I substitution.

      Line 96: how do these data rule out variation in the fraction of properly folded protein across NAs? They certainly show that properly folded NA protein is present, but not whether amounts vary between the different NAs.

      SEC-MALS (size exclusion chromatography-Multiangle light scattering) data and enzymatic activity were considered as a proxy for correctly folded NA. Although the specific activity of the recombinant N2 NAs is expressed per mass unit (microgram), we cannot exclude that the fraction of properly folded protein across the different recombinant NAs may vary.

      Lines 262-269: this analysis approach (based on my reading) seems to consider each polymorphism in isolation and thus does not seem well suited for accounting for epistatic interactions within the NA. For example, the effect of a substitution on NAI may be contingent upon other alleles within NA that are not cleanly segregated between the two serum comparator groups. Can the authors address the potential of epistasis within NA to confound the results shown in Figure 3?

      Unfortunately, epistatic interactions cannot be resolved using the panel of N2s selected for the study. This limitation is mentioned in our discussion:

      “It is important to highlight that co-occurring substitutions in our panel (the ones present in the main branches of the phylogenetic tree) cannot be individually assessed by association analysis or the random forest model. The individual weight of those mutation on NA drift thus remains to be experimentally demonstrated.”

      Line 331: is there a way to visualize and/or quantify how these two plots (F5 supplement 1a/b) reflect each other or not? Without this, it is hard to ascertain how they relate to each other.

      We have generated an antigenic cartography map instead. As a consequence, the MDS has become redundant and Figure 5 – supplement 1 was removed.

      Figure 4B structural images are not well labelled.

      The active site in 1 of the protomers is now indicated with an arrow in the top and side views of the NA tetramer.

      Lines 339-359: the ML predictions are just predictions and kind of meaningless without experimental validation of the predicted antigenic differences between recent NAs. This section would also be strengthened by an assessment of whether the ML approach obtains more accurate results than simply using phylogeny to predict antigenic relationships.

      Indeed, there are no experimental data from A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021. The generation of data to determine experimental values for A/Hong_Kong/45/2018, A/Tasmania/503/2020, or A/Darwin/9/2021 would require the generation of new reassortant viruses (H1N2s), recombinant protein and immunization of new ferrets. The ferret sera would have to be analyzed against all 27 H6N2s, including duplicated control sera for normalization. The major point of the modeling was to evaluate if it is possible to predict the antigenic behavior based on amino acid substitutions.

      As an exercise we have run the model again but this time excluding the Swe17 and HK17 antigens from the data set. Sequences of Swe17 or HK17 were then used to predict antigenic distances. The modeled versus experimental data are plotted in figure R7 and show a robust predictive outcome with R² values of 0.94 and 0.91 for Swe17 and HK17, respectively. A major advantage of antigenic modeling is the potential to rank or indicate major antigenic divergences based on available sequences before they have consolidated as a new clade. The support in selecting or designing more broadly reactive antigens is another advantage of machine learning analysis.

      Lines 416-421: appreciate the direct comparison of results obtained from ferrets versus mice.

      We thank the reviewer for expressing this appreciation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Tesmer and colleagues uses fiber photometry recordings, sophisticated analysis of movement, and deep learning algorithms to provide compelling evidence that activity in hypothalamic hypocretin/orexin neurons (HONs) correlates with net body movement over multiple behaviors. By examining projection targets, the authors show that hypocretin/orexin release differs in projection targets to the locus coeruleus and substantia nigra, pars compacta. Ablation of HONs does not cause differences in the power spectra of movements. The movement-tracking ability of HONs is independent of HON activity that correlates with blood glucose levels. Finally, the authors show that body movement is not encoded to the same extent in other neural populations.

      Strengths:

      The major strengths of the study are the combination of fiber photometry recordings, analysis of movement in head-fixed mice, and sophisticated classification of movement using deep learning algorithms. The experiments seem to be well performed, and the data are well presented, visually. The data support the main conclusions of the manuscript.

      We thank the reviewer for their supportive feedback.

      Weaknesses:

      The weaknesses are minor, mostly consisting of writing and data visualization throughout the manuscript. To some degree, it is already known that hypocretin/orexin neurons correlate with movement and arousal, although this manuscript studies this correlation with unprecedented sophistication and scale. It is also unfortunate that most of the experiments throughout the study were only performed in male mice. Taken together, this study is likely to be impactful to the field and our understanding of HONs across behavioral states.

      We agree that disentangling movement from arousal is an important aspect, and in the revised manuscript, we now include new data and analyses towards this (pupillometry to directly assess arousal, and multivariate analysis to assess contributions of arousal vs movement to HON activity). In addition, we now implement many of the reviewer’s recommendations regarding writing, data presentation, and visual clarity (see our replies in the “recommendations for authors” section).

      Reviewer #1 (Recommendations for the authors):

      Some recommendations for the authors:

      (1) The first sentence of the Introduction states: "Neural activity related to body movement recently received much attention." I would rephrase or clarify this statement, as neuroscientists have been studying neural activity related to body movement for decades.

      The reviewer is correct. Our intention was to highlight the resurgence of movement-related neuroscience enabled by modern techniques such as deep learning applied to video data (e.g. DeepLabCut, etc). The passage has been updated for clarity.

      (2) The Introduction also states that HONs orchestrate "consciousness and arousal." I would delete the word "consciousness," as consciousness represents a lofty, global concept that is challenging to define and quantify in humans, let alone mice.

      We used the word consciousness to be consistent with current literature on the function of the mouse hypothalamus (e.g. Nat Neurosci 2016 Feb;19(2):290-8). But we agree it is not necessary here, and so we followed the advice to delete it.

      (3) The authors state that HON dynamics were recorded while mice were head-fixed while on a running wheel. For clarity, it would be helpful to visualize this head-fixation in Figures 1A and 5B. It would also be helpful to clarify how certain behaviors (e.g. grooming, chewing) were performed and recorded while the mouse was head-fixed.

      In the revised manuscript, updated graphics with a head-fixed mouse have now been added to relevant figures. Representative RGB frames (colors representing sequential frames) of each behaviour have been added to Figure 2A.

      (4) In the legend for Figure 1A, the reference to Gonzalez et al. 2016 seems out of place (at least the reader should be informed why the text is referring to this previous study). Additionally, because the references are ordered by number instead of alphabetically, it would be more helpful to refer to a numbered reference rather than a name.

      Gonzalez et al. 2016 references the source of the AAV construct used in this figure. This has been moved to the methods. Following eLife formatting guidelines, references will be alphabetized upon publication.

      (5) In Figure 3F, it would be helpful to show visual validation that the HON-DTR method indeed ablates all HONs. This is depicted conceptually, but representative figures would be much more convincing.

      A representative histological slice is now included for both wild type (WT) and HON-DTR mice in the new Figure 4B.

      Reviewer #2 (Public review):

      Summary:

      Despite several methodological strengths, the major and highly significant drawback is the confound of arousal with movement. This confound is not resolved, so the results could be explained by previously established relationships between orexin and arousal/wakefulness.

      This is an excellent point, and we agree. To address this directly in the revised manuscript, we now include new data and analyses towards this (pupillometry to directly assess arousal, and multivariate analysis to assess contributions of arousal vs movement to HON activity).

      Strengths:

      The authors show that orexin neuron activity is associated with body movement and that this information is conveyed irrespective of the fasted state. They also report differences in different orexin target brain regions for orexin release during movement. This paper contains an impressive array of cutting-edge techniques to examine a very important brain system, the orexin-hypocretin system. The authors offer an original perspective on the function of this system. The authors showed that orexin neuron activity scales to some degree with the magnitude of body movement change; this is unaffected by a fasted state and seems to be somewhat unique to orexin neurons.

      The investigation of other genetically defined subcortical neuron populations to determine the specificity of findings is also a strength, as is the ability to quantify movement and use deep learning to classify specific behaviors, which adds sophistication to the analysis. The authors also show heterogeneity in orexin projections to specific target nuclei, which is interesting.

      The authors "speculate that narcolepsy-cataplexy, caused by HON loss-of-function, is perhaps explained by oscillations into unwanted sleep-states and motor programs due to impaired control loops for wakefulness and movement". This is quite an interesting aspect of their work and deserving of further study.

      We thank the reviewer for their supportive feedback.

      Weaknesses:

      Despite the strengths, there are several major and minor weaknesses that detract significantly from the study.

      My main concern with this work is the confound of arousal with movement so that correlations with one might reflect a relationship instead with the other. The orexin system is well known to play an important role in arousal, with elevated activity of orexin neurons reported for waking and high arousal. Orexin signaling has also been strongly associated with motivation, which also is associated with arousal and movement. The authors offer no compelling evidence that the relationships they describe between different movements and orexin signaling do not simply reflect the known relationship between arousal and motivation.

      The authors could address this concern by including classical arousal measurements, eg, cortical EEG recorded simultaneously with movements. Often, EEG arousal occurs independently of movement, so this could provide one approach to disentangling this confound. The idea that orexin signaling plays a role in arousal rather than movement is supported by their finding that orexin lesions using the orexin-DTR mouse model did not impact movements. In contrast, prior lesion and pharmacologic studies have found that decreased orexin signaling significantly decreases arousal and waking.

      Another way they could test their idea would be to paralyze and respirate animals so that orexin activity could be recorded without movement. Alternatively, animals could be trained to remain motionless to receive a reward. Thus, there are several ways to test the overall hypothesis of this work that have not been examined here.

      The authors propose that "a simple interpretation of their results is that, via HON movement tracking, the brain creates a "wake up" signal in proportion to movement". This seems to argue for the role of the orexin system in arousal and motivation rather than in movement per se.

      Thank you. We agree that disentangling between arousal and movement is indeed critical. A classic approach is a multivariate analysis, wherein multiple simultaneously recorded “predictors” of HON activity – such as arousal and movement – can be directly compared. While EEG arousal is an option, another well-accepted metric for arousal is pupil diameter. Using n = 7 mice, we now simultaneously record HON activity, movement, running speed, pupil size fluctuations, and ocular movements.

      We then fit a partial least squares multivariate regression (a regression type more robust to collinearity) using the movement metric, pupil size, and ocular movements as predictors of orexin neuron activity. Consistent with previous publications, we found that pupil size alone has a positive correlation with hORX.GCaMP6s (~0.45). However, using a drop-one feature analysis in multivariate regression, we found that movement had the highest % contribution to statistically explaining orexin neuron activity. Here are the new results (which we now added as Fig. 7A-B).

      Author response image 1.

      Furthermore, we also expanded this analysis to incorporate the different frequencies found in HON dynamics, using empirical mode decomposition. We found that pupil size had a maximum correlation at lower HON frequencies than the movement metric, while ocular movements were maximally correlated in higher frequencies (now added as Fig. 7D,E).

      Overall, this analysis suggests that – while HONs encode both movement and arousal – arousal and movement do not always co-fluctuate at the same timescales, and their impacts on HONs can be disentangled in a number of ways. We now mention this in revised text on page 5.
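
      The logic of the drop-one feature analysis described above can be sketched as follows. This is an illustration only and not the authors' analysis: ordinary least squares is swapped in for partial least squares to keep the example dependency-free, the signals are hypothetical toy values, and "contribution" is assumed to mean the loss in R² when a predictor is removed.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r2(columns, y):
    """R² of a least squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def drop_one(predictors, y):
    """Loss in R² when each predictor is removed from the full model.

    Requires at least two predictors so that every reduced model is non-empty.
    """
    full = r2(list(predictors.values()), y)
    return {
        name: full - r2([v for n, v in predictors.items() if n != name], y)
        for name in predictors
    }


# Hypothetical, partly collinear toy signals (not data from the study).
movement = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
pupil = [1.0, 0.0, 2.0, 1.0, 3.0, 2.0, 4.0, 3.0]
activity = [2.0 * m + 0.5 * p for m, p in zip(movement, pupil)]
losses = drop_one({"movement": movement, "pupil": pupil}, activity)
```

      In this toy case the two predictors are correlated with each other, yet removing movement costs far more explained variance than removing pupil size, which is the kind of comparison a drop-one analysis is meant to expose.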

      There are several studies that have examined the effect of orexin antagonist treatment in rodents on locomotor and other motor activities. These studies have largely found no consistent effect of antagonizing orexin signaling, especially at the OxR1 receptor, on simple motor activity. These studies are not referenced here but should be taken into account in the authors' conclusions.

      We agree. Prior studies found that orexin antagonism – or optogenetic silencing of HONs – evokes either reduced locomotion, or no effect on locomotor movements. We now added text and references to paragraph 4 of Discussion, summarising this.

      Figure 3, panel F: I understand HON-DTR is a validated model but a picture of HONs ablation is necessary, including pictures of HONs outputs ablation within the SNc and LC.

      A representative histological slice is now included for both wild type (WT) and HON-DTR mice in the new Figure 4B. Because HONs are only found in the hypothalamus, somatic deletion of HONs in this region will result in axonal degradation in output regions.

      The discussion lacks a more extensive paragraph on the distinct signal and role of Ox>SNc and Ox>LC projections.

      We now added sentences discussing potential implications of this to Discussion (middle of paragraph 4).

      Reviewer #2 (Recommendations for the authors):

      Minor weaknesses

      A very important movement in rodents is head orientation, especially given the limitation in ocular movement. However, this paper used a fixed head model which obviated this movement and did not attempt to analyze ocular movements.

      Analysing ocular movements is something we had not considered but is very easy to check using pupillometry. In n = 7 mice, we recorded both orexin neurons, and ocular movements captured through an infrared camera under constant lighting. Ocular movements had a small positive correlation with orexin neuron photometry (r = ~0.26). See response to the public review above.

      Author response image 2.

      The "HON" abbreviation is not commonly used for orexin neurons, and I suggest replacing that with a more well-known abbreviation.

      To the best of our knowledge, there is no universally agreed or best-known abbreviation for hypocretin/orexin neurons (we agree it would be nice if there was one!). “HONs” is a simple first letter abbreviation of hypocretin/orexin neurons, which acknowledges the two names for this peptide given by the original discoverers (de Lecea et al, and Sakurai et al, in 1998). Although this may not be the perfect abbreviation, we have kept it for now, also to be consistent with the large number (>10) of other published studies that recently used this abbreviation.

      The graphs showing Pearson's r values do not demonstrate a very strong correlation between neural activity and movement change; they also lack validation of genetic expression/ablation in some cases. The results would more strongly support the conclusions if statistically significant correlations could be demonstrated between activity and movement.

      We agree that a correlation of ~0.68 is probably not worthy of a “very strong” classification. While there is no universal ruleset for categorizing the strength of a correlation, we have toned down our language throughout the manuscript.

      Comment regarding statistical testing of correlations: we are cautious about relying on correlation significance testing for large sample sizes (~48’000 photometry & video samples in a 40-minute session). In our case, correlations were always extremely significant (p < 0.0001). The reason for this is that correlation p-values become “too big to fail” (see Lin et al. 2013) with inflated sample size. We therefore refrain from commenting on p-values and instead report between- or within-subjects statistical tests, or tests against zero. See four example experiments below.

      Author response image 3.

      Citation: Lin, M., Lucas, H. C., Jr & Shmueli, G. Research Commentary—Too Big to Fail: Large Samples and the p-Value Problem. Information Systems Research 24, 906–917 (2013).
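
      The sample-size effect described above can be illustrated with a short calculation. This is only a sketch, not the analysis used in the manuscript: the function name `pearson_p_large_n` is hypothetical, the p-value uses a normal approximation to the t distribution (reasonable only for large n), and the samples are treated as independent, which is an assumption.

```python
import math

def pearson_p_large_n(r, n):
    """Two-sided p-value for Pearson's r using a normal approximation
    to the t statistic (assumes large n and independent samples)."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

# 20 Hz sampling over a 40-minute session gives 48,000 paired samples.
n_samples = 20 * 60 * 40

for r in (0.02, 0.05, 0.26, 0.68):
    print(f"r = {r:.2f}: p = {pearson_p_large_n(r, n_samples):.3g}")

# The same weak correlation with only 100 samples is far from significant.
print(f"r = 0.02, n = 100: p = {pearson_p_large_n(0.02, 100):.3g}")
```

      Under these assumptions even r = 0.02 falls below p = 0.0001 at this sample size, which is why the p-value alone says little about the strength of the coupling.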

      The rationale for looking at running speed, general movement, and specific types of nonlocomotor movements could be clarified and explained more thoroughly in the introduction. Why is it important to distinguish between locomotion (represented here with running) and all other movements? Presumably, this is because orexin is known to regulate arousal/locomotion. What evidence is there for orexin's role in other types of movements, which are being grouped together in Figure 1? This could be laid out in more detail in the Introduction. Relatedly, it is not very clear in the text whether the correlation between movement and orexin neuron activity includes movement related to running.

      The main focus of our paper is on movement in general (i.e. video pixel difference, described in Results and Methods). This movement metric includes everything captured by the video; it is agnostic to the type of movement or behaviour. To connect this to some of the specific innate movements/behaviours typically studied in the mouse literature (running, grooming, sniffing, etc.), we also plotted these in Figure 2. We have attempted to explain this better in the revised section 1 of Results.
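
      For readers unfamiliar with pixel-difference metrics, the sketch below shows one common way such a measure can be computed: the mean absolute difference between consecutive grayscale frames. The exact definition used in the manuscript is given in its Methods and may differ; the function name `movement_metric` and the toy frames are hypothetical.

```python
def movement_metric(frames):
    """Mean absolute pixel difference between consecutive grayscale frames.

    `frames` is a list of 2D lists of pixel intensities (an assumed layout).
    Returns one value per consecutive frame pair.
    """
    metric = []
    for prev, curr in zip(frames, frames[1:]):
        total = 0
        count = 0
        for row_prev, row_curr in zip(prev, curr):
            for p, c in zip(row_prev, row_curr):
                total += abs(c - p)
                count += 1
        metric.append(total / count)
    return metric

frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],    # identical to the previous frame: no movement
    [[10, 0], [0, 10]],  # two of four pixels change by 10
]
print(movement_metric(frames))  # [0.0, 5.0]
```

      A metric of this kind responds to any change in the image, which is what makes it agnostic to the type of behaviour.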

      What exactly is being correlated in Figure 1C (and throughout the rest of the paper?) Is this the average signal correlated with the average movement change over the entire recording time? This could be more explicitly stated in methods/results. The correlations themselves/p-values could be shown in addition to/instead of Pearson's r values. Are the correlations themselves significant? This would strengthen the claim that orexin activity is strongly coupled to the magnitude of body movement change. As another example, in Figure 2D, there are no statistics reported on the correlation between movement metric and average neural signal. In Figure 6G, orexin neuron activity is more strongly correlated with movement than MVe glut neurons, but are either of these correlations significant? The correlation between MVe glut activity and movement overall seems similar to that of orexin neurons, and may be worth noting more explicitly.

      Throughout the paper, we have recorded both neural activity (photometry) and movement at 20 Hz. This would generate, for example, 48’000 samples of photometry and movement from a 40-minute session. All the samples were used to calculate a Pearson’s r between variables. To clarify this, we have now added the subtext “whole-session” to relevant figures, as well as a clarification in the Methods.
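
      As a minimal sketch of a whole-session correlation, the snippet below computes Pearson's r over every sample of two traces recorded at 20 Hz for 40 minutes. The traces are synthetic stand-ins (not real photometry or video data), and the names `pearson_r`, `movement`, and `photometry` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

SAMPLE_RATE_HZ = 20
SESSION_MINUTES = 40
n_samples = SAMPLE_RATE_HZ * 60 * SESSION_MINUTES  # 48,000 samples

# Synthetic traces: "photometry" follows "movement" plus an unrelated oscillation.
movement = [math.sin(0.01 * i) ** 2 for i in range(n_samples)]
photometry = [0.8 * m + 0.2 * math.cos(0.37 * i) for i, m in enumerate(movement)]

r_whole_session = pearson_r(movement, photometry)
print(f"{n_samples} samples, whole-session r = {r_whole_session:.2f}")
```

      Each session yields a single r value in this scheme, and those per-session values are what can then be compared across animals or tested against zero.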

      Individual experiment correlations for orexin neurons and MVe glut neurons were always significant (p < 0.0001), even after a Bonferroni multiple comparisons correction was applied to each population. See the discussion of the “too big to fail” nature of correlation hypothesis testing above.

      It could be made clearer at the end of Figure 2 that orexin neuron activity is tracking the magnitude of movement change (shown in Figure 2D), not that it is encoding different types of movement.

      We intended the original Figure 2E to illustrate this concept; however, this panel has caused a great deal of confusion to several readers and was perhaps ill-conceived. We have replaced Figure 2E with a new panel that more directly addresses the reviewer’s statement. We can construct three models in which orexin neuron activity is predicted from the behavioral classification (sometimes called “one-hot” encoding) and/or the movement metric.

      Model 1 predicts orexin neuron activity using only a categorical predictor of behavioral state. Model 2 only uses the movement metric, and model 3 allows a different movement-metric correlation within each behavioral state. We can compare these models using AIC (Akaike Information Criterion) which is a point estimate. While the most complex model 3 was the best, model 2 was much closer to model 3 than model 1. Similarly, model 2 was much better than model 1. From this we conclude that the magnitude of movement change is a more powerful predictor than behavioral state (“type of movement”). This is now Figure 2E.
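
      The three-model comparison can be sketched on synthetic data as follows. This is an illustration under stated assumptions, not the authors' code: model 3 is taken to mean a separate intercept and slope per behavioral state, AIC is the Gaussian least-squares form up to an additive constant, and all names and the simulated data are hypothetical.

```python
import math
import random

def rss_mean(y):
    """Residual sum of squares around the mean (intercept-only fit)."""
    my = sum(y) / len(y)
    return sum((yi - my) ** 2 for yi in y)

def rss_line(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + 2 * k

def group(values, labels):
    """Split values into lists keyed by their label."""
    out = {}
    for v, lab in zip(values, labels):
        out.setdefault(lab, []).append(v)
    return out

# Synthetic session: activity depends on movement magnitude, not on state.
rng = random.Random(0)
states = ["rest", "groom", "run"]
offsets = {"rest": 0.0, "groom": 1.0, "run": 2.0}
labels = [states[i % 3] for i in range(600)]
movement = [offsets[s] + rng.uniform(0.0, 2.0) for s in labels]
activity = [0.5 * m + rng.gauss(0.0, 0.1) for m in movement]

n = len(activity)
x_by_state = group(movement, labels)
y_by_state = group(activity, labels)

rss1 = sum(rss_mean(y_by_state[s]) for s in states)                 # state only
rss2 = rss_line(movement, activity)                                 # movement only
rss3 = sum(rss_line(x_by_state[s], y_by_state[s]) for s in states)  # per-state lines

aic1 = aic(rss1, n, k=len(states))
aic2 = aic(rss2, n, k=2)
aic3 = aic(rss3, n, k=2 * len(states))
print(f"AIC state-only: {aic1:.1f}, movement-only: {aic2:.1f}, per-state: {aic3:.1f}")
```

      Because the per-state model nests the movement-only model, its residual error can only be equal or smaller; AIC then penalizes the extra coefficients, so how close model 2 comes to model 3 indicates how little the behavioral labels add once movement magnitude is known.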

      It would be interesting to see the raw movement metric data as shown in Figures 1 and 2 in the DTR mice to show that ablating orexin neurons does not impair the movement profile seen in Figures 1 and 2.

      The requested visualization has been added to Figure 4B.

      Validation that orexin was selectively ablated in these mice would be ideal.

      Histology (see response to public review) was added to a new Figure 4B.

      Figure 4A - OxLight expression in SNc does not look very robust.

      Please note that this is a membrane-targeted indicator; the staining it produces is thus much weaker than that of cytosolic indicators such as the calcium indicator GCaMP.

      Figure 4 - It would be beneficial to see the same correlations that were done in Figures 1 and 2 to show OxLight activity vs. movement metric. Are they correlated?

      Individual traces had significant correlations with OxLight and movement, and the population averages revealed similar trends:

      Author response image 4.

      Figure 6B - Targeting of MVe neurons does not look very specific. The sample size for orexin-targeted mice should be re-stated in the figure legend for clarity.

      Legend has been updated to clarify n = 15 for orexin targeted mice.

      Some citations didn't seem to match what was being referenced in the text. Similarly, in the legend for Figure 1C, the statistics do not match what is reported in the text. In Figure 1, the sample size is not noted in the text. When referring to running in Figure 1, is this referring to running speed? Perhaps the language could be more consistent.

      These typos (due to a rounding error) in the legend and text have been corrected. Sample size has been added to the text, and we have changed Figure 1D to clarify we are referring to running speed. We moved some citations to improve clarity.

      Methods - where were Cre mice obtained from?

      Sources now better referenced in Methods (JAX or Parlato et al).

      Figure 1, panel C: The authors compared Pearson's r-coefficient results for each animal and for each variable. However, it would be interesting to show the correlation curves for each variable as well here. Also, there is mention of a strong correlation but it is unclear whether these correlations are significant.

      See below for an example mouse.

      Author response image 5.

      Figure 3, panel F: I understand HON-DTR is a validated model but a picture of orexin ablation is necessary, including pictures of orexin fiber ablation within the SNc and LC.

      See our reply to the public review above.

      Figure 5, Panel A: Same comment as Figure 1, panel C.

      We have similarly clarified the panel and legend.

      Page 4: The authors mention "Within the 1st and 4th quartile of blood glucose, movement-HON correlations were not significantly different." Please add the figures.

      The requested plot has been added to Figure 6, panel G.

      Reviewer #3 (Public review):

      Summary

      The study presents an investigation into how hypothalamic orexin neurons (HONs) track body movement with high precision. Using techniques including fiber photometry, video-based movement metrics, and empirical mode decomposition (EMD), the authors demonstrate that HONs encode net body movement consistently across a range of behaviors and metabolic states. They compare the ability of HONs to track body movement with that of other subcortical neural populations, and thereby distinguish HON activity from the activity of those populations.

      Strengths:

      The study characterizes HONs activity as key indicators of movement and arousal, and this method may have potential implications for understanding sleep disorders, energy regulation, and brain-body coordination. Overall, I think this is a very interesting story, with novel findings and implications about sensorimotor systems in animals. The manuscript is clearly written and the evidence presented is rigorous. The conclusions are well supported by experimental data with clear statistical analyses.

      We thank the reviewer for their supportive feedback.

      Weaknesses/suggestions:

      There are a couple of issues I think the authors could address to make the paper better and more complete:

      (1) The study primarily focuses on steady-state behaviors. It would be interesting if the authors' current dataset allows analyses of HON dynamics during transitions between behavioral states (e.g., resting to running or grooming to sniffing). This could provide additional insights into how HONs adapt to rapid changes in body movement.

      This is a fantastic idea, and easy to check using our classification CNN. We identified the six most frequent behavioral transitions and plotted them in Figure 2H. HONs show rapid dynamics in activity aligned with behavioral changes.

      These changes are very similar to the movement magnitude along these transitions, which is now also plotted in Figure 2G.

      (2) Given the established role of HONs in arousal and wakefulness, the study could further investigate how movement-related HON dynamics interact with arousal states. For example, does HON encoding of movement differ during sleep versus wakefulness?

      To further investigate how movement encoding interacts with arousal, we now include quantification and analysis of pupil-linked arousal (see new Figure 7). We agree it would be interesting to look at what happens during sleep, especially REM sleep when some HONs are thought to be active where there is no/little body movement, but this is beyond the scope of the present study.

      (3) Although HON ablation experiments suggest that HONs do not shape movement frequency profiles, it would be more compelling if the authors could investigate whether HONs contribute to specific types of movements (e.g., fine motor vs. gross motor movements) or modulate movement initiation thresholds.

      We performed this analysis using the k-means classifier for small/large movements. Consistent with previous results, we found no significant effect (p = 0.2767) of genotype on the frequency of identified small (fine) or large (gross) movement clusters. This plot has been added to Figure 4E.
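
      A two-cluster k-means split of movement magnitudes into "small" and "large" can be sketched as below. This is a one-dimensional illustration only; the features, initialization, and implementation used for the manuscript's classifier are not stated here, so the function name and toy values are hypothetical.

```python
def kmeans_1d_two_clusters(values, iterations=50):
    """Split scalar values into 'small' and 'large' clusters with 2-means.

    Centers start at the minimum and maximum (an assumed initialization).
    Returns (small_center, large_center, labels).
    """
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        small = [v for v in values if abs(v - lo) <= abs(v - hi)]
        large = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large) if large else hi
        if new_lo == lo and new_hi == hi:
            break  # assignments are stable
        lo, hi = new_lo, new_hi
    labels = ["small" if abs(v - lo) <= abs(v - hi) else "large" for v in values]
    return lo, hi, labels

magnitudes = [0.1, 0.2, 0.15, 5.0, 5.5, 4.8]
small_center, large_center, labels = kmeans_1d_two_clusters(magnitudes)
print(labels)
```

      The per-animal frequency of each label is the kind of quantity that could then be compared between genotypes.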

      (4) The heterogeneous movement-related orexin dynamics observed in the LC and SNc raise intriguing questions about the circuit-level mechanisms underlying these differences. Optogenetic or chemogenetic manipulation of these projections could validate the functional implications of these dynamics.

      We agree. We now discuss some implications of this in revised Discussion (paragraph 4). Please note that previous work already demonstrated that orexin action in the SNc can produce locomotion (referenced in the paragraph), though we agree that further work would be valuable.

      Reviewer #3 (Recommendations for the authors):

      Additional feedback:

      (1) Figure 1C: the individual data points are hard to track or see. Consider using a larger marker face to help data visualization. Similar issues can be found in Figures 2C, 2E, 5E, 6C, 6F, and 6G.

      The thickness of the lines and the size of the scatterplot markers have been increased.

      (2) First Section of Results: the authors claim to use a deep-learning network to automatically classify video recordings into five distinct behaviors. However, several issues need to be addressed here:

      a. In Results, the corresponding sentence lacks a reference to the Methods Section.

      Reference has been added to the text.

      b. In Methods, the description of the CNN model is quite limited, lacking many basic, necessary components including necessary references to published papers, the model training, characterization (only an overall accuracy is not enough), as well as dataset definition, preparation, augmentation (if any), etc.

      We have expanded the methods section regarding the CNN model.

      (3) First Section of Results: in the second paragraph, the authors claim that "Overall, these results reveal HON population activity precisely tracks a general degree of body movement across recorded behaviors." This is not accurate. To indicate that HONs activity tracks the general degree of body movement across behavior states, they need to further show that behavioral states with similar levels of movement metrics can be differentiated via HON activities. However, as they showed in Figure 2D, some behaviors with similar values of movement metric do not seem to be easily discerned by HON activity levels.

      We agree with you, and this is also what we originally intended to convey – now reworded for clarity.

      (4) Technical issue: Figures 3B, 3C, 3G, using local regression to plot the solid lines makes them touch negative values, which does not make sense for "power proportion" (this quantity is always non-negative).

      This is a good point. To fix this, we first log-transformed the power metric, then performed a local regression, and used the link function to transform the model predictions back to %-units for visualization. This has been noted in the methods.
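
      The idea of smoothing on a log scale and transforming back can be sketched as follows. This is not the authors' implementation: a Gaussian-weighted local linear fit stands in for whichever local regression routine was used, the back-transform is the exponential (the inverse of the log), and the function name, bandwidth, and synthetic power values are all hypothetical.

```python
import math

def local_linear(x, y, x0, bandwidth):
    """Kernel-weighted linear fit evaluated at x0 (Gaussian weights)."""
    w = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my + slope * (x0 - mx)

# Synthetic "power proportion" (%) that decays toward zero but stays positive.
freq = [0.5 * i for i in range(1, 21)]
power = [100.0 * math.exp(-1.2 * f) + 0.01 for f in freq]

# Smoothing the raw values places no constraint on the sign of the fitted curve.
raw_fit = [local_linear(freq, power, f, 1.5) for f in freq]

# Smoothing log(power) and exponentiating keeps every fitted value positive.
log_power = [math.log(p) for p in power]
log_fit = [math.exp(local_linear(freq, log_power, f, 1.5)) for f in freq]

print(min(raw_fit), min(log_fit))
```

      Whatever the smoother does in log space, the exponential maps its output back to strictly positive values, which matches the constraint that a power proportion cannot be negative.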

      (5) Figure 3G: For a better comparison, consider combining the two plots into a single plot.

      The two plots have been merged as shown in Figure 4C.

      (6) Figure 5E: For a better data visualization, the current pair of plots can be consolidated into one single plot where the x-axis is Move and the y-axis is dGlu. In this way, it is easier to understand and the orthogonality as claimed in the manuscript can be more apparent.

      The requested plot has been added as Figure 6F.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful consideration of our work, including both reviewers’ constructive comments. Our apologies for taking some extra time for this revision, but we wanted to address comments thoroughly with new analyses, not to mention a PhD defense, parental leave and my teaching ultimately being the bottleneck for the team’s work!

      Reviewer #1 (Public Review):

      The authors use a combination of structural and MD simulation approaches to characterize phospholipid interactions with the pentameric ligand-gated ion channel, GLIC. By analyzing the MD simulation data using clusters of closed and open states derived previously, the authors also seek to compare lipid interactions between putative functional states. The ultimate goal of this work is to understand how lipids shape the structure and function of this channel.

      The strengths of this article include the following:

      1) The MD simulation data provide extensive sampling of lipid interactions in GLIC, and these interactions were characterized in putative closed and open states of the channel. The extensive sampling permits confident delineation of 5-6 phospholipid interaction sites per subunit. The agreement in phospholipid binding poses between structures and the all-atom MD simulations supports the utility of MD simulations to examine lipid interactions.

      2) The study presents phospholipid binding sites/poses that agree with functionally-important lipid binding sites in other pLGICs, supporting the notion that these sites are conserved. For example, the authors identify interactions of POPC at an outer leaflet intersubunit site that is specific for the open state. This result is quite interesting as phospholipids or drugs that positively modulate other pLGICs are known to occupy this site. Also, the effect of mutating W217 in the inner leaflet intersubunit site suggests that this residue, which is highly conserved in pLGICs, is an important determinant of the strength of phospholipid interactions at this site. This residue has been shown to interact with phospholipids in other pLGICs and forms the binding site of potentiating neurosteroids in the GABA(A) receptor.

      Weaknesses of this article include the following:

      1) The authors describe in detail state-dependent lipid interactions from the MD simulations; however, the functional significance of these findings is unclear. GLIC function appears to be insensitive to lipids, although this understanding is based on experiments where GLIC proteoliposomes were fused to oocyte membranes, which may not be optimal to control the lipid environment. Without functional studies of GLIC in model membranes, the lipid dependence of GLIC function is not definitively known. Therefore, it is difficult to interpret the meaning of these state-dependent lipid interactions in GLIC.

      2) It is unlikely that the bound phospholipids in the GLIC structures, which are co-purified from E. coli membranes, are POPC. Rather, these are most likely PE or PG lipids. While it is difficult to accommodate mixed phospholipid membranes in all-atom MD simulations, the choice of POPC for this model, while practically convenient, seems suboptimal, especially since it is not known if PE or PG lipids modulate GLIC function. Nevertheless, it is striking that the overall binding poses of POPC from the simulations agree with those identified in the structures. It is possible that the identity of the phospholipid headgroup will have more of an impact on the strength of interactions with GLIC rather than the interaction poses (see next point).

      3) The all-atom MD simulations provide limited insight into the strength of the POPC interactions at each site, which is important to interpret the significance of these interactions. It is unlikely that the system has equilibrated within the 1.7 microseconds of simulation for each replicate preventing a meaningful assessment of the lipid interaction times. Although the authors report exchange of up to 4 POPC interacting at certain residues in M4, this may not represent binding/unbinding events (depending on how binding/interaction is defined), since the 4 Å cutoff distance for lipid interactions is relatively small. This may instead be a result of small movements of POPC in and out of this cutoff. The ability to assess interaction times may have been strengthened if the authors performed a single extended replicate up to, for example, 10-20 microseconds instead of extending multiple replicates to 1.7 microseconds.
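
      The concern about a small cutoff can be made concrete with a toy calculation. In the sketch below, a lipid that merely jitters around the cutoff distance is counted as several separate interaction events, whereas a slightly larger cutoff counts one continuous interaction. The distances, cutoffs, and function name are hypothetical and are not taken from the simulations.

```python
def contact_events(distances, cutoff):
    """Count frames in contact and the number of separate contact events.

    A frame is 'in contact' when the distance is at or below the cutoff;
    a new event starts whenever contact resumes after a break.
    """
    in_contact = [d <= cutoff for d in distances]
    events = 0
    previous = False
    for current in in_contact:
        if current and not previous:
            events += 1
        previous = current
    return sum(in_contact), events

# Minimum lipid-residue distance per frame (in Å), hovering near 4 Å.
distances = [3.8, 4.1, 3.9, 4.2, 3.7]

print(contact_events(distances, cutoff=4.0))  # (3, 3): three apparent events
print(contact_events(distances, cutoff=4.5))  # (5, 1): one continuous contact
```

      The same trajectory therefore gives different apparent exchange counts depending on how an interaction is defined, which is the ambiguity raised in point 3.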

      Reviewer #2 (Public Review):

      The authors convincingly show multiple inner and outer leaflet non-protein (lipid) densities in a cryo-EM closed state structure of GLIC, a prokaryotic homologue of canonical pentameric ligand-gated ion channels, and observe lipids in similar sites during extensive simulations at both resting and activating pH. The simulations not only corroborate structural observations, but also suggest the existence of a state-dependent lipid intersubunit site only occupied in the open state. These important findings will be of considerable interest to the ion channel community and provide new hypotheses about lipid interactions in conjunction with channel gating.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      In particular, a discussion of whether the timescale of the simulations permit measurements of residence or interaction times of the lipids should be addressed.

      Reviewer #1 (Recommendations for the authors):

      Comment 1.1: The authors may consider expanding the discussion about the significance of state-dependent lipid interactions. On the one hand, they emphasize state-dependent interactions of POPC with closed and open states in the outer leaflet in the results. On the other hand, they state that GLIC is insensitive to its lipid environment. What is the significance of the state-dependent interactions of POPC in GLIC, if any? Is it possible that GLIC agonist responses are sensitive to phospholipids (such as PE or PG found in E. coli)? The state-dependent differences in lipid interaction identified in this study support this possibility and suggest the need to better understand the effects of phospholipids on GLIC function.

      Response 1.1: We agree with the reviewer that this is an interesting question and we have therefore extended the discussion with additional references on the functional effects on GLIC of various lipid membranes:

      p. 11 (Discussion)

      “Sampling was further simplified by performing simulations in a uniform POPC membrane. Prior experiments have been conducted to assess the sensitivity of GLIC in varying lipid environments (Labriola et al., 2013; Carswell et al., 2015; Menny et al., 2017), indicating that GLIC remains fully functional in pure POPC bilayers. In our cryo-EM experiments, the protein was recombinantly expressed from E. coli, which means that the experimental density would likely represent phosphatidylglycerol or phosphatidylethanolamine lipids. However, as the molecular identities of bound lipids could not be precisely determined, POPC lipids were built for straightforward comparison with simulation poses. While it appears that GLIC is capable of gating in a pure POPC bilayer, it remains plausible that its function could be influenced by different lipid species, especially due to the presence of multiple charged residues around the TMD/ECD interface which might interact differently with different lipid head groups. Further experiments would be needed to confirm whether the state dependence observed in simulations is also lipid-dependent. It is possible that certain types of lipids bind in one but not the other state, or that certain states are stabilized by a particular lipid type.”

      Comment 1.2: It would be helpful to state in the discussion that the co-purified lipids from GLIC structures are likely PE or PG from E. coli membranes. Nevertheless, it is interesting that the phospholipid poses from the structures generally agree with those identified from the MD simulations using PC.

      Response 1.2: Good point. We have clarified in the discussion that the native lipids in the cryo-EM structure are likely PG or PE lipids, as quoted in the preceding Response.

      Comment 1.3: The authors describe a more deeply penetrating interaction of POPC in the outer intrasubunit cleft in the open state, but this is difficult to appreciate from the images in Fig. 4B, 4E or S3B. The same is true of the deep POPC interaction at the outer intersubunit site. It may be helpful to show these densities from a different perspective to appreciate the depth of these binding poses.

      Response 1.3: We have added Figure 4 – figure supplement 1 to better show the depth of lipid binding poses, especially the ones in the outer leaflet intrasubunit cleft and at the inner intersubunit site, and cited the figure on p. 7 (Results).

      Comment 1.4: The representation of the lipid densities in Fig. 4B is not easy to interpret. First, the meaning of resting versus activating conditions and closed versus open states can be easily missed for readers who are not familiar with the author's previous study. It may be helpful to describe this (i.e. how open and closed state clusters were generated from structures determined in resting and activating conditions) in greater detail in either the figure legend, results or methods. Second, the authors state that there are differences in lipid poses between the closed and open states but not resting and activating conditions. With the exception of the intersubunit density, this is difficult to appreciate from Fig. 4B. As stated in point #3, the difference, for example, in the complementary intrasubunit site may be better appreciated with an image from a different perspective.

      Response 1.4: Acknowledged - the distinction between resting and activating conditions vs. open and closed states can be confusing. We have tried to clarify these differences at the beginning of the results section, the methods section, and in the caption of Figure 4. Regarding differences in lipid poses between open and closed states, we agree it is difficult to appreciate from Figure 4, but here we refer the reader to Figure 4 – figure supplement 2 for an overlay between open and closed densities. Additionally, we have now added Figure 1 – figure supplement 1, which provides lipid densities for all five subunits and overlays with the built cryo-EM lipids, possibly making differences easier to appreciate. Regarding images from different perspectives, we trust the new figure supplement described in Response 1.3 provides a better perspective.

      p. 3 (Results)

      “For computational quantification of lipid interactions and binding sites, we used molecular simulations of GLIC conducted under either resting or activating conditions (Bergh et al., 2021a). As described in Methods, resting conditions corresponded to neutral pH with most acidic residues deprotonated; activating conditions corresponded to acidic pH with several acidic residues protonated. Both open and closed conformations were present in both conditions, albeit with different probabilities.”

      p. 8 (Figure 4)

      “Overlaid densities for each state represent simulations conducted under resting (dark shades) or activating (light shades) conditions, which were largely superimposable within each state.”

      p. 24 (Methods)

      “We analyzed previously published MSMs of GLIC gating under both resting and activating conditions (Bergh et al., 2021a). Resting conditions corresponded to pH 7, at which GLIC is nonconductive in functional experiments, with all acidic residues modeled as deprotonated. Activating conditions corresponded to pH 4.6, at which GLIC is conductive and has been crystallized in an open state (Bocquet et al., 2009). These conditions were modeled by protonating a group of acidic residues (E26, E35, E67, E75, E82, D86, D88, E177, E243; H277 doubly protonated) as previously described (Nury et al., 2011).”

      Comment 1.5: The new closed GLIC structure was obtained by merging multiple datasets. What were the conditions of the datasets used? Was it taken from samples in resting or also activating conditions?

      Response 1.5: We have updated the Results, Discussion, and Methods to clarify this important point, in particular by merging datasets and rerunning the classification:

      p. 3 (Results)

      “In our cryo-EM work, a new GLIC reconstruction was generated by merging previously reported datasets collected at pH 7, 5, and 3 (Rovšnik et al., 2021). The predominant class from the merged data corresponded to an apparently closed channel at an overall resolution of 2.9 Å, the highest resolution yet reported for GLIC in this state (Figure 1 – figure supplement 2, Table 1).”

      p. 11 (Discussion)

      “Interestingly, the occupational densities varied remarkably little between resting and activating conditions (Figure 1 – figure supplement 1), indicating state- rather than pH- dependence in lipid interactions, also further justifying the approach of merging closed- state GLIC cryo-EM datasets collected at different pH conditions to resolve lipids.”

      p. 14 (Methods)

      “After overnight thrombin digestion, GLIC was isolated from its fusion partner by size exclusion in buffer B at pH 7, or in buffer B with citrate at pH 5 or 3 substituted for Tris. The purified protein was concentrated to 3–5 mg/mL by centrifugation. [...] Data from three different grids, at pH 7, 5, and 3, were merged and processed together.”

      Comment 1.6: In Fig. 3D, do the spheres represent the double bond? If so, please state in the legend.

      Response 1.6: We have clarified in the legend of Figure 3D that the yellow spheres on the lipid tails represent a double bond.

      Comment 1.7: In Fig. 3E, what is the scale of the color representation?

      Response 1.7: We have clarified in the legend of Figure 3E that colors span 0 (white) to 137015 contacts (dark red).

      Reviewer #2 (Recommendations For The Authors):

      Comment 2.1: I'm not sure I fully understand how the final lipids were modeled (built). Fig. 1 caption suggests they may have been manually built? I understand that the idea was to place them in the overlap of simulation densities and structure densities, but can the authors please clarify if there were any quantifiable conditions that were employed during this process or if this was entirely manual placement in a pose that looked good? Regardless, it would be helpful to see an overlay of the built lipids with both the cryo and simulation densities (e.g., overlay of Fig. 1F/H and G/H) to better visualize how the final built lipids compare.

      Response 2.1: We thank the reviewer for pointing out unclarities regarding our methods. We have extended the methods section to clarify how the lipids were manually built in the cryo-EM structure. We have also added Figure 1 – figure supplement 1 showing overlays of the computational densities and built cryo-EM lipids.

      p. 15 (Methods)

      “Lipids were manually built in COOT by importing a canonical SMILES format of POPC (Kim et al., 2021) and adjusting it individually into the cryo-EM density in each of the sites associated with a single subunit, based in part on visual inspection of lipid densities from simulations, as described above. After building, 5-fold symmetry was applied to generate lipids at the same sites in the remaining four subunits.”

      Comment 2.2: Regarding the state-dependent lipid entry to the outer leaflet intersubunit site associated with channel opening, if the authors could include a movie depicting this process that would be great. The current short explanation does not do this justice. Also, what were the dynamics of this process? Beyond the correlation between site occupancy and the pore being open, how did the timing of lipid entry/exit and pore opening/closing correlate?

      Response 2.2: The point regarding the timing of state-dependent lipid binding at the subunit interface and pore opening is indeed an interesting one. We have added Figure 4 – figure supplement 3D showing that the state-dependent P250 lipid interaction precedes pore opening, as quantified by pore hydration levels, indicating a potential role in gating. The interaction between lipid binding and conformational change of the protein is also depicted in the newly added Figure 4 - video supplement 1, which we hope will be able to better communicate the conclusions regarding state-dependent interactions. We have also expanded the results and discussion to better explain these results:

      p. 9 (Results)

      “The lipid head made particularly close contacts with residue P250 on the M2-M3 loop, which undergoes substantial conformational change away from the pore upon channel opening, along with outer-leaflet regions of M1–M3 (Figure 4E, Figure 4—figure Supplement 3A,B,C, Figure 4—video 1). These conformational changes were accompanied by a flip of M1 residue F195, which blocked the site in the closed state but rotated inward to allow closer lipid interactions in the open state (Figure 4—figure Supplement 3C, Figure 4—video 1). Indeed, P250 was predominantly located within 3 Å of the nearest lipid atom in open- but not closed-state frames (Figure 4F). Despite being restricted to the open state, interactions with P250 were among the longest duration in all simulations (Figure 2C) and as these binding events preceded pore opening, it is plausible to infer a role for this state-dependent lipid interaction in the gating process (Figure 4 – figure supplement 3D).”

      p. 12 (Discussion)

      “The state-dependent binding event at this site preceded pore opening in MSMs, where lipid binding coincided with crossing a smaller energy barrier between closed and intermediate states, followed by pore opening at the main energy barrier between intermediate and open states (Figure 4 – figure supplement 3D). Further, since the P250-lipid interaction was characterized by relatively long residence times (Figure 2), it is possible this lipid interaction has a role to play in GLIC gating.”

      Comment 2.3: Although the interaction times are helpful, I didn't get a great sense of how mobile the lipids are during the simulations. Can the authors discuss this a bit more. For example, are interaction times dominated by lipids that jiggle a bit away from a residue and then back again, vs how often are lipids exchanging with other lipids initially further away from the protein?

      Response 2.3: We have now added various measures of lipid diffusion, both for initially interacting lipids and for bulk lipids, which are summarized in the new Figure 2 – figure supplement 1. We have further addressed the question of simulation timescales in Results, Discussion, and Methods. These numbers highlight that it is possible for lipids several nanometers away from the protein surface to exchange with lipids of the first lipid shell.

      p. 3,6 (Results)

      “Lateral lipid diffusion coefficients were estimated to 1.47 nm²/µs for bulk lipids and 0.68 nm²/µs for lipids of the first lipid shell (Figure 2 – figure supplement 1A), which is relatively slow compared to the timescales of each trajectory (1.7 µs). However, multiple residues throughout the M1, M3, and M4 helices exchanged contacts with 2-4 different lipid molecules in individual simulations (Figure 2C). Furthermore, 1.7-µs root mean square displacement of lipids originally in the first lipid shell was 2.15 nm, and 3.16 nm in the bulk bilayer, indicating such exchanges are not limited to nearby lipids (Figure 2 – figure supplement 1B). Thus, exchange events and diffusion estimates indicate that the duration of lipid contacts observed in this work can be at least partly attributed to interaction stabilities and not solely to sampling limitations.”

      p. 11 (Discussion)

      “Indeed, the unrestrained atomistic MD simulations studied here were not expected to capture the maximal duration of stable contacts, as indicated by some interaction times approaching the full 1.7-µs trajectory (Figure 2). Nevertheless, simulations were of sufficient length to sample exchange of up to four lipids, particularly around the M4 helix. Calculation of lipid lateral diffusion coefficients resulted in average displacements at the end of simulations of 2.15 nm for lipids initially interacting with the protein surface, roughly corresponding to lipids diffusing out to the 4th lipid shell. Diffusion of bulk lipids was faster, allowing lipids originally 3.16 nm away from the protein surface to ingress the first lipid shell. This observation underscores the potential for lipid exchange events even among lipids initially distant from the protein surface. Of course, duration of exceptionally stable interactions, such as those involving T274 (Figure 2C), inevitably remain bounded by the length of our simulations. Still, diffusion metrics, supported by robust statistical analysis encompassing diverse starting conditions (500 trajectories), enable confident estimation of relative interaction times.”

      p. 13 (Methods)

      “Time-based measures of protein-lipid interactions, such as mean duration times and exchange of interactions, were calculated for the 100 x 1.7 µs-long simulations using prolintpy (Sejdiu and Tieleman, 2021) with a 4 Å interaction cutoff. Analysis of lateral lipid diffusion in individual simulations was carried out for two disjoint sets of lipids: the first lipid shell defined as lipids with any part within 4 Å of the protein surface (~90 lipids), and bulk lipids consisting of all other lipids (~280 lipids). Mean square displacements of each lipid set were calculated using GROMACS 2021.5 (Abraham et al., 2015b) with contributions from the protein center of mass removed. Diffusion coefficients for each set, D_A, were calculated using the Einstein relation (Equation 1) by estimating the slope of the linear curve fit to the data.

      where r_i(t) is the coordinate of the center of mass of lipid i of set A at time t and D_A is the self-diffusion coefficient.”
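
      For orientation, the Einstein relation referred to above is commonly written in the d-dimensional form sketched below. The choice d = 2 (lateral motion in the membrane plane) is an assumption made here rather than something stated in the excerpt, although it reproduces the root mean square displacements quoted in the Results from the reported diffusion coefficients and the 1.7-µs trajectory length.

```latex
% General Einstein relation for set A of lipids; d is the number of
% dimensions in which diffusion is measured (assumed d = 2 here).
\lim_{t \to \infty}
  \left\langle \lVert \mathbf{r}_i(t) - \mathbf{r}_i(0) \rVert^{2} \right\rangle_{i \in A}
  = 2\, d\, D_A\, t ,
\qquad d = 2 \;\Rightarrow\; \mathrm{MSD}(t) = 4\, D_A\, t .

% Consistency check against the reported values (t = 1.7~\mu\mathrm{s}):
\sqrt{4 \times 0.68 \times 1.7}\ \mathrm{nm} = \sqrt{4.624}\ \mathrm{nm} \approx 2.15\ \mathrm{nm}
\quad \text{(first lipid shell)},
\qquad
\sqrt{4 \times 1.47 \times 1.7}\ \mathrm{nm} = \sqrt{9.996}\ \mathrm{nm} \approx 3.16\ \mathrm{nm}
\quad \text{(bulk lipids)} .
```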

      Comment 2.4: How symmetric or asymmetric are the cryo and simulation densities across subunits and was there subunit asymmetry in the final built lipids? I could not tell from any of the figures beyond the casual observation that they maybe look somewhat similar in Fig. 1?

      Response 2.4: We thank the reviewer for this useful remark. We have clarified in the methods that the cryo-EM lipids were built in C5-symmetry, and thus the positions are symmetric. The computational densities were calculated independently for each subunit and are thus not necessarily symmetric. We have added Figure 1 – figure supplement 1 showing densities for all five subunits, also serving as an indication of convergence of the results.

      p. 3 (Results) “Although the stochastic nature of simulations resulted in nonidentical lipid densities associated with the five GLIC subunits, patterns of lipid association were notably symmetric (Figure 1 – figure supplement 1).”

      p. 14-15 (Methods)

      “A smaller subset of particles was used to generate an initial model. All subsequent processing steps were done using 5-fold symmetry. […] A monomer of that model was fit to the reconstructed density and 5-fold symmetry was applied with PHENIX 1.19.2-4158 through NCS restraints detected from the reconstructed cryo-EM map, to generate a complete channel. […] After building, 5-fold symmetry was applied to generate lipids at the same sites in the remaining four subunits.”

      Minor comments:

      Comment 2.5: Fig. 1 is probably not easy to follow for the general reader and the caption is very brief. I suggest adding an additional explanation to the caption and/or additional annotations to the figure to help a general reader step through this.

      Response 2.5: We have expanded the caption of Figure 1 and clarified the meanings of colors, labels, and annotations.

      Comment 2.6: Fig. 1B - Caption is confusing. I would not call the state separation lines outlines as they are not closed loops. Also, I see red/orange and two shades of blue whereas the caption mentions orange and blue only. The caption should also explicitly say what the black lines are (other cluster separations).

      Response 2.6: We have edited the caption to better describe colors, annotations, and the meaning of the data:

      p. 4 (Figure 1)

      “(B) Markov state models were used to cluster simulations conducted under resting (R) or activating (A) conditions into five states, including closed (left of the light or dark orange lines) and open (right of the light or dark blue lines). Black lines mark edges of other state clusters derived from MSM eigenvectors. Experimental structures are highlighted as white circles.”

      Comment 2.7: Fig. 3F caption appears to conflict with data where interaction with W217A appears longer than W217. I think the authors want to suggest here that W217A reduces contact time with T274 as stated in the main text.

      Response 2.7: We have clarified in this legend that “Mutation of residue W217, lining this pocket, reveals shortened interactions at the T274 binding site” (p. 6, Figure 3).

      Comment 2.8: Ref 25 and 26 are the same.

      Response 2.8: Apologies; this mistake has been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides potentially important, new information about the combination of information from the two eyes in humans. The data included frequency tagging of each eye's inputs and measures reflecting both cortical (EEG) and sub-cortical processes (pupillometry). Binocular combination is of potentially general interest because it provides, in essence, a case study of how the brain combines information from different sources and through different circuits. The strength of supporting evidence appears to be solid, showing that temporal modulations are combined differently than spatial modulations, with additional differences between subcortical and cortical pathways. However, the manuscript's clarity could be improved, including by adding more convincing motivations for the approaches used.

      We thank the editor and reviewers for their detailed comments and suggestions regarding our paper. We have implemented most of the suggested changes. In doing so we noticed a minor error in our analysis code that affected the functions shown in Figure 2e (previously Figure 1e), and have fixed this and rerun the modelling. Our main results and conclusions are unaffected by this change. We have also added a replication data set to the Appendix, as this bears on one of the points raised by a reviewer, and included a co-author who helped run this experiment.

      Reviewer #1 (Public Review):

      In this paper, the interocular/binocular combination of temporal luminance modulations is studied. Binocular combination is of broad interest because it provides a remarkable case study of how the brain combines information from different sources. In addition, the mechanisms of binocular combination are of interest to vision scientists because they provide insight into when/where/how information from two eyes is combined.

      This study focuses on how luminance flicker is combined across two eyes, extending previous work that focused mainly on spatial modulations. The results appear to show that temporal modulations are combined in different ways, with additional differences between subcortical and cortical pathways.

      1. Main concern: subcortical and cortical pathways are assessed in quite different ways. On the one hand, this is a strength of the study (as it relies on unique ways of interrogating each pathway). However, this is also a problem when the results from two approaches are combined - leading to a sort of attribution problem: Are the differences due to actual differences between the cortical and subcortical binocular combinations, or are they perhaps differences due to different methods? For example, the results suggest that the subcortical binocular combination is nonlinear, but it is not clear where this nonlinearity occurs. If this occurs in the final phase that controls pupillary responses, it has quite different implications.

      At the very least, this work should clearly discuss the limitations of using different methods to assess subcortical and cortical pathways.

      The modelling asserts that the nonlinearity is primarily interocular suppression, and that this is stronger in the subcortical pathway. Moreover, the suppression acts before binocular combination, so this is quite a specific location. We now say more about this in the Discussion, and also suggest that fMRI might avoid the limits on the conclusions we can draw from different methods.

      2. Adding to the previous point, the paper needs to do a better job of justifying not only the specific methods but also other details of the study (e.g., why certain parameters were chosen). To illustrate, a semi-positive example: Only page 7 explains why 2Hz modulation was used, while the methods for 2Hz modulation are described in detail on page 3. No justifications are provided for most of the other experimental choices. The paper should be expanded to better explain this area of research to non-experts. A notable strength of this paper is that it should be of interest to those not working in this particular field, but this goal is not achieved if the paper is written for a specialist audience. In particular, the introduction should be expanded to better explain this area of research, the methods should include justifications for important empirical decisions, and the discussion should make the work more accessible again (in addition to addressing the issues raised in point 1 above). The results also need more context. For example, why do EEG data have overtones but pupillometry data do not?

      We now explain the choice of frequency in the final paragraph of the introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      We also mention why the pupil response is low-pass:

      ‘The pupil response can be modulated by periodic changes in luminance, and is temporally low-pass (Barrionuevo et al., 2014; Spitschan et al. 2014), most likely due to the mechanical limitations of the iris sphincter and dilator muscles’.

      Reviewer #2 (Public Review):

      Previous studies have extensively explored the rules by which patterned inputs from the two eyes are combined in the visual cortex. Here the authors explore these rules for un-patterned inputs (luminance flicker) at both the level of the cortex, using Steady-State Visual Evoked Potentials (SSVEPs) and at the sub-cortical level using pupillary responses. They find that the pattern of binocular combination differs between cortical and sub-cortical levels with the cortex showing less dichoptic masking and somewhat more binocular facilitation.

      Importantly, the present results with flicker differ markedly from those with gratings (Hou et al., 2020, J Neurosci; Baker and Wade, 2017, Cerebral Cortex; Norcia et al., 2000, Neuroreport; Brown et al., 1999, IOVS). When SSVEP responses are measured under dichoptic conditions where each eye is driven with a unique temporal frequency, in the case of grating stimuli, the magnitude of the response in the fixed contrast eye decreases as a function of contrast in the variable contrast eye. Here the response increases by varying (small) magnitudes. The authors favor a view that cortex and perception pool binocular flicker inputs approximately linearly using cells that are largely monocular. The lack of a decrease below the monocular level when modulation strength increases is taken to indicate that the normalization mechanism previously observed in pattern vision does not play a substantial role in the processing of flicker. The authors present a computational model of binocular combination that captures features of the data when fit separately to each data set. Because the model has no frequency dependence and is based on scalar quantities, it cannot make joint predictions for the multiple experimental conditions, which is one of its limitations.

      A strength of the current work is the use of frequency-tagging of both pupil and EEG responses to measure responses for flicker stimuli at two anatomical levels of processing. Flicker responses are interesting but have been relatively neglected. The tagging approach allows one to access responses driven by each eye, even when the other eye is stimulated which is a great strength. The tagging approach can be applied at both levels of processing at the same time when stimulus frequencies are low, which is an advantage as they can be directly compared. The authors demonstrate the versatility of frequency tagging in a novel experimental design which may inspire other uses, both within the present context and others. A disadvantage of the tagging approach for studying sub-cortical dynamics via pupil responses is that it is restricted to low temporal frequencies given the temporal bandwidth of the pupil. The inclusion of a behavioral measure and a model is also a strength, but there are some limitations in the modeling (see below).

      The authors suggest in the discussion that luminance flicker may preferentially drive cortical mechanisms that are largely monocular and in the results that they are approximately linear in the dichoptic cross condition (no effect of the fixed contrast stimulus in the other eye). By contrast, prior research using dichoptic dual frequency flickering stimuli has found robust intermodulation (IM) components in the VEP response spectrum (Baitch and Levi, 1988, Vision Res; Stevens et al., 1994 J Ped Ophthal Strab; France and Ver Hoeve, 1994, J Ped Ophthal Strab; Suter et al., 1996 Vis Neurosci). The presence of IM is a direct signature of binocular interaction and suggests that at least under some measurement conditions, binocular luminance combination is "essentially" non-linear, where essential implies a point-like non-linearity such as squaring of excitatory inputs. The two views are in striking contrast. It would thus be useful if the authors could show spectra for the dichoptic, two-frequency conditions to see if non-linear binocular IM components are present.

      This is an excellent point, and one that we had not previously appreciated the importance of. We have generated a figure (Fig 8) showing the IM response in the cross frequency conditions. There is a clear response at 0.4Hz in the pupillometry data (2-1.6Hz), and at 3.6Hz in the EEG data (2+1.6Hz). We therefore agree that this shows the system is essentially nonlinear, despite the binocular combination appearing approximately linear. We now say in the Discussion:

      ‘In the steady-state literature, one hallmark of a nonlinear system is the presence of intermodulation responses at the sums and differences of fundamental flicker frequencies (Baitch & Levi, 1988; Tsai et al., 2012). In Figure 8 we plot the amplitude spectra of conditions from Experiment 1 in which the two eyes were stimulated at different frequencies (2Hz and 1.6Hz) but at the same contrast (48%; these correspond to the binocular cross and dichoptic cross conditions in Figures 2d,e and 3d,e). Consistent with the temporal properties of pupil responses and EEG, Figure 8a reveals a strong intermodulation difference response at 0.4Hz (red dashed line), and Figure 8b reveals an intermodulation sum response at 3.6Hz (red dashed line). The presence of these intermodulation terms is predicted by nonlinear gain control models of the type considered here (Baker and Wade, 2017; Tsai et al., 2012), and indicates that the processing of monocular flicker signals is not fully linear prior to the point at which they are combined across the eyes.’

      If the IM components are indeed absent, then there is a question of the generality of the conclusions, given that several previous studies have found them with dichoptic flicker. The previous studies differ from the authors' in terms of larger stimuli and in their use of higher temporal frequencies (e.g. 18/20 Hz, 17/21 Hz, 6/8 Hz). Either retinal area stimulated (periphery vs central field) or stimulus frequency (high vs low) could affect the results and thus the conclusions about the nature of dichoptic flicker processing in cortex. It would be interesting to sort this out as it may point the research in new directions.

      This is a great suggestion about retinal area. As chance would have it, we had already collected a replication data set where we stimulated the periphery, and we now include a summary of this data set as an Appendix. In general the results are similar, though we obtain a measurable (though still small) second harmonic response in the pupillometry data with this configuration, which is a further indication of nonlinear processing.

      Whether these components are present or absent is of interest in terms of the authors' computational model of binocular combination. It appears that the present model is based on scalar magnitudes, rather than vectors as in Baker and Wade (2017), so it would be silent on this point. The final summation of the separate eye inputs is linear in the model. In the first stage of the model, each eye's input is divided by a weighted input from the other eye. If we take this input as inhibitory, then IM would not emerge from this stage either.

      We have performed the modelling using scalar values here for simplicity and transparency, and to make the fitting process computationally feasible (it took several days even done this way). This type of model is quite capable of processing sine waves as inputs, and producing a complex output waveform which is Fourier transformed and then analysed in the same way as the experimental data (see e.g. Tsai, Wade & Norcia, 2012, J Neurosci; Baker & Wade, 2017, Cereb Cortex). However our primary aim here was to fit the model, and make inferences about the parameter values, rather than to use a specific set of parameter values to make predictions. We now say more about this family of models and how they can be applied in the methods section:

      “Models from this family can handle both scalar contrast values and continuous waveforms (Tsai et al., 2012) or images (Meese and Summers, 2007) as inputs. For time-varying inputs, the calculations are performed at each time point, and the output waveform can then be analysed using Fourier analysis in the same way as for empirical data. This means that the model can make predictions for the entire Fourier spectrum, including harmonic and intermodulation responses that arise as a consequence of nonlinearities in the model (Baker and Wade, 2017). However for computational tractability, we performed fitting here using scalar contrast values.”

      As a side point, there are quite a lot of ways to produce intermodulation terms, meaning they are not as diagnostic as one might suppose. We demonstrate this in Author response image 1, which shows the Fourier spectra produced by a toy model that multiplies its two inputs together (for an interactive Python notebook that allows various nonlinearities to be explored, see here). Intermodulation terms also arise when two inputs of different frequencies are summed, followed by exponentiation. So it would be possible to have an entirely linear binocular summation process, followed by squaring, and have this generate IM terms (not that we think this is necessarily what is happening in our experiments).

      Author response image 1
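
      The toy demonstration described above can be sketched in a few lines. This is an illustrative reconstruction and not the authors' notebook: the sampling rate, the unit-amplitude sine inputs, and the three combination rules (`linear sum`, `product`, `squared sum`) are assumptions chosen to match the 2 Hz and 1.6 Hz frequencies and the 10-second trials mentioned in the response. It reports the amplitude at the intermodulation difference (0.4 Hz) and sum (3.6 Hz) frequencies for each rule.

```python
import numpy as np

SAMPLE_RATE_HZ = 100          # assumed sampling rate (not from the source)
TRIAL_SECONDS = 10            # 10-s trial, giving 0.1 Hz frequency bins
F_LEFT_HZ, F_RIGHT_HZ = 2.0, 1.6


def amplitude_spectrum(signal, sample_rate_hz):
    """Return (frequencies, single-sided amplitudes) for a real signal.

    The DC bin is doubled along with the others; it is not used below.
    """
    n = len(signal)
    amplitudes = 2.0 * np.abs(np.fft.rfft(signal)) / n
    frequencies = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    return frequencies, amplitudes


def amplitude_at(signal, target_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Amplitude in the frequency bin nearest to target_hz."""
    frequencies, amplitudes = amplitude_spectrum(signal, sample_rate_hz)
    return amplitudes[np.argmin(np.abs(frequencies - target_hz))]


# Unit-amplitude sine inputs standing in for the two eyes' flicker signals.
t = np.arange(SAMPLE_RATE_HZ * TRIAL_SECONDS) / SAMPLE_RATE_HZ
left = np.sin(2 * np.pi * F_LEFT_HZ * t)
right = np.sin(2 * np.pi * F_RIGHT_HZ * t)

# Three hypothetical ways of combining the inputs.
toy_outputs = {
    "linear sum": left + right,            # no nonlinearity
    "product": left * right,               # multiplicative interaction
    "squared sum": (left + right) ** 2,    # linear sum, then squaring
}

im_difference_hz = F_LEFT_HZ - F_RIGHT_HZ  # 0.4 Hz
im_sum_hz = F_LEFT_HZ + F_RIGHT_HZ         # 3.6 Hz

for name, output in toy_outputs.items():
    diff_amp = amplitude_at(output, im_difference_hz)
    sum_amp = amplitude_at(output, im_sum_hz)
    print(f"{name:12s} 0.4 Hz: {diff_amp:.3f}   3.6 Hz: {sum_amp:.3f}")
```

      Under these assumptions only the two nonlinear rules place energy at 0.4 Hz and 3.6 Hz, which illustrates why intermodulation terms alone do not identify where in a pathway the nonlinearity sits.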

      Related to the model: One of the more striking results is the substantial difference between the dichoptic and dichoptic-cross conditions. They differ in that the latter has two different frequencies in the two eyes while the former has the same frequency in each eye. As it stands, if fit jointly on the two conditions, the model would make the same prediction for the dichoptic and dichoptic-cross conditions. It would also make the same prediction whether the two eyes were in-phase temporally or in anti-phase temporally. There is no frequency/phase-dependence in the model to explain differences in these cases or to potentially explain different patterns at the different VEP response harmonics. The model also fits independently to each data set which weakens its generality. An interpretation outside of the model framework would thus be helpful for the specific case of differences between the dichoptic and dichoptic-cross conditions.

      As mentioned above, the limitations the reviewer highlights are features of the specific implementation, rather than the model architecture in general. Furthermore, although this particular implementation of the model does not have separate channels for different phases, these can be added (see e.g. Georgeson et al., 2016, Vis Res, for an example in the spatial domain). In future work we intend to explore the phase relationship of flicker, but do not have space to do this here.

      Prior work has defined several regimes of binocular summation in the VEP (Apkarian et al., 1981 EEG Journal). It would be useful for the authors to relate the use of their terms "facilitation" and "suppression" to these regimes and to justify/clarify differences in usage, when present. Experiment 1, Fig. 3 shows cases where the binocular response is more than twice the monocular response. Here the interpretation is clear: the responses are super-additive and would be classed as involving facilitation in the Apkarian et al framework. In the Apkarian et al framework, a ratio of 2 indicates independence/linearity. Ratios between 1 and 2 indicate sub-additivity and are diagnostic of the presence of binocular interaction but are noted by them to be difficult to interpret mechanistically. This should be discussed. A ratio of <1 indicates frank suppression which is not observed here with flicker.

      Operationally, we use facilitation to mean an increase in response relative to a monocular baseline, and suppression to mean a decrease in response. We now state this explicitly in the Introduction. Facilitation greater than a factor of 2 indicates some form of super-additive summation. In the context of the model, we also use the term suppression to indicate divisive suppression between channels; however, this feature does not always result in empirical suppression (it depends on the condition, and the inhibitory weight). We think that interpretation of results such as these is greatly aided by the use of a computational modelling framework, which is why we take this approach here. The broad applicability of the model we use in the domain of spatial contrast lends it credibility for our stimuli here.

      Can the model explore the full range of binocular/monocular ratios in the Apkarian et al framework? I believe much of the data lies in the "partial summation" regime of Apkarian et al and that the model is mainly exploring this regime and is a way of quantifying varying degrees of partial summation.

      Yes, in principle the model can produce the full range of behaviours. When the weight of suppression is 1, binocular and monocular responses are equal. When the weight is zero, the model produces linear summation. When the weight is greater than 1, suppression occurs. It is also possible to produce super-additive summation effects, most straightforwardly by changing the model exponents. However this was not required for our data here, and so we kept these parameters fixed. We agree that the model is a good way to unify the results across disparate experimental paradigms, and that is our main intention with Figure 7i.
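
      The dependence on the suppression weight described above can be illustrated with a minimal divisive gain control sketch. The functional form, the unit exponents, and the saturation constant `z` are simplifying assumptions made for this illustration; this is not the fitted model from the manuscript. Each eye's signal is divided by its own contrast plus a weighted contribution from the other eye, and the two outputs are then summed linearly.

```python
def binocular_response(left, right, weight, z=1.0):
    """Two-channel divisive gain control followed by linear summation.

    left, right: contrast in each eye; weight: interocular suppression
    weight; z: saturation constant (hypothetical value).
    """
    left_stage = left / (z + left + weight * right)
    right_stage = right / (z + right + weight * left)
    return left_stage + right_stage


def summation_ratio(contrast, weight, z=1.0):
    """Binocular response divided by monocular response at equal contrast."""
    binocular = binocular_response(contrast, contrast, weight, z)
    monocular = binocular_response(contrast, 0.0, weight, z)
    return binocular / monocular


# At high contrast the ratio approaches 2 / (1 + weight).
for weight in (0.0, 1.0, 2.0):
    print(f"weight = {weight:.1f}: ratio = {summation_ratio(100.0, weight):.3f}")
```

      In this simplified form a weight of zero gives a ratio of exactly 2 (linear summation), a weight of 1 gives a ratio close to 1 at high contrast (binocular and monocular responses nearly equal), and weights above 1 push the ratio below 1 (suppression), which mirrors the three regimes listed in the response.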

      Reviewer #3 (Public Review):

      This manuscript describes interesting experiments on how information from the two eyes is combined in cortical areas, sub-cortical areas, and perception. The experimental techniques are strong and the results are potentially quite interesting. But the manuscript is poorly written and tries to do too much in too little space. I had a lot of difficulty understanding the various experimental conditions, the complicated results, and the interpretations of those results. I think this is an interesting and useful project so I hope the authors will put in the time to revise the manuscript so that regular readers like myself can better understand what it all means.

      Now for my concerns and suggestions:

      The experimental conditions are novel and complicated, so readers will not readily grasp what the various conditions are and why they were chosen. For example, in one condition different flicker frequencies were presented to the two eyes (2Hz to one and 1.6Hz to the other) with the flicker amplitude fixed in the eye presented to the lower frequency and the flicker amplitude varied in the eye presented to the higher frequency. This is just one of several conditions that the reader has to understand in order to follow the experimental design. I have a few suggestions to make it easier to follow. First, create a figure showing graphically the various conditions. Second, come up with better names for the various conditions and use those names in clear labels in the data figures and in the appropriate captions. Third, combine the specific methods and results sections for each experiment so that one will have just gone through the relevant methods before moving forward into the results. The authors can keep a general methods section separate, but only for the methods that are general to the whole set of experiments.

      We have created a new figure (now Fig 1) that illustrates the conditions from Experiment 1, and is referenced throughout the paper. We have kept the names constant, as they are rooted in a substantial existing literature, and it will be confusing to readers familiar with that work if we diverge from these conventions. We did consider separating out the methods section, but feel it helps the flow of the results section to keep it as a single section.

      I wondered why the authors chose the temporal frequencies they did. Barrionuevo et al (2014) showed that the human pupil response is greatest at 1Hz and is nearly a log unit lower at 2Hz (i.e., the change in diameter is nearly a log unit lower; the change in area is nearly 2 log units lower). So why did the authors choose 2Hz for their primary frequency? And why did the authors choose 1.6Hz which is quite close to 2Hz for their off frequency? The rationale behind these important decisions should be made explicit.

      We now explain this in the Introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      It is a compromise frequency that is not optimal for either modality, but generates a measurable signal for both. The choice of 1.6 Hz was for similar reasons - for a 10-second trial it is four frequency bins away from the primary frequency, so can be unambiguously isolated in the spectrum.
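
      The bin arithmetic behind this choice can be checked directly. The sketch below assumes a hypothetical sampling rate of 100 Hz, which does not affect the result because frequency resolution depends only on trial duration.

```python
import numpy as np

TRIAL_SECONDS = 10
SAMPLE_RATE_HZ = 100  # assumed; resolution depends only on trial length

n_samples = TRIAL_SECONDS * SAMPLE_RATE_HZ
frequencies = np.fft.rfftfreq(n_samples, d=1.0 / SAMPLE_RATE_HZ)

# Spacing between adjacent Fourier bins is 1 / trial duration.
resolution_hz = frequencies[1] - frequencies[0]

# Index of the bin closest to each flicker frequency.
primary_bin = int(np.argmin(np.abs(frequencies - 2.0)))
secondary_bin = int(np.argmin(np.abs(frequencies - 1.6)))
bins_apart = primary_bin - secondary_bin

print(f"resolution: {resolution_hz:.1f} Hz, separation: {bins_apart} bins")
```

      With 0.1 Hz bins, the 2 Hz and 1.6 Hz components fall on separate bins, so each eye's tagged response can be read out without leakage from the other under these idealized conditions.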

      By the way, I wondered if we know what happens when you present the same flicker frequencies to the two eyes but in counter-phase. The average luminance seen binocularly would always be the same, so if the pupil system is linear, there should be no pupil response to this stimulus. An experiment like this has been done by Flitcroft et al (1992) on accommodation where the two eyes are presented stimuli moving oppositely in optical distance and indeed there was no accommodative response, which strongly suggests linearity.

      We have not tried this yet, but it’s on our to-do list for future work. The accommodation work is very interesting, and we now cite it in the manuscript as follows:

      ‘Work on the accommodative response indicates that binocular combination there is approximately linear (Flitcroft et al. 1992), and can even cancel when signals are in antiphase (we did not try this configuration here).’

      Figures 1 and 2 are important figures because they show the pupil and EEG results, respectively. But it's really hard to get your head around what's being shown in the lower row of each figure. The labeling for the conditions is one problem. You have to remember how "binocular" in panel c differs from "binocular cross" in panel d. And how "monocular" in panel d is different than "monocular 1.6Hz" in panel e. Additionally, the colors of the data symbols are not very distinct so it makes it hard to determine which one is which condition. These results are interesting. But they are difficult to digest.

      We hope that the new Figure 1 outlining the conditions has helped with interpretation here.

      The authors make a strong claim that they have found substantial differences in binocular interaction between cortical and sub-cortical circuits. But when I look at Figures 1 and 2, which are meant to convey this conclusion, I'm struck by how similar the results are. If the authors want to continue to make their claim, they need to spend more time making the case.

      Indeed, it is hard to make direct comparisons across figures - this is why Figure 4 plots the ratio of binocular to monocular conditions, and shows a clear divergence between the EEG and pupillometry results at high contrasts.

      Figure 5 is thankfully easy to understand and shows a very clear result. These perceptual results deviate dramatically from the essentially winner-take-all results for spatial sinewaves shown by Legge & Rubin (1981); whom they should cite by the way. Thus, very interestingly the binocular combination of temporal variation is quite different than the binocular combination of spatial variation. Can the pupil and EEG results also be plotted in the fashion of Figure 5? You'd pick a criterion pupil (or EEG) change and use it to make such plots.

      We now cite Legge & Rubin. We see what you mean about plotting the EEG and pupillometry results in the same coordinates as the matching data, but we don’t think this is especially informative as we would end up only with data points along the axes and diagonal of the plot, without the points at other angles. This is a consequence of how the experiments were conducted.

      My main suggestion is that the authors need to devote more space to explaining what they've done, what they've found, and how they interpret the data. I suggest therefore that they drop the computational model altogether so that they can concentrate on the experiments. The model could be presented in a future paper.

      We feel that the model is central to the understanding and interpretation of our results, and have retained it in the revised version of the paper.

      Reviewer #2 (Recommendations For The Authors):

      I found the terms for the stimulus conditions confusing. I think a simple schematic diagram of the conditions would help the reader.

      Now added (the new Fig 1).

      In reporting the binocular to monocular ratio, please clarify whether the monocular data was from one eye alone (and how that eye was chosen) or from both eyes and then averaged, or something else. It would be useful to plot the results from the dichoptic condition in this form, as well.

      These were averaged across both eyes. We now say in the Methods section:

      ‘We confirmed in additional analyses that the monocular consensual pupil response was complete, justifying our pooling of data across the eyes.’

      Also, clarify whether the term facilitation is used as above throughout (facilitation being > 2 times monocular response under binocular condition) or if a different criterion is being used. If we take facilitation to mean a ratio > 2, then facilitation depends on temporal frequency in Figure 4.

      We now explain our use of these terms in the final paragraph of the Introduction:

      ‘Relative to the response to a monocular signal, adding a signal in the other eye can either increase the response (facilitation) or reduce it (suppression).’

      The magnitude of explicit facilitation attained is interesting, but not without precedent. Ratios of binocular to mean monocular > 2, have been reported previously and values of summation depend strongly on the stimulus used (see for example Apkarian et al., EEG Journal, 1981, Nicol et al., Doc Ophthal, 2011).

      We now mention this in the Discussion as follows:

      ‘(however we note that facilitation as substantial as ours has been reported in previous EEG work by Apkarian et al. (1981))’

      In Experiment 3, the authors say that the psychophysical matching results are consistent with the approximately linear summation effects observed in the EEG data of Experiment 1. In describing Fig. 3, the claim is that the EEG is non-linear, e.g. super-additive - at least at high contrasts. Please reconcile these statements.

      We think that the ‘superadditive’ effects are close enough to linear that we don’t want to make too much of a big deal about them - this could be measurement error, for example. So we use terms such as near-linear, or approximately linear, when referring to them throughout.

      Reviewer #3 (Recommendations For The Authors):

      Let me make some more specific comments using a page/paragraph/line format to indicate where in the text they're relevant.

      1/2 (middle)/3 from end. "In addition" seems out of place here.

      Removed.

      1/3/4. By "intensities" do you mean "contrasts"?

      Fixed.

      1/3/last. "... eyes'...".

      Fixed.

      2/5/3. By "one binocular disc", you mean into "one perceptually fused disc".

      Rewritten as: ‘to help with their perceptual fusion, giving the appearance of a single binocular disc’

      3/1/1. "calibrated" seems like the wrong word here. I think you're just changing the vergence angle to enable fusion, right?

      Now rewritten as: ‘Before each experiment, participants adjusted the angle of the stereoscope mirrors to achieve binocular fusion’

      3/1/1. "adjusting the angles...". And didn't changing the mirror angles affect the shapes of the discs in the retinal images?

      Perhaps very slightly, but this is well within the tolerance of the visual system to compensate for in the fused image, especially for such high contrast edges.

      3/3/5. "fixed contrast" is confusing here because it's still a flickering stimulus if I follow the text here. Reword.

      Now ‘fixed temporal contrast’

      3/4/1. It would be clearer to say "pupil tracker" rather than "eye tracker" because you're not really doing eye tracking.

      True, but the device is a commercial eye tracker, so this is the appropriate term regardless of what we are using it for.

      3/5/6. I'm getting lost here. "varying contrast levels" applies to the dichoptic stimulus, right?

      Yes, now reworded as ‘In the other interval, a target disc was displayed, flickering at different contrast levels on each trial, but with a fixed interocular contrast ratio across the block.’

      3/5/7. Understanding the "ratio of flicker amplitudes" is key to understanding what's going on here. More explanation would be helpful.

      Addressed in the above point.

      4/3/near end. Provide some explanation about why the Fourier approach is more robust to noise.

      Added ‘(which can make the phase and amplitude of a fitted sine wave unstable)’
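      As a rough illustration of the Fourier approach referred to here, the sketch below estimates the amplitude at a single frequency by projecting the signal onto a complex exponential at that frequency. The sample rate, the amplitudes, and the function name `amplitude_at` are assumptions made for illustration and do not describe the authors' pipeline. Because a 10-second trial contains a whole number of cycles of both 2 Hz and 1.6 Hz, the component at one frequency contributes essentially nothing to the estimate at the other.

```python
import cmath
import math


def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the freq_hz component via a single-frequency Fourier projection."""
    n = len(signal)
    total = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
        for k, x in enumerate(signal)
    )
    # A real sinusoid of amplitude A projects to magnitude A * n / 2.
    return 2.0 * abs(total) / n


SAMPLE_RATE_HZ = 100.0   # hypothetical sampling rate
TRIAL_S = 10.0           # trial duration stated in the response
n_samples = int(SAMPLE_RATE_HZ * TRIAL_S)
times = [k / SAMPLE_RATE_HZ for k in range(n_samples)]

# Synthetic signal: a 2 Hz component plus a smaller 1.6 Hz component.
signal = [
    1.5 * math.sin(2 * math.pi * 2.0 * t) + 0.5 * math.sin(2 * math.pi * 1.6 * t)
    for t in times
]

amp_2hz = amplitude_at(signal, SAMPLE_RATE_HZ, 2.0)
amp_1p6hz = amplitude_at(signal, SAMPLE_RATE_HZ, 1.6)
amp_1hz = amplitude_at(signal, SAMPLE_RATE_HZ, 1.0)   # no component here
```

      The estimate at each frequency uses every sample in the trial, whereas a fitted sine wave has to settle on a phase and an amplitude jointly.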

      Figure 1. In panel a, explain what the numbers on the ordinate mean. What's zero, for example? Which direction is dilation? Same question for panel b. It's interesting in panel c that the response in one eye to 2Hz increases when the other eye sees 1.6Hz. Would be good to point that out in the text.

      Good idea about panel (a) - we have changed the y-axis to ‘Relative amplitude’ for clarity, and now note in the figure caption that ‘Negative values indicate constriction relative to baseline, and positive values indicate dilation.’ Panel (b) is absolute amplitude, so is unsigned. Panel (c) only contains 2Hz conditions, but there is some dichoptic suppression across the two frequencies in panels (d,e) - we now cover this in the text and include statistics.

      6/2/1. Make clear in the text that Figure 1c shows contrast response functions for the pupil.

      Now noted in the caption.

      Figure 3. I'm lost here. I feel like I should be able to construct this figure from Figures 1 and 2, but don't know how. More explanation is needed at least in the caption.

      Done. The caption now reads:

      ‘Ratio of binocular to monocular response for three data types. These were calculated by dividing the binocular response by the monocular response at each contrast level, using the data underlying Figures 2c, 3c and 3f. Each value is the average ratio across N=30 participants, and error bars indicate bootstrapped standard errors.’
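      The calculation described in this caption can be sketched as follows. This is a minimal sketch, assuming the ratio is taken per participant and then averaged, and that the standard error comes from resampling participants with replacement. The function name, the number of bootstrap resamples, the seed, and the example values are all hypothetical.

```python
import random
import statistics


def mean_ratio_with_bootstrap_se(binocular, monocular, n_boot=2000, seed=0):
    """Mean binocular/monocular ratio across participants and its bootstrapped SE.

    binocular and monocular hold one response per participant at a single
    contrast level, listed in the same participant order.
    """
    ratios = [b / m for b, m in zip(binocular, monocular)]
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        # Resample participants with replacement and average their ratios.
        resample = rng.choices(ratios, k=len(ratios))
        boot_means.append(statistics.fmean(resample))
    # The SE is the standard deviation of the bootstrap distribution of the mean.
    return statistics.fmean(ratios), statistics.stdev(boot_means)


# Made-up responses for three hypothetical participants (not data from the paper).
example_binocular = [2.0, 4.0, 6.0]
example_monocular = [1.0, 2.0, 2.0]
mean_ratio, se = mean_ratio_with_bootstrap_se(example_binocular, example_monocular)
```

      A ratio of 1 would mean that adding the second eye changes nothing, and a ratio of 2 corresponds to linear summation of two equal monocular responses.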

      9/1/1-2. I didn't find the evidence supporting this statement compelling.

      We now point the reader to Figure 4 as a reminder of the evidence for this difference.

      9/1/6-9. You said this. But this kind of problem can be fixed by moving the methods sections as I suggested above.

      As mentioned, we feel that the results section flows better with the current structure.

      Figure 4. Make clear that this is EEG data.

      Now added to caption.

      Figure 5 caption. Infinite exponent in what equation?

      Now clarified as: ‘models involving linear combination (dotted) or a winner-take-all rule (dashed)’

      Figure 6. I hope this gets dropped. No one will understand how the model predictions were derived. And those who look at the data and model predictions will surely note (as the authors do) that they are rather different from one another.

      As noted above, we feel that the model is central to the paper and have retained this figure. We have also worked out how to correct the noise parameter in the model for the number of participants included in the coherent averaging, which fixes the discrepancy at low contrasts. The correspondence between the data and model is now very good, and we have plotted the data points and curves in the same panels, which makes the figure less busy.

      12/1. Make clear in this paragraph that "visual cortex" is referring to EEG and perception results and that "subcortical" is referring to pupil. Explain clearly what "linear" would be and what the evidence for "non-linear" is.

      Good suggestion, we have added qualifiers linking to both methods. Also tidied up the language to make it clearer that we are talking about binocular combination specifically in terms of linearity, and spelled out the evidence for each point.

      12/2/6-9. Explain the Quaia et al results enough for the reader to know what reflexive eye movements were studied and how.

      We now specify that these eye movements are also known as the ‘ocular following response’ and were measured using scleral search coils.

      12/2/9-10. Same for Spitschan and Cajochen: more explanation.

      Added:

      “(melatonin is a hormone released by the pineal gland that regulates sleep; its production is suppressed by light exposure and can be measured from saliva assays)”

      12/3/2-3. Intriguing statements about optimally combining noisy signals, but explain this more. It won't be obvious to most readers.

      We have added some more explanation to this section.

      13/1. This is an interesting paragraph where the authors have a chance to discuss what would be most advantageous to the organism. They make the standard argument for perception, but basically punt on having an argument for the pupil.

      Indeed, we agree that this point is necessarily speculative; however, we think it is interesting for the reader to consider.

      13/2/1. "Pupil size affects the ..." is more accurate.

      Fixed.

      13/2/2 from end. Which "two pathways"? Be clear.

      Changed to ‘the pupil and perceptual pathways’

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      In this manuscript (eLife-RP-RA-2024-103904), the authors identified that NOLC1 was upregulated in gastric cancer samples, which promoted cancer progression and cisplatin resistance. They further found that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis. There are several major concerns regarding the conclusions.

      Strengths:

      This study identified that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis in gastric cancer.

      Weaknesses:

      The major conclusions were not sufficiently supported by the results. The experiments were not conducted in a comprehensive manner.

      Major concerns

      (1) The authors investigated NOLC1 expression in gastric cancer (GC) using clinical samples, which is valuable; however, the sample array includes only 3 patients. This sample size is insufficient to support conclusions for human samples. Please increase the sample size and apply a more robust statistical analysis. Additionally, specify the statistical methods used in the figure legend.

      Thanks very much for the kind comments and great suggestions. As suggested, we have increased the sample size of GC patients, and the new data (six paired samples) are shown in Fig. S1A, further indicating that NOLC1 is upregulated in gastric cancer (GC). Moreover, the statistical methods have been added to each figure legend.

      (2) These data are not sufficient to support the key conclusion of this study "NOLC1 is significantly upregulated in GC tissues and Cis-resistant GC cells". There is no convincing data showing that NOLC1 upregulation is specific to cancer cells or any other cell types. Based on the following results that NOLC1 expressed in cancer cells can support cancer cell survival and drug resistance, the authors switched to investigating the role of NOLC1 in cancer cells without demonstrating cancer cells indeed highly upregulate NOLC1.

      Thanks for raising this good question. As shown in Fig. 1E-F, the TCGA database shows that NOLC1 is upregulated in GC. We further analyzed NOLC1 expression in other cancer types using the Human Protein Atlas (https://www.proteinatlas.org/). The results indicated that the NOLC1 mRNA level was much higher in almost all cancers except acute myeloid leukemia (LAML). In addition, according to gene expression profiling interactive analysis (GEPIA, http://gepia.cancer-pku.cn/index.html), the NOLC1 mRNA level was above 100 nTPM in most gastric cancer cell lines but below 100 nTPM in most non-cancerous cell lines, indicating that NOLC1 is upregulated in gastric cancer.

      Author response image 1.

      The mRNA level of NOLC1 in different GC cells and non-cancerous cells.

      (3) The authors primarily use MGC-803 cells for experiments; however, MGC-803 is known to be a HeLa-contaminated cell line. Could the authors explain this choice of using this cell line only? Did they validate key findings with additional cell lines? This is particularly important for assays such as cisplatin resistance validation, in vivo experiments, TEM imaging, and MitoPeDPP fluorescence imaging.

      Thanks for raising this good question. We did not use only MGC-803 cells: the key in vitro findings were also validated in MKN-45 cells (Fig. 2), and the in vivo experiments were also validated in a Mouse Forestomach Carcinoma (MFC) tumor-bearing 615 mouse model (Fig. 7). Furthermore, we added experiments in MKN-45 cells. TEM imaging showed that NOLC1 significantly inhibited cisplatin (Cis)-induced lipid membrane damage in MKN-45 cells (Fig. S6A). Moreover, the MitoPeDPP fluorescence assay analyzed by FCAs also indicated that ROS were rapidly enriched in mitochondria in MKN-45 cells (Fig. 4E, Fig. S6J).

      (4) In Figure 2, did the authors perform assays with NOLC1 overexpression? If so, please include these results to strengthen the conclusions.

      Thanks very much for the kind comments and great suggestions. As suggested, we added new data from a NOLC1 overexpression assay. The Cell Counting Kit-8 assay shows that the NOLC1-overexpression group is more resistant to Cis than the vector group (Fig. S4E, S5A).

      (5) The authors show in Figures 2A-B that shNOLC1 without cisplatin treatment does not affect cell viability. However, Figures 2D-E suggest increased apoptosis in shNOLC1 cells without cisplatin treatment. Additionally, in vivo studies in Figure 3 show no significant difference between the shNC+PBS and shNOLC1+PBS groups, which appears contradictory to the apoptosis assays. Similarly, Ki67 staining shows decreased scores in the shNOLC1 group compared to shNC. Could the authors clarify this inconsistency?

      Thanks for raising this good question. In Fig. 2D-E, the difference in the proportion of dead cells between the shNOLC1 and shNC groups treated with PBS was only 3% (MGC-803) and 7% (MKN-45), which is much lower than the difference observed with cisplatin treatment in vitro. Moreover, in vivo analysis indicated that the average tumor volume in the NOLC1+PBS group was smaller than that in the NC group, but the difference was not statistically significant (p = 0.3962). In addition, tumor proliferation is a complex process regulated by many factors [1,2]; the Ki67 level is therefore not equivalent to the rate of tumor proliferation, although the two may be positively correlated.

      (6) In Figure 4, NOLC1 knockdown appears to enhance cisplatin-induced ferroptosis rather than apoptosis. Given p53's role in apoptosis, did the authors compare the effects of NOLC1 on cisplatin-induced apoptosis vs. ferroptosis? If so, please clarify whether NOLC1 predominantly regulates apoptosis or ferroptosis.

      Thanks for raising this good question. We did compare the effects of NOLC1 on cisplatin-induced apoptosis vs. ferroptosis. As shown in Fig. 5A, NOLC1 knockdown clearly increased the level of BCL-2, an anti-apoptotic protein that is regulated by p53 via protein interaction in the cytoplasm [3,4]; this may be caused by the increased level of p53 in the cytoplasm (Fig. 6I). Also, TEM imaging showed classic ferroptotic rather than apoptotic morphological changes (Fig. 5A, S6A). Taken together, NOLC1 mainly regulates p53-mediated ferroptosis rather than apoptosis.

      (7) Did the authors perform co-IP assays with p53 or HA antibodies to immunocapture NOLC1? If not, please add this experiment to support protein interactions. The mechanistic correlation between p53 and NOLC1 can be supported by adding experiments using multiple GC cell lines with various p53 alterations (such as loss-of-function or gain-of-function mutations/deletions). This is critical because the authors specifically claimed that NOLC1 can inhibit p53-mediated ferroptosis, but not other tumor suppressors.

      Thanks very much for the kind comments and great suggestions. As suggested, we performed a Co-IP assay with anti-HA antibodies to immunocapture NOLC1-FLAG. As shown in Fig. 5K, the p53 DNA-binding domain (DBD)-HA co-immunoprecipitated with NOLC1, further indicating that NOLC1 binds the p53 DBD. Moreover, we concur with the reviewer on the value of experiments using multiple p53 alterations; however, different p53 mutants have completely different functional changes. We therefore used siRNA to knock down p53 in MGC-803 cells; the results showed that NOLC1-mediated resistance disappeared and the GPX4 level increased (Fig. S10). These data show that NOLC1 promotes GC resistance by mediating p53 functions.

      (8) In Figure S5B, the LDH release can be blocked by Fer-1?

      Thanks for raising this good question. As suggested, Fer-1 (20 μmol/mL) significantly blocked LDH release in the NOLC1 knockdown group (Fig. S6E). These data further confirm that NOLC1 suppresses Cis-induced ferroptosis.

      (9) How about the ubiquitination assay in MGC-803 cells?

      Thanks for raising this good question. As suggested, we also performed the ubiquitination assay in MGC-803 cells. The results showed that NOLC1 also increased the level of p53 ubiquitination (Fig. 6H).

      (10) In Figure 6H, the DBD domain of NOLC1 is required for inhibiting P53 ubiquitination.

      Thanks for your opinion. However, our paper refers only to the p53 DBD, not to a NOLC1 DBD. We also did not find any reported DNA-binding function of NOLC1 in the PubMed database. We would therefore like to ask whether this comment is correct.

      (11) In Figure 8B, the CD3 antibody is not specific, please change it to a new one.

      Thanks very much for the kind comments and great suggestions. As suggested, we have used a new CD3 antibody, and the new data have been added to Fig. 8B.

      (12) The authors report that NOLC1 influences peripheral blood lymphocytes with cisplatin treatment, with or without PD-1. Could the authors explain why NOLC1 would affect peripheral blood lymphocytes? Additionally, did they assess immune cell infiltration in the tumor microenvironment (TME) by flow cytometry?

      Thanks for raising this good question. The tumors in the knockdown group treated with Cis + PD-1 were too small (less than 100 mg) to extract enough infiltrating immune cells (fewer than 10000 CD45<sup>+</sup> cells), so we chose to analyze immune cells in the blood of the mice, considering that infiltrating immune cells, including CTLs, originate from the peripheral blood via the circulation. Under normal conditions, several tumor behaviors shape the TME to limit immune responses and present barriers to cancer therapy. For example, tumors can express or secrete many negative regulators such as PD-L1, so that immune cells cannot recognize tumor cells or infiltrate tumor tissue. Ferroptosis, as a new form of ICD, can damage the tumor cell plasma membrane and release large amounts of tumor-associated and tumor-specific antigens, leading to immune cell priming and activation. Eventually, the activated immune cells in peripheral blood travel towards the tumor site, infiltrating the tumor tissue under favorable co-stimulatory conditions and guided by chemokine gradients. Once within the tumor microenvironment, these activated T cells can control tumor growth through direct tumor cell destruction and cytokine-mediated processes [5–8].

      To assess immune cell infiltration in the TME, we analyzed the tumor infiltrated CD3<sup>+</sup> and CD8<sup>+</sup> immune cells in tumor tissue by immunofluorescence (Fig. 8B). Thus, the peripheral blood lymphocytes could reflect the infiltration of immune cells in the tumor.

      Minor concerns:

      (1) Please clarify the statistical methods in each figure legend.

      Thanks for your opinion. We have added statistical methods in each figure legend.

      (2) In Figure 2D, please provide statistical data of cleaved-caspase3 expression.

      Thanks for your opinion. The relative cleaved-caspase3 levels are provided in Fig. S5B-C.

      (3) Please ensure that the canonical expressions used in the research paper are adhered to.

      Thanks for your opinion. We have carefully modified our expressions in our paper.

      (4) Please pay more attention to the grammar and formatting of texts.

      Thanks for your opinion. We revised our manuscript through the American Journal Experts (AJE) service.

      Reviewer #2:

      Summary:

      Shengsheng Zhao et al. investigated the role of nucleolar and coiled-body phosphoprotein 1 (NOLC1) in regulating gastric cancer (GC) development and cisplatin-induced drug resistance in GC. They found a significant correlation between high NOLC1 expression and the poor prognosis of GC. Meanwhile, upregulation of NOLC1 was associated with cis-resistant GC. Experimentally, the authors demonstrate that knocking down NOLC1 increased GC sensitivity to Cis possibly by regulating ferroptosis. Mechanistically, they found NOLC1 suppressed ferroptosis by blocking the translocation of p53 from the cytoplasm to the nucleus and promoting its degradation. In addition, the authors also evaluated the effect of combinational treatment of anti-PD-1 and cisplatin in NOLC1-knockdown tumor cells, revealing a potential role of NOLC1 in the targeted therapy for GC.

      Strengths:

      Chemoresistance is considered a major reason causing failure of tumor treatment and death of cancer patients. This paper explored the role of NOLC1 in the regulation of Cis-mediated resistance, which involves a regulated cell death named ferroptosis. These findings provide more evidence highlighting the study of regulated cell death to overcome drug resistance in cancer treatment, which could give us more potential strategies or targets for combating cancer.

      Weaknesses:

      More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored. Besides, the experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

      Major points:

      (1) More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored.

      Thanks very much for the kind comments and great suggestions. As suggested, we have further analyzed the ferroptosis-inhibiting ability of NOLC1 in MGC-45 cells treated with Erastin, a commonly used ferroptosis activator. As shown in Fig. S6B, the ferroptosis activated by Erastin was also blocked by NOLC1.

      (2) In Figure 1J, the CR cell line should obviously have less expression of the apoptosis marker c-PARP, which means these cells are resistant to apoptosis induced by CR. Thus, it would be more rational to study the role of apoptosis regulation by NOLC1. Why did the later data shift to the study of ferroptosis?

      Thanks for raising this good question. In the CR cells, the expression levels of many genes were changed, so it is uncertain whether the decreased expression of cleaved-PARP in the resistant cells is caused by NOLC1 upregulation. To explore the specific mechanism of NOLC1-mediated resistance, we performed TEM imaging (Fig. 4A, S6A), and the results showed that the cells exhibited classic ferroptotic morphological changes. Moreover, BCL-2 (an anti-apoptotic protein regulated by p53 via protein interaction in the cytoplasm) was increased after NOLC1 knockdown (Fig. S5A). This may be caused by the increased p53 levels in the cytoplasm [3,4] (Fig. 5I). Taken together, we shifted to the study of cisplatin-induced ferroptosis.

      (3) Besides, how about the regulation of apoptosis during cis-resistance by NOLC1 in GC?

      Thanks for raising this good question. As mentioned above, Cis-induced apoptosis was not as pronounced as ferroptosis because of the increase in BCL-2 (a key anti-apoptotic protein), which is mediated by p53 via protein interaction in the cytoplasm. NOLC1 increased the cytoplasmic p53 level, which subsequently increased the BCL-2 level.

      (4) The experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

      Thanks for raising this good question. As shown in Fig. S10, after p53 was knocked down using siRNA, NOLC1 could no longer promote Cis resistance and the GPX4 level increased, indicating that NOLC1 promotes Cis resistance by mediating p53 function.

      (5) In Figure 2, the data indicated that the knockdown of NOLC1 increased γH2AX in the presence of Cisplatin, which indicated that NOLC1 might regulate DNA damage-related cellular function. These functions should be more relevant to cisplatin resistance, considering the fundamental effect of this chemo drug.

      Thanks very much for the kind comments and great suggestions. Indeed, we found that DNA damage was more pronounced in the knockdown groups, but ferroptotic changes such as ROS and mitochondrial membrane damage also differed significantly in the knockdown groups. As a chemotherapeutic drug, cisplatin not only induces DNA damage but also acts as a stressor that can activate various signaling pathways, including apoptosis, ferroptosis, pyroptosis, and necroptosis, depending on drug concentration and exposure time [9–11]. Therefore, it is important to identify the pathway predominantly blocked by NOLC1 in GC.

      (6) In Figure 4, ferroptosis inhibitors like Fer-1 or DFO should be used to verify the regulation of ferroptosis by Cisplatin and NOLC1.

      Thanks very much for the kind comments and great suggestions. As suggested, we performed an additional LDH release assay. The results showed that Fer-1 also blocked cisplatin-induced LDH release in the NOLC1 knockdown groups (Fig. S6E).

      (7) In Figure 4H, Cisplatin decreased FSP1 and GPX4, which could be enhanced in the NOLC1-knockdown cell line. Meanwhile, the knockdown of NOLC1 increased the ACSL4 level. These findings could be the key reason for the regulation of ferroptosis by NOLC1 rather than p53 since they all are direct regulators of ferroptosis.

      Thanks very much for the kind comments and great suggestions. We rewrote the text as you suggested. It has also recently been reported that ACSL4-regulated ferroptosis is related to p53, but the exact mechanism is still unclear [12]. Further studies of the specific relationship between NOLC1 and FSP1/ACSL4 will be conducted in the future.

      (8) Whether p53 mediates the regulation of ferroptosis and cisplatin resistance by NOLC1 should be thoroughly studied using p53-KO cell lines.

      Thanks very much for the kind comments and great suggestions. As previously mentioned, knocking down p53 with siRNA blocked NOLC1-mediated Cis resistance (Fig. S10). Meanwhile, the GPX4 level was also increased in the p53/NOLC1 double-knockdown groups compared with the NOLC1 knockdown group. These data indicate that NOLC1 suppresses ferroptosis by mediating p53 functions.

      Reviewer #3:

      The authors have put forth a compelling argument that NOLC1 is indispensable for gastric cancer resistance in both in vivo and in vitro models. They have further elucidated that NOLC1 silencing augments cisplatin-induced ferroptosis in gastric cancer cells. The mechanistic underpinning of their findings suggests that NOLC1 modulates the p53 nuclear/plasma ratio by engaging with the p53 DNA Binding Domain, which in turn impedes p53-mediated transcriptional regulation of ferroptosis. Additionally, the authors have shown that NOLC1 knockdown triggers the release of ferroptosis-induced damage-associated molecular patterns (DAMPs), which activate the tumor microenvironment (TME) and enhance the efficacy of the anti-PD-1 and cisplatin combination therapy.

      Strengths:

      The manuscript presents a robust dataset that substantiates the authors' conclusion. They have identified NOLC1 as a potential oncogene that confers resistance to immuno-chemotherapy in gastric cancer through the mediation of ferroptosis and subsequent TME reprogramming. This discovery positions NOLC1 as a promising therapeutic target for gastric cancer treatment. The authors have delineated a novel mechanistic pathway whereby NOLC1 suppresses p53 transcriptional functions by reducing its nuclear/plasma ratio, underscoring the significance of p53 nuclear levels in tumor suppression over total protein levels.

      Weaknesses:

      While the overall findings are commendable, there are specific areas that could benefit from further refinement. The authors have posited that NOLC1 suppresses p53-mediated ferroptosis; however, the mRNA levels of ferroptosis genes regulated by p53 have not been quantified, which is a critical gap in the current study. In Figure 4A, transmission electron microscopy (TEM) results are reported solely for the MGC-803 cell line. It would be beneficial to include TEM data for the MKN-45 cell line to strengthen the findings. The authors have proposed a link between NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance, yet the correlation between this ratio and patient prognosis remains unexplored, which is a significant limitation in the context of clinical relevance.

      Thanks very much for the kind comments and great suggestions. Recent studies have reported that CDKN1A (also called p21, a protein transcriptionally regulated by p53) can promote ferroptosis [13]. As suggested, the mRNA levels of ferroptosis genes regulated by p53 were quantified in Fig. S8G-H. Moreover, we performed TEM imaging in MKN-45 cells; the result was consistent with that in MGC-803 cells, reflecting that NOLC1 broadly promotes drug resistance in gastric cancer. Recent studies have also reported that the p53 transcriptionally active and p53 transcriptionally inactive types include patients with intermediate prognosis and recurrence rates, with the p53-active group showing better prognosis [14]. Considering that p53 transcriptional activity depends on p53 nuclear accumulation, we assume that a low p53 nuclear/plasma ratio may cause poor prognosis in gastric cancer. Meanwhile, we will collect enough samples and their prognostic information to analyze the relationship between the NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance.

      References

      (1) Z. Seferbekova, A. Lomakin, L.R. Yates, M. Gerstung, Spatial biology of cancer evolution, Nat Rev Genet 24 (2023) 295–313. https://doi.org/10.1038/s41576-022-00553-x.

      (2) T. Matsuoka, M. Yashiro, Molecular Mechanism for Malignant Progression of Gastric Cancer Within the Tumor Microenvironment, IJMS 25 (2024) 11735. https://doi.org/10.3390/ijms252111735.

      (3) Y. Liu, Z. Su, O. Tavana, W. Gu, Understanding the complexity of p53 in a new era of tumor suppression, Cancer Cell (2024) S1535610824001338. https://doi.org/10.1016/j.ccell.2024.04.009.

      (4) R. Pan, V. Ruvolo, H. Mu, J.D. Leverson, G. Nichols, J.C. Reed, M. Konopleva, M. Andreeff, Synthetic Lethality of Combined Bcl-2 Inhibition and p53 Activation in AML: Mechanisms and Superior Antileukemic Efficacy, Cancer Cell 32 (2017) 748-760.e6. https://doi.org/10.1016/j.ccell.2017.11.003.

      (5) E. Catanzaro, M. Beltrán-Visiedo, L. Galluzzi, D.V. Krysko, Immunogenicity of cell death and cancer immunotherapy with immune checkpoint inhibitors, Cell Mol Immunol 22 (2024) 24–39. https://doi.org/10.1038/s41423-024-01245-8.

      (6) G. Lei, L. Zhuang, B. Gan, The roles of ferroptosis in cancer: Tumor suppression, tumor microenvironment, and therapeutic interventions, Cancer Cell 42 (2024) 513–534. https://doi.org/10.1016/j.ccell.2024.03.011.

      (7) E. Catanzaro, R. Demuynck, F. Naessens, L. Galluzzi, D.V. Krysko, Immunogenicity of ferroptosis in cancer: a matter of context?, Trends in Cancer 10 (2024) 407–416. https://doi.org/10.1016/j.trecan.2024.01.013.

      (8) X. Jiang, B.R. Stockwell, M. Conrad, Ferroptosis: mechanisms, biology and role in disease, Nat Rev Mol Cell Biol 22 (2021) 266–282. https://doi.org/10.1038/s41580-020-00324-8.

      (9) J.-L. Roh, E.H. Kim, H. Jang, D. Shin, Nrf2 inhibition reverses the resistance of cisplatin-resistant head and neck cancer cells to artesunate-induced ferroptosis, Redox Biology 11 (2017) 254–262. https://doi.org/10.1016/j.redox.2016.12.010.

      (10) X. Wang, Y. Zhou, D. Wang, Y. Wang, Z. Zhou, X. Ma, X. Liu, Y. Dong, Cisplatin-induced ototoxicity: From signaling network to therapeutic targets, Biomedicine & Pharmacotherapy 157 (2023) 114045. https://doi.org/10.1016/j.biopha.2022.114045.

      (11) J. Liang, G. Bi, Y. Huang, G. Zhao, Q. Sui, H. Zhang, Y. Bian, J. Yin, Q. Wang, Z. Chen, C. Zhan, MAFF confers vulnerability to cisplatin-based and ionizing radiation treatments by modulating ferroptosis and cell cycle progression in lung adenocarcinoma, Drug Resistance Updates 73 (2024) 101057. https://doi.org/10.1016/j.drup.2024.101057.

      (12) M.Y. Kosim, T. Fukazawa, M. Miyauchi, N. Hirohashi, K. Tanimoto, p53 status modifies cytotoxic activity of lactoferrin under hypoxic conditions, Front. Pharmacol. 13 (2022) 988335. https://doi.org/10.3389/fphar.2022.988335.

      (13) Q. Gao, J. Chen, C. Li, J. Zhan, X. Yin, B. Li, H. Dong, L. Luo, Z. Li, CDKN1A promotes Cis-induced AKI by inducing cytoplasmic ROS production and ferroptosis, Food and Chemical Toxicology 193 (2024) 115003. https://doi.org/10.1016/j.fct.2024.115003.

      (14) R. Cristescu, Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes, Nature Medicine (2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists;

      and (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different than the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity (a chi-squared test) and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the bibliography (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, as used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, as used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC respectively. Moreover, the influence of response variability and biases in response distribution estimation due to limited sampling has not usually been accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth at DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but instead is an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, these percentages, in our small sample (n = 4), were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney; see Author response image 1 below). In consequence, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the percentage of neurons reaching significance in these ANOVA tests reflected meaningful sensitivity to azimuth or was due to chance.

      Author response image 1.

      Percentage of the Neuropixels-recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).

      We reasoned that the markedly variable responses observed from DCIC units, which frequently failed to respond in many trials (Fig. 3D, 4A), in combination with the limited number of trial repetitions we could collect, result in under-sampled response distribution estimations. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency of the collected responses on stimulus azimuth. To do this, we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which constitutes an advantage of the Chi-square approach, but we note the important consideration that these response categories are arbitrary.
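
      The binning-plus-Chi-square idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bin edges, the function names, and the toy data are hypothetical and are not taken from the manuscript, and the p-value is obtained by permuting azimuth labels (a swapped-in technique chosen to keep the sketch dependency-free) rather than from the Chi-square distribution.

```python
import random
from collections import Counter

def bin_responses(responses, edges):
    """Assign each response a category: the number of edges it strictly exceeds.
    The edges are arbitrary (hypothetical), as the response notes for its own bins."""
    return [sum(r > e for e in edges) for r in responses]

def chi_square_stat(categories, azimuths):
    """Pearson chi-square statistic for the category-by-azimuth contingency table."""
    n = len(categories)
    row_tot = Counter(categories)
    col_tot = Counter(azimuths)
    observed = Counter(zip(categories, azimuths))
    stat = 0.0
    for c in row_tot:
        for a in col_tot:
            expected = row_tot[c] * col_tot[a] / n
            stat += (observed[(c, a)] - expected) ** 2 / expected
    return stat

def permutation_p_value(categories, azimuths, n_perm=2000, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square_stat(categories, azimuths)
    shuffled = list(azimuths)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(categories, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      A unit would be called azimuth sensitive when this p-value falls below the chosen alpha. Because the categories are arbitrary, it may be worth checking that the outcome is stable under a few different choices of edges.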

      Altogether, we acknowledge that our Chi-square approach to defining azimuth sensitivity is not free of limitations and that, despite enabling the interrogation of azimuth sensitivity at DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless, we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?
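
      The hemifield readout proposed in the preceding paragraph could be computed from a decoder confusion matrix roughly as follows. This is a minimal sketch, not an analysis from the manuscript: the function name and the three-azimuth toy matrix are hypothetical, negative azimuths are assumed to be on the left, and a prediction at exactly 0 degrees is counted as a miss.

```python
def hemifield_hit_rate(confusion, azimuths, stimulus_azimuth):
    """Fraction of trials at stimulus_azimuth decoded to the same side of the midline.

    confusion[i][j] is the number of trials with stimulus azimuths[i]
    that the decoder classified as azimuths[j].
    """
    if stimulus_azimuth == 0:
        raise ValueError("a midline stimulus belongs to neither hemifield")
    row = confusion[azimuths.index(stimulus_azimuth)]
    total = sum(row)
    if total == 0:
        raise ValueError("no trials recorded for this stimulus azimuth")
    # Same sign means same hemifield; a decoded azimuth of 0 counts as a miss.
    same_side = sum(count for az, count in zip(azimuths, row)
                    if az * stimulus_azimuth > 0)
    return same_side / total

# Toy example with made-up counts (rows: stimulus, columns: decoded azimuth).
toy_azimuths = [-30, 0, 30]
toy_confusion = [
    [6, 2, 2],
    [3, 4, 3],
    [1, 1, 8],
]
left_rate = hemifield_hit_rate(toy_confusion, toy_azimuths, -30)
right_rate = hemifield_hit_rate(toy_confusion, toy_azimuths, 30)
```

      Applied to the full 13-location matrix, the same function would give one left/right hit rate per stimulus azimuth, which is closer in form to a two-alternative discrimination measure than the mean absolute decoding error.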

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, which is a very different context to our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore we did not perform analyses or interpretations of our results based on the behavioral task from Lauer et al. (2011) and only made the qualitative observation that the errors match for discussion.

      We believe it is further important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated with a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated with an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, making further analyses to compare the experimental design of Lauer et al. (2011) and ours even more challenging for valid interpretation.

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, due to the differences the reviewer precisely describes in the encoding of azimuth at CNIC where a much larger majority of neurons show stronger azimuth sensitivity and response reliability with respect to our observations at DCIC. By this qualitative similarity of encoding format we specifically mean the similar occurrence of activity patterns from azimuth sensitive subpopulations of neurons in both CNIC and DCIC, which carry sufficient information about the stimulus azimuth for a sufficiently accurate prediction with regard to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that performance of their decoder was higher when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”
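
      One common way to build a decorrelated dataset of the kind compared above is to shuffle trials within each azimuth, independently for every unit. The sketch below illustrates that idea only. Whether it matches the procedure used to model the decorrelated datasets is an assumption, and the function name and data layout are hypothetical.

```python
import random

def decorrelate(responses, labels, seed=0):
    """Return a copy of responses with noise correlations removed.

    responses[t][n] is the response of unit n on trial t and labels[t] is the
    azimuth presented on trial t. For each unit separately, trials are permuted
    among the trials that share an azimuth, so each unit keeps its own
    per-azimuth response distribution while trial-by-trial co-fluctuations
    between units are destroyed.
    """
    rng = random.Random(seed)
    n_units = len(responses[0])
    shuffled = [list(trial) for trial in responses]  # leave the input untouched
    trials_by_label = {}
    for t, lab in enumerate(labels):
        trials_by_label.setdefault(lab, []).append(t)
    for trial_idx in trials_by_label.values():
        for n in range(n_units):
            values = [responses[t][n] for t in trial_idx]
            rng.shuffle(values)
            for t, v in zip(trial_idx, values):
                shuffled[t][n] = v
    return shuffled
```

      Decoding both the original and the shuffled responses with the same classifier and cross-validation scheme then isolates the contribution of the trial-by-trial correlation structure, since the single-unit tuning is identical in the two datasets.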

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80 kHz) was 44.53 dB and for Neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (Neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences between the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting, because it is also known to be an auditory structure that receives more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provides an additional layer of complexity in the processing and analysis of such datasets. This will undoubtedly be useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to imagine how many such informative units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain.

      We assumed that the levels of heating by excitation light measured at the neocortex in Prevedel et al. (2016) were also representative for DCIC. Nevertheless, we recognize this approximation might not be very accurate, due to the differences in tissue architecture and vascularization between these two brain areas, just to name a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. In consequence, imaging sessions were kept to 25 min in duration, limiting the number of trials collected. However, we cannot rule out that, with more extensive habituation prior to experiments, the imaging sessions could be prolonged without these signs of discomfort, nor can we rule out that influences from our custom setup, such as potential heating of the brain by illumination light, caused the observed distress. Nevertheless, we note that previous work has shown that ~200 mW average power is a safe regime for imaging in the cortex by keeping brain heating minimal (Prevedel et al., 2016), without producing the lasting damage observed by immunohistochemistry against apoptosis markers above 250 mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed because it assumes that the responses of the simultaneously recorded neurons are independent and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model would represent the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be verified by an equal decoding outcome from correlated or decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible that would be least likely to find a contribution of noise correlations. Other relevant reasons that justify our choice of the naive Bayesian classifier are its robustness against the limited numbers of trials we could collect in comparison to other more “data hungry” classifiers like SVM, KNN, or artificial neuronal nets. We did perform preliminary tests with alternative classifiers but the obtained decoding errors were similar when decoding the whole population activity (Author response image 2A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors observed with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observe a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Author response image 2B). Sentences detailing the logic of classifier choice are now included in the results section at page 10 and at the last paragraph of page 18 (see responses to Reviewer 1).

      Author response image 2.

      A) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using different classifiers (blue; KNN: K-nearest neighbors; SVM: support vector machine ensemble) and chance level distribution (gray) on the complete populations of imaged units. Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using a Bayes classifier (naive approximation for computation efficiency) to decode the single-trial response patterns from the 31 top ranked units in the simultaneously imaged datasets across mice (cyan), modeled decorrelated datasets (orange) and the chance level distribution associated with our stimulation paradigm (gray). Vertical dashed lines show the medians of cumulative distributions. K.S. w/Sidak: Kolmogorov-Smirnov with Sidak.
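
      The logic of the response above (decode the simultaneously recorded trials, then decode a decorrelated version of the same trials, and compare the errors) can be sketched in a few lines. This is only an illustrative toy, not the authors' analysis code: the Gaussian likelihood, the leave-one-out cross-validation, the simulated responses and all function names (`fit_gaussian_nb`, `predict`, `shuffle_within_class`, `loo_median_error`) are assumptions introduced here, since the excerpt does not specify these details.

```python
import math
import random
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Per-class, per-unit mean and variance (units treated as independent)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small variance floor avoids division by zero for constant units.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (means, variances, n / len(X))
    return model

def predict(model, row):
    """Return the class (azimuth) with the highest log posterior."""
    def log_post(params):
        means, variances, prior = params
        ll = sum(-0.5 * math.log(2 * math.pi * s) - (x - m) ** 2 / (2 * s)
                 for x, m, s in zip(row, means, variances))
        return math.log(prior) + ll
    return max(model, key=lambda label: log_post(model[label]))

def shuffle_within_class(X, y, rng):
    """Permute trials independently per unit within each azimuth class.

    Each unit keeps its response distribution per class, but the
    trial-by-trial pairing across units (noise correlation) is broken.
    """
    X_shuf = [list(row) for row in X]
    idx_by_class = defaultdict(list)
    for i, label in enumerate(y):
        idx_by_class[label].append(i)
    for idx in idx_by_class.values():
        for u in range(len(X[0])):
            values = [X[i][u] for i in idx]
            rng.shuffle(values)
            for i, v in zip(idx, values):
                X_shuf[i][u] = v
    return X_shuf

def loo_median_error(X, y):
    """Leave-one-out cross-validated median absolute decoding error."""
    errors = []
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(predict(model, X[i]) - y[i]))
    errors.sort()
    mid = len(errors) // 2
    return errors[mid] if len(errors) % 2 else 0.5 * (errors[mid - 1] + errors[mid])

# Toy data (hypothetical): 3 azimuths, 14 trials each, 4 units that share
# a common trial-to-trial fluctuation, i.e. positive noise correlations.
rng = random.Random(0)
X, y = [], []
for az in (-90, 0, 90):
    for _ in range(14):
        shared = rng.gauss(0, 1)
        X.append([az / 90 + shared + rng.gauss(0, 0.3) for _ in range(4)])
        y.append(az)

X_decorrelated = shuffle_within_class(X, y, rng)
print("median error, simultaneous trials:", loo_median_error(X, y))
print("median error, decorrelated trials:", loo_median_error(X_decorrelated, y))
```

      Because the classifier itself ignores correlations between units, any difference between the two printed errors can only come from the structure of the data being decoded, which is the sense in which the approach is described as conservative. Whether the shuffled error is larger or smaller in this toy depends on the simulated tuning and noise, so no particular outcome should be read into it.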

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than the ones recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of the form and a more quantitative comparison of the diversity of these functions because the signals compared are quite different, as calcium indicator signal is subject to non linearities due to Ca2+ binding cooperativity and low pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses statistical response dependency to stimulus azimuth, which does not rely on features from a descriptive statistic like mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth shows “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added sentences to the Discussion clarifying that a relationship between these two variables remains to be determined, and that it also remains to be determined whether the DCIC indeed performs a Bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, a primary sensory representation should theoretically be even more precise than the behavioral performance, as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth, and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling decoding performance decreased. Although Boffi and colleagues display that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a naïve Bayesian decoder, which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are independent. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations - it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be least likely to find a contribution of correlated noise. We also chose this decoder for its robustness against the limited number of trials collected, in comparison to “data hungry” nonlinear classifiers like KNN or artificial neuronal nets. Lastly, we observed that small populations of noisy, unreliable (not responding in every trial) DCIC neurons can encode stimulus azimuth in passively listening mice, matching the discrimination error of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.
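
      The point raised in the comment above, that projections onto principal components are uncorrelated, can be checked on a two-unit toy example. The sketch below is illustrative only and is not taken from the manuscript: the simulated responses and the helper names (`covariance`, `pca_2d`) are assumptions, and it uses the closed-form rotation that diagonalizes a 2x2 covariance matrix rather than a general PCA routine. Note that zero covariance is a weaker property than statistical independence (the two coincide for jointly Gaussian data), and that PCA decorrelates the pooled data rather than the responses within each azimuth class.

```python
import math
import random

def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def pca_2d(a, b):
    """Project two units' responses onto their two principal axes."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    caa, cbb, cab = covariance(a, a), covariance(b, b), covariance(a, b)
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    c, s = math.cos(theta), math.sin(theta)
    pc1 = [c * (x - ma) + s * (y - mb) for x, y in zip(a, b)]
    pc2 = [-s * (x - ma) + c * (y - mb) for x, y in zip(a, b)]
    return pc1, pc2

# Hypothetical responses of two units sharing a common fluctuation.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(200)]
unit_x = [s + rng.gauss(0, 0.5) for s in shared]
unit_y = [s + rng.gauss(0, 0.5) for s in shared]

pc1, pc2 = pca_2d(unit_x, unit_y)
print("covariance between units:     ", covariance(unit_x, unit_y))
print("covariance between components:", covariance(pc1, pc2))
```

      The second printed value is zero up to floating-point rounding by construction, whatever the correlation between the two simulated units.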

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percentage of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded us from making any quantification of the unreliability (failures to respond) and variability in the response distributions from the units we recorded, as we feared they could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. In consequence we referred to the observed response variance altogether as neuronal noise. Considering these points, our datasets are publicly available for exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations from the top ranked unit subpopulation, which likely carries the bulk of the sound location information. Noise correlations can be defined as correlations in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the units in the rest of the population, outside the top ranked percentage, are really responding and showing response variation from which to evaluate this correlation, or are simply not responding at all and showing unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more - are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y) or are they also modulated around their mean rates on similar trials (e.g., unit x and y respond and both are responding more than their mean response rate)? A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, to evaluate noise correlations we used the non parametric Kendall tau correlation coefficient which is a measure of pairwise rank correlation or ordinal association in the responses to each azimuth. Positive rank correlation would represent neurons more likely responding together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
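
      As a rough illustration of the measure described above, the following sketch computes a Kendall tau rank correlation between pairs of units separately for each azimuth. It is a minimal example under stated assumptions and not the authors' code: the tau-b variant (which corrects for ties), the averaging of the per-azimuth coefficients, the toy responses and the function names (`kendall_tau_b`, `noise_correlations`) are all choices made here for illustration, as the response does not specify them.

```python
import math
from collections import defaultdict

def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation (tie-corrected); nan if undefined."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored by tau-b
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

def noise_correlations(responses, azimuths):
    """Average per-azimuth tau-b for every pair of units.

    responses[t][u] is the response of unit u on trial t;
    azimuths[t] is the stimulus azimuth on trial t.
    """
    trials = defaultdict(list)
    for t, az in enumerate(azimuths):
        trials[az].append(t)
    n_units = len(responses[0])
    result = {}
    for u in range(n_units):
        for v in range(u + 1, n_units):
            taus = []
            for idx in trials.values():
                tau = kendall_tau_b([responses[t][u] for t in idx],
                                    [responses[t][v] for t in idx])
                if not math.isnan(tau):
                    taus.append(tau)
            result[(u, v)] = sum(taus) / len(taus) if taus else float("nan")
    return result

# Toy example (hypothetical): 3 units, 2 azimuths, 3 trials per azimuth.
responses = [[1, 1, 5], [2, 2, 4], [3, 3, 3],
             [1, 2, 9], [2, 3, 8], [3, 4, 7]]
azimuths = [0, 0, 0, 90, 90, 90]
print(noise_correlations(responses, azimuths))
```

      Computing the coefficient within each azimuth keeps stimulus-driven co-variation (signal correlation) out of the estimate, and working on ranks avoids assumptions about the shape of the response distributions.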

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound interval and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses - especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of the eye-position-IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We unsuccessfully attempted eye tracking and pupillometry in our videos. We suspect that the reason behind this is a generally overly dilated pupil due to the low visible light illumination conditions we used which were necessary to protect the PMT of our custom scope.

      It is likely that DCIC population activity is integrating un-analyzed signals, like the signal associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However investigating if and how these signals are integrated to stimulus azimuth coding requires extensive behavioral testing and experimentation which is out of the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like the ones from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups relying on “causal and behavioral perturbations” and implicating DCIC in sound location learning induced experience dependent plasticity (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than 2 dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing simple and interpretable toy models challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not successfully test their model based on Mahalanobis distances because we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.

      Significance:

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General:

      The manuscript is generally well written, but could benefit from a quick proof by a native English speaker (e.g., "the" inferior colliculus is conventionally used with its article). The flow of arguments is also generally easy to follow, but I would kindly ask the authors to consider elaborating or clarifying the following points (including those already mentioned in my public review).

      (1) Choice of model:

      There are countless ways one can construct a decoder or classifier that can predict a presented sensory stimulus based on a population neuronal response. Given the assumptions of independence as mentioned in my public review, I would ask the authors to explicitly justify their choice of a naïve Bayesian classifier.

      A section detailing the logic of classifier choice is now included in the results section at page 10 and the last paragraph of page 18 from the revised version of the manuscript.

      (2) Number of imaging repetitions:

      For particularly noisy datasets, 14 repetitions is indeed quite few. I reckon this was not the choice of the authors, but rather limited by the inherent experimental conditions. Despite minimisation of required average laser power during the development of s-TeFo imaging, the authors still required almost 200 mW (which is still quite a lot of exposure). Although 14 repetitions for 13 azimuthal locations every 5 s is at face value a relatively short imaging session (~15 min.), at 191 mW, with the desire to image mice multiple times, I could imagine that this is a practical limitation the authors faced (to avoid excessive tissue heating or photodamage, which was assessed in the original Nature Methods article, but not here). Nevertheless, this logic (or whatever logic they had) should be explained for non-imaging experts in the readership.

      This is now addressed in the answers to the public reviews.

      (3) Redundancy:

      It is honestly unclear to me what the authors mean by this. I don't speculate that they mean there are "redundant" (small) populations of neurons that sufficiently encode azimuth, but I'm actually not certain. If that were the case, I believe this would need further clarification, since redundant representations would be both inconsistent with the general (perhaps surprising) finding that large populations are not required in the DCIC, which is thought to be the case at earlier processing stages.

      In the text we are referring to the azimuth information being redundantly distributed across DCIC top ranked units. We do not mention redundant “populations of neurons”.

      (4) Correspondence of decoding accuracy with psychometric functions in mice:

      While this is an interesting coincidental observation, it should not be interpreted that the neuronal detection threshold in the DCIC is somehow responsible for its psychometric counterpart (which is an interesting yet exceedingly complex question). Although I do not believe the authors intended to suggest this, I would personally be cautious in the way I describe this correspondence. I mention this because the authors point it out multiple times in the manuscript (whereas I would have just mentioned it once in passing).

      This is now clarified in the revised manuscript.

      (5) Noisy vs. sparse:

      I'm confident that the authors understand the differences between these terms, both in concept (stochastic vs. scattered) and in context (neuronal vs. experimental), but I personally would be cautious in the way I use them in the description of the study. Indeed, auditory neuronal signals are to my knowledge generally thought to be both sparse and noisy, which is in itself interesting, but the study also deals with substantial experimental (recording) noise, and I think it's important for the readership to understand when "noise" refers to the recordings (in particular the imaging data) and to neuronal activity. I mention this specifically because "noisy" appears in the title.

      We have clarified this issue at the bottom of page 5 by adding the following sentences to the revised manuscript:

      “In this section we used the word “noise” to refer to the sound stimuli used and recording setup background sound levels or recording noise in the acquired signals. To avoid confusion, from now on in the manuscript the word “noise” will be used in the context of neuronal noise, which is the trial-to-trial variation in neuronal responses unrelated to stimuli, unless otherwise noted.”

      (6) More details in the Methods:

      The Methods section is perhaps the least-well structured part of the present manuscript in my view, and I encourage the authors to carefully go through it and add the following information (in case I somehow missed it).

      a. Please also indicate the number of animals used here.

      Added.

      b. How many sessions were performed on each mouse?

      This is already specified in the methods section in page 25:

      “mice were imaged a total of 2-11 times (sessions), one to three times a week.”

      We added for clarification:

      “Datasets here analyzed and reported come from the imaging session in which we observed maximal calcium sensor signal (peak AAV expression) and maximum number of detected units.”

      c. For the imaging experiments, was it possible to image the same units from session to session?

      This is not possible for sTeFo 2P data due to low spatial resolution which makes precisely matching neuron ROIs across sessions challenging.

      d. Could the authors please add more detail to the analyses of the videos (to track facial movements) or provide a reference?

      Added citation.

      e. The same goes for the selection of subcellular regions of interest that were used as "units."

      Added to page 25:

      “We used the CaImAn package (Giovannucci et al., 2019) for automatic ROI segmentation through constrained non negative matrix factorization and selected ROIs (Units) showing clear Ca transients consistent with neuronal activity, and IC neuron somatic shape and size (Schofield and Beebe, 2019).”

      Specific: In order to maximise the efficiency of my comments and suggestions (as there are no line numbers), my numerated points are organised in sequential order.

      (1) Abstract: I wouldn't personally motivate the study with the central nucleus of the IC (i.e. I don't think this is necessary). I think the authors can motivate it simply with the knowledge gaps in spatial coding throughout the auditory system, for which large data sets such as the ones presented here are of general value.

      (2) Page 4: 15-50 kHz "white" noise is incorrect. It should be "band-passed" noise.

      Changed.

      (3) Supplemental figure 1, panel A: Since the authors could not identify cell bodies unequivocally from their averaged volume timeseries data, it would be clearer to the readership if larger images are shown, so that they can evaluate (speculate) for themselves what subcellular structures were identified as units. Even better would be to include a planar image through a cross-section. As mentioned above, not everything determined for the cortex or hippocampus can be assumed to be true for the DCIC.

      The raw images and segmentations are publicly available for detailed inspections.

      (4) Supplemental figure 2, panel A: This panel requires further explanation, in particular the panel on the right. I assume that to be a simple subtraction of sequential frames, but I'm thrown off by the "d(Grey)" colour bar. Also, if "grey" refers to the neutral colour, it is conventionally spelled "gray" in US-American English.

      Changed.

      (5) Supplemental figure 2, panel B: I'm personally curious why the animals exhibited movement just prior to a stimulus. Did they learn to anticipate the presentation of a sound after some habituation? Is that somehow a pre-emptive startle response? We observe that in our own experiments (but as we stochastically vary the inter-trial-intervals, the movement typically occurs directly after the stimulus). I don't suggest the authors dwell on this, but I find it an interesting observation.

      It is indeed interesting, but we can’t conclude much about it without comparing it to random inter-trial-intervals.

      (6) Supplemental figure 3: I personally find these data (decoding of all electrophysiological data) of central relevance to the study, since they mirror the analyses presented for the imaging data counterpart, and I encourage the authors to move them to the main text.

      Changed.

      (7) Page 12: Do the authors have any further analyses of spatial tuning functions? We all know they can be parametrically obscure (i.e., bi-lobed, non-monotonic, etc.), but having these parameters (even if just in a supplemental figure) would be informative for the spatial auditory community.

      We dedicated significant effort to attempt to parametrize and classify the azimuth response dependency functions from the recorded DCIC cells in an unbiased way. Nevertheless, given the observed response noise and the “obscure” properties of spatial tuning functions mentioned by the reviewer, we could only reach the general qualitative observation of having a more frequent contralateral selectivity.

      (8) Page 14 (end): Here, psychometric correspondence is referenced. Please add the Lauer et al., (2011) reference, or, as I would, remove the statement entirely and save it for the discussion (where it is also mentioned and referenced).

      Changed.

      (9) Figure 5, Panels B and C: Why don't the authors report the Kruskal-Wallis tests (for increasing number of units training the model), akin to e.g., Panel G of Figure 4? I think that would be interesting to see (e.g., if the number of required units to achieve statistical significance is the same).

      Within class randomization produced a moderate effect on decoder performance, achieving statistical significance at similar numbers of units, as seen in figure 5 panels B and C. We did not include these plots for the sake of not cluttering the figure with dense distributions and fuzzing the visualization of the differences between the distributions shown.

      (10) Figure 5, Panels B and C (histograms): I see a bit of skewness in the distributions (even after randomisation). Where does this come from? This is just a small talking point.

      We believe this is potentially due to more than one distribution of pairwise correlations combined into one histogram (like in a Gaussian mixture model).

      (11) Page 21: Could the authors please specify that the Day and Delgutte (2013) study was performed on rabbits? Since rabbits have an entirely different spectral hearing range compared to mice, spatial coding principles could very well be different in those animals (and I'm fairly certain such a study has not yet been published for mice).

      Specified.

      (12) Page 22: I'd encourage the authors to remove the reference to Rayleigh's duplex theory, since mice hardly (if at all) use interaural time differences for azimuthal sound localisation, given their generally high-frequency hearing range.

      That sentence is meant to discuss beyond the mouse model an exciting outlook of our findings in light of previous reports, which is a hypothetical functional relationship between the tonotopy in DCIC and the spatial distribution of azimuth sensitive DCIC neurons. We have clarified this now in the text.

      (13) Page 23: I believe the conventional verb for gene delivery with viruses is still "transduce" (or "infect", but not "induce"). What was the specific "syringe" used for stereotactic injections? Also, why were mice housed separately after surgery? This question pertains to animal welfare.

      Changed. We used a 10 ml syringe to generate positive or negative pressure, coupled to the glass needle through silicone tubing via a Luer 3-way T valve. Single housing was chosen to prevent mice from compromising each other’s implants. It can therefore be seen as a refinement of our method that maximizes the chances of successful imaging per implanted mouse.

      (14) Page 25: Could the authors please indicate the refractory period violation time window here? I had to find it buried in the figure caption of Supplementary figure 1.

      Added.

      (15) Page 27: What version of MATLAB was used? This could be important for reproduction of the analyses, since The Mathworks is infamously known to add (or even more deplorably, modify) functions in particular versions (and not update older ones accordingly).

      Added.

      Reviewer #3 (Recommendations For The Authors):

      Overall I thought this was a nice manuscript and a very interesting dataset. Here are some suggestions and minor corrections:

      You may find this work of interest - 'A monotonic code for sound azimuth in primate inferior colliculus' 2003, Groh, Kelly & Underhill.

      We thank the reviewer for pointing out this extremely relevant reference, which we regrettably failed to cite. It is now included in the revised version of the manuscript.

      In your introduction, you state "our findings point to a functional role of DCIC in sound location coding". Though your results show that there is azimuthal information contained in a subset of DCIC units there's no evidence in the manuscript that shows a functional link between this representation and sound localization.

      This is now addressed in the answers to the public reviews.

      I found the variability in your DCIC population quite striking - especially during the intersound intervals. The entrainment of the population in the imaging dataset suggests some type of input activating the populations - maybe these are avenues for further probing the variability here:

      (1) I'm curious if you can extract eye movements from your video. Work from Jennifer Groh shows that some cells in the primate inferior colliculus are sensitive to different eye positions (Groh et al., 2001). With recent work showing eye movements in rodents, it may explain some of the variance in the DCIC responses.

      This is now addressed in the answers to the public reviews.

      (2) I was also curious if the motor that moves the speaker made noise. It could be possible that some of the 'ongoing' activity is a sound-evoked response.

      We were careful to set the stepper motor speed so that it produced low frequency noise, within a band mostly outside of the hearing range of mice (<4kHz). Nevertheless, we cannot fully rule out that a very quiet but perhaps very salient component of the motor noise could influence the activity during the inter trial periods. The motor was stationary and quiet for a period of at least one stimulus duration before and during stimulus presentation.  

      (3) Was the sound you present frozen or randomly generated on each trial? Could there be some type of structure in the noise you presented that sometimes led cells to respond to a particular azimuth location but not others?

      The sound presented was frozen noise. This is now clarified in the methods section.

      It may be useful to quantify the number of your units that had refractory period violations.

      Our manual curation of sorted units was very stringent to avoid mixing differently tuned neurons. The single units analyzed had very infrequent refractory period violations, in less than ~5% of the spikes, considering a 2 ms refractory period.
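
      The refractory period check described above can be sketched as follows. This is a minimal illustration, not the authors' curation pipeline: the function name, the example spike train, and the choice to normalize by the number of spikes (rather than by the number of inter-spike intervals) are assumptions made here for clarity. Only the 2 ms window comes from the response.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Return the fraction of spikes that follow the previous spike
    by less than `refractory_s` seconds (hypothetical helper)."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return 0.0
    # Count inter-spike intervals shorter than the refractory window.
    violations = sum(
        1 for prev, cur in zip(times, times[1:]) if cur - prev < refractory_s
    )
    # Assumption: normalize by the total number of spikes.
    return violations / len(times)


# Made-up spike train (seconds) containing two intervals shorter than 2 ms.
spikes = [0.010, 0.011, 0.050, 0.120, 0.1215, 0.300, 0.450, 0.600, 0.750, 0.900]
fraction = refractory_violation_fraction(spikes)
print(f"{fraction:.1%} of spikes violate a 2 ms refractory period")
```

      A unit would then be flagged for closer inspection when this fraction exceeds a chosen threshold, for example the ~5% mentioned in the response.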

      Was the video recording contralateral or ipsilateral to the recording?

      The side of the face ipsilateral to the imaged IC was recorded. Added to methods.

      I was struck by the snout and ear movements - in the example shown in Supplementary Figure 2B it appears as they are almost predicting sound onset. Was there any difference in ear movements in the habituated and non-habituated animals? Also, does the placement of the cranial window disturb any of the muscles used in ear movement?

      Mouse snout movements appear to be quite active, perhaps reflecting arousal (Stringer et al., 2019). We cannot rule out that the cranial window implantation disturbed ear movement, but while moving the head-fixed mouse we observed what could be considered normal ear movements.

      Did you correlate time-point by time-point in the average population activity and movement or did you try different temporal lags/leads in case the effect of the movements was delayed in some way?

      Point by point, due to the 250 ms time resolution of imaging.

      Are the video recordings only available during the imaging? It would be nice to see the same type of correlations in the neuropixel-acquired data as well.

      Only imaging. For neuropixels recordings, we were skeptical about face videography as we suspected that face movements were likely influenced by the acute nature of the preparation procedure. Our cranial window preparation, on the other hand, involved a recovery period of at least 4 weeks. Therefore, we were inclined to perform videographical interrogation of face movements on these mice instead.

      If you left out more than 1 trial do you think this would help your overfitting issue (e.g. leaving out 20% of the data).

      Due to the relatively small number of trial repetitions collected, fitting the model with an even smaller training dataset is unlikely to help overfitting and will likely decrease decoder performance.

      It would be nice to see a confusion matrix - even though azimuthal error and cumulative distribution of error are a fine way to present the data - a confusion matrix would tell us which actual sounds the decoder is confusing. Just looking at errors could result in some funky things where you reduce the error generally but never actually estimate the correct location.

      We considered confusion matrices early on in our study but they were not easily interpretable or insightful, likely due to the relatively low discrimination ability of the mouse model with +/- 30º error after extensive training. Therefore, we reasoned that in passively listening mice (and likely trained mice too) with limited trial repetitions, an undersampled and diffuse confusion matrix is expected which is not an ideal means of visualizing and comparing decoding errors. Hence we relied on cumulative error distributions.

      Do your top-ranked units have stronger projections onto your 10-40 principal components?

      It would be interesting to know if the components are mostly taking into account those 30ish percent of the population that is dependent upon azimuth.

      Inspection of PC loadings across units ranked based on response dependency to stimulus azimuth does not show a consistent stronger projection of top ranked units onto the first 10-40 principal components (Author response image 3).

      Author response image 3.

      PC loading matrices for each recorded mouse. The units recorded in each mouse are ranked in descending order of response dependency to stimulus azimuth based on the p value of the chi square test. Units above the red dotted line display a chi square p value < 0.05, units below this line have p values >= 0.05.

      How much overlap is there in the tuning of the top-ranked units?

      This varies considerably from mouse to mouse and between imaging and electrophysiology, which makes it hard to generalize, since it might depend on the unique DCIC population sampled in each mouse.

      I'm not really sure I follow what the nS/N adds - it doesn't really measure tuning but it seems to be introduced to discuss/extract some measure of tuning.

      nS/N is used to quantify how noisy neurons are, independent of how sensitive their responses are to the stimulus azimuth.

      Is the noise correlation - observed to become more positive - for more contralateral stimuli a product of higher firing rates due to a more preferred stimulus presentation or a real effect in the data? Was there any relationship between distance and strength of observed noise correlation in the DCIC?

      We observed a consistent and homogeneous trend of pairwise noise correlation distributions either shifted or tailed towards more positive values across stimulus azimuths, for imaging and electrophysiology datasets (Author response image 3). The lower firing frequency observed in neuropixels recordings in response to ipsilateral azimuths could have affected the statistical power of the comparison between the pairwise noise correlation coefficient distribution to its randomized chance level, but the overall histogram shapes qualitatively support this consistent trend across azimuths (Author response image 4).

      Author response image 4.

      Distribution histograms for the pairwise correlation coefficients (Kendall tau) from pairs of simultaneously recorded top ranked units across mice (blue) compared to the chance level distribution obtained through randomization of the temporal structure of each unit’s activity to break correlations (purple). Vertical lines show the medians of these distributions. Imaging data comes from n = 12 mice and neuropixels data comes from n = 4 mice.
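
      The comparison described in this caption can be illustrated with a small sketch. It assumes that noise correlation is computed on trial-by-trial responses of two units to the same stimulus and that the chance level is obtained by permuting the trial order of one unit. The permutation scheme, the function names, and the example values are assumptions for illustration, and the authors' actual randomization may differ.

```python
import math
import random
import statistics


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation of two equal-length sequences."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from every term
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x_only)
        * (concordant + discordant + ties_y_only)
    )
    return (concordant - discordant) / denom if denom else 0.0


def shuffled_taus(x, y, n_shuffles=200, seed=0):
    """Chance-level taus after permuting the trial order of `y`."""
    rng = random.Random(seed)
    y_perm = list(y)
    taus = []
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)
        taus.append(kendall_tau_b(x, y_perm))
    return taus


# Made-up single-trial responses of two units to one stimulus azimuth.
unit_a = [1, 2, 3, 4, 5, 6, 7, 8]
unit_b = [2, 1, 4, 3, 6, 5, 8, 7]
observed = kendall_tau_b(unit_a, unit_b)
chance = shuffled_taus(unit_a, unit_b)
print("observed tau:", round(observed, 3))
print("median chance tau:", round(statistics.median(chance), 3))
```

      Repeating this for every pair of simultaneously recorded units gives the two distributions whose medians are compared in the histograms.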

      Typos:

      'a population code consisting on the simultaneous' > should on be of?

      'half of the trails' > trails should be trials?

      'referncing the demuxed channels' > should it be demixed?

      Corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper performed a functional analysis of the poorly characterized pseudo-phosphatase Styxl2, one of the targets of the Jak/Stat pathway in muscle cells. The authors propose that Styxl2 is essential for de novo sarcomere assembly by regulating autophagic degradation of non-muscle myosin IIs (NM IIs). Although a previous study by Fero et al. (2014) has already reported that Styxl2 is essential for the integrity of sarcomeres, this study provides new mechanistic insights into the phenomenon. In vivo studies in this manuscript are compelling; however, I feel the contribution of autophagy in the degradation of NM IIs is still unclear.

      Major concerns:

      1) The contribution of autophagy in the degradation of Myh9 is still unclear to this reviewer.

      It has been reported that autophagy is dispensable for sarcomere assembly in mice (Cell Metab, 2009, PMID; 1994508). In Fig. 7A, the authors showed that overexpressed Styxl2 downregulated the amount of ectopically expressed Myh9 in an ATG5-dependent manner in C2C12 cells; however, the experiment is far from a physiological condition. Therefore, the authors should test ATG5 knockdown and the genetic interaction between Styxl2 and ATG5 in vivo. That is, 1) loss of ATG5 on sarcomere assembly in zebrafish, and 2) the genetic interaction between Styxl2 and ATG5; co-injection of Styxl2 mRNA and ATG5-MO into the zebrafish embryos.

      Our response: In fact, the reference cited by the reviewer (Cell Metab, 2009, PMID: 19945408) clearly indicated that autophagy is required for sarcomere assembly. Moreover, another paper using the fish extraocular muscle regeneration model (Autophagy, 2014, PMID: 27467399) also showed that the sarcomere structure was disrupted in the regenerated muscles when autophagy was inhibited by chloroquine. In addition, other references (Nature medicine, 2007, PMID: 17450150; Autophagy, 2010, PMID: 20431347) showed that loss of Atg5 in mouse cardiac muscles led to disorganized sarcomere structure. We also performed the Atg5 knockdown experiments as suggested by the reviewer. However, the sarcomere structure defects were not as obvious as those caused by Styxl2 knockdown (see Author response image 1 below). In fact, it was reported that Atg5 knockdown may not be a desirable strategy to disrupt autophagy, as it was found that “--- only a small amount of Atg5 is needed for autophagy, knockdown of Atg5 to levels low enough to block autophagy might be difficult to achieve, --” (Nature medicine, 2007, PMID: 17450150). Due to the ineffectiveness of the Atg5 MO in our assays, we did not perform the second experiment suggested by the reviewer. Moreover, as Styxl2 is not a key component of the autophagy machinery, it is unlikely that overexpression of Styxl2 alone can rescue the autophagy defects caused by Atg5 knockdown.

      Author response image 1.

      The fish zygotes were injected with Atg5 or Ctrl MO. At 48 hpf, the fish were stained with an anti-Actinin antibody. Some fast muscle fibers were disrupted when Atg5 was knocked down. The numerator at the bottom of each image represents the number of fish embryos showing a normal Actinin staining pattern, while the denominator represents the total number of embryos examined. Scale bar, 10 µm.

      2) As referenced, Yamamoto et al. reported that Myh9 is degraded by autophagy. Mechanistically, Nek9 acts as an autophagic adaptor that bridges Atg8 and Myh9 through interactions with both. Inconsistent with the model, the authors mentioned on page 12, lines 365-367, "A recent report showed that Myh9 could also undergo Nek9-mediated selective autophagy (Yamamoto et al., 2021), suggesting that Myh9 is ubiquitinated". I think it is not yet explored whether autophagic degradation of Myh9 requires its ubiquitination. Moreover, I cannot judge whether Myh9 is ubiquitinated in a Styxl2-dependent manner from the data in Fig. 7C. The authors should test whether Nek9 is required for Myh9 degradation in muscles. If Nek9 plays a role in the Myh9 degradation, it would be better to remove Fig. 7C.

      Our response: Indeed, as pointed out by the reviewer, it has not been explored whether Myh9 is ubiquitinated or not. However, it has been well-established that some proteins undergoing autophagic degradation are ubiquitinated, which are linked to Atg8/LC3 via p62 and NBR1 (Mol Cell, 2009, PMID: 19250911; J Biol Chem, 2007, PMID: 17580304). To improve the data quality, we repeated the Myh9 ubiquitination experiment in cells with or without Styxl2 by using a slightly different strategy: as shown in the revised Figure 7C, we first co-transfected HEK 293T cells with HA-Myh9, Myc-ubiquitin, and Flag-Styxl2. We then immunoprecipitated Myc-tagged Ubiquitin from the whole cell lysates, and blotted for HA-Myh9. We detected an obvious increase in Ubiquitin-conjugated HA-Myh9 (revised Figure 7C). As suggested by the reviewer, we also tested whether knockdown of Nek9 affects the degradation of Myh9. We failed to detect an obvious effect (see Author response image 2 below) caused by Nek9 knockdown. One possible explanation for this negative result is that Nek9 itself is a negative regulator of selective autophagy (J Biol Chem, 2020, PMID: 31857374). By knocking it down, the functions of the autophagy machinery are expected to be enhanced instead of being impaired. This may explain why we failed to detect an effect on Myh9 degradation simply by knocking down Nek9. To further elucidate whether Nek9 is involved in Myh9 degradation in myoblasts, we may need to use a dominant-negative mutant of Nek9 missing the LCIII-binding motif as shown by Yamamoto (Nat Commun, 2021, PMID: 34078910). This will be addressed in our future study.

      Author response image 2.

      C2C12 cells were transfected with negative control siRNA (NC), siNek9#2 or siNek9#3. 18 h later, the cells were transfected with plasmids HA-Myh9 and Flag-Styxl2 or Flag-Stk24. After another 24 h, the cells were harvested for RT-qPCR (left panel) or western blot (right panel).

      3) In Fig. 5F, the protein level of Styxl2 and Myh10 should be checked because the efficiency of Myh10-MO was not shown anywhere in this manuscript.

      Our response: As suggested by the reviewer, a Western blot showing the protein levels of Myh10 is now shown in Figure 5-figure supplement 1B.

      Reviewer #2 (Public Review):

      The authors investigated the role of the Jak1-Stat1 signaling pathway in myogenic differentiation by screening the transcriptional targets of Jak1-Stat1 and identified Styxl2, a pseudophosphatase, as one of them. Styxl2 expression was induced in differentiating muscles. The authors used a zebrafish knockdown model and conditional knockout mouse models to show that Styxl2 is required for de novo sarcomere assembly but is dispensable for the maintenance of existing sarcomeres. Styxl2 interacts with the non-muscle myosin IIs, Myh9 and Myh10, and promotes the replacement of these non-muscle myosin IIs by muscle myosin IIs through inducing autophagic degradation of Myh9 and Myh10. This function is independent of its phosphatase domain.

      A previous study using zebrafish found that Styxl2 (previously known as DUSP27) is expressed during embryonic muscle development and is crucial for sarcomere assembly, but its mechanism remains unknown. This paper provides important information on how Styxl2 mediates the replacement of non-muscle myosin with muscle myosin during differentiation. This study may also explain why autophagy deficiency in muscles and the heart causes sarcomere assembly defects in previous mouse models.

      Reviewer #3 (Public Review):

      Wu and colleagues are characterising the function of Styxl2 during muscle development, a pseudo-phosphatase that was already described to have some function in sarcomere morphogenesis or maintenance (Fero et al. 2014). The authors verify a role for Styxl2 in sarcomere assembly/maintenance using zebrafish embryonic muscles by morpholino knockdown and by a conditional Styxl2 allele in mice (knocked-out in satellite cells with Pax7 Cre).

      Experiments using a tamoxifen inducible Cre suggest that Styxl2 is dispensable for sarcomere maintenance and only needed for sarcomere assembly.

      BioID experiments with Styxl2 in C2C12 myoblasts suggest binding of nonmuscle myosins (NMs) to Styxl2. Interestingly, both NMs are downregulated when muscles differentiate after birth or during regeneration in mice. This down-regulation is reduced in the Styxl2 mutant mice, suggesting that Styxl2 is required for the degradation of these NMs.

      Impressively, reducing one NM (zMyh10) by double morpholino injection in a Styxl2 morphant zebrafish does improve zebrafish mobility and sarcomere structure. Degradation of Myh9 is also stimulated in cell culture if Styxl2 is co-expressed. Surprisingly, the phosphatase domain is not needed for these degradation and sarcomere structure rescue effects. Inhibitor experiments suggest that Styxl2 does promote the degradation of NMs by promoting the selective autophagy pathway.

      Strengths:

      A major strength of the paper is the combination of various systems, mouse and fish muscles in vivo to test Styxl2 function, and cell culture including a C2C12 muscle cell line to assay protein binding or protein degradation as well as inhibitor studies that can suggest biochemical pathways.

      Weakness:

      The weakness of this manuscript is that the sarcomere phenotypes and also the western blots are not quantified. Hence, we rely on judging the results from a single image or blot. Also, Styxl2's role in sarcomere biology was not entirely novel.

      Few high resolution sarcomere images are shown, myosins have not been stained for.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      4) The position of molecular weight markers should be shown in all Western blot data.

      Our response: As suggested by the reviewer, the molecular weight markers have been added in the Western blot data.

      5) Schematic models of Styxl2deltaN509 and N513 construct would be helpful for the readers.

      Our response: A schematic has been added in Figure 6B (upper panel) to show Styxl2deltaN509 and Styxl2N513.

      6) Several data were described but not shown (data not shown). I think the data need to be included in the main or supplemental figures.

      Our response: As suggested by the reviewer, the raw data are now included in Figure 6-figure supplement 1A and Figure 7-figure supplement 1.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 5E, the authors suggest that the needle touch response was improved by additional knockdown of Myh10. This is a bit confusing because the germline knockout of Myh10 is lethal (line 445). The authors should provide more explanation on this point. Additionally, it would be better to include Myh10-MO in Fig. 5E.

      Our response: In line 445 of our original manuscript, we stated that germline knockout of mouse Myh10 gene is lethal based on a published report (Proc Natl Acad Sci USA, 1997, PMID: 9356462). Here, in zebrafish zygotes, we only knocked down zMyh10, thus, we do not expect to get a lethal phenotype. In addition, other groups who knocked down Myh10 in fish also did not get a lethal phenotype (Dev Biol, 2015, PMID: 25446029). As to the control involving Myh10-MO in the experiment in Fig. 5E, we did include it in our experiments. As we did not observe any obvious effects on either motility or sarcomere structures, we did not include the data set in the figure.

      2) It was suggested that Myh9 and Myh10 form a complex (Rao et al. PLoS One 9, e114087, 2014). Thus, the IP experiments do not rule out the possibility that Styxl2 directly interacts with either Myh9 or Myh10 and indirectly with the other.

      Our response: In known myosin-II complexes, different myosin molecules can associate with each other through their tail domains (Bioarchitecture, 2013, PMID: 24002531). Thus, if we use full-length myosin molecules in our co-immunoprecipitation assays, it will be difficult to exclude the possibility raised by the reviewer. However, by using truncated myosin proteins, we showed that the head domain of either Myh9 or Myh10 could interact with Styxl2 in the absence of the tail domain (Figure 4E, F). This result strongly suggests that both Myh9 and Myh10 can independently interact with Styxl2.

      Reviewer #3 (Recommendations For The Authors):

      1) The western blot shown in Figure 3B supporting the induced deletion of Styxl2 should be quantified. Ideally, some other blots, e.g., in Figure 5, too. Please add the age of the mice in Figure 5B to the figure legend.

      Our response: As suggested by the reviewer, we quantified the data in Figures 3B, 3F, 5B, 5D, and 7A and the data were included in the revised figures. In Fig. 5B, we already indicated the age of the mice (i.e., P1) in the legend.

      2) A quantification of the sarcomere phenotypes in the double knock-down of zMyh10 and Styxl2 compared to Styxl2 single would make the paper significantly stronger. Furthermore, a double morpholino control should be included to rule out any RNAi machinery 'dilution effect'.

      Our response: As suggested by the reviewer, we quantified the sarcomere structures using the line scan analysis in ImageJ and the scan images were placed as insets in the upper corner of the immunofluorescent images (revised Figures 5F and 6C). To avoid potential “dilution effects”, in all the experiments involving the use of two different MOs, the total amount of MO was kept the same in all control samples by including a control MO (e.g., in samples treated with one specific MO, an equal amount of a control MO was also included, while in samples without any specific MO, twice as much control MO was used).

      3) The sarcomere phenotypes in figure 6 should also be better quantified, for example using simple line scans of the alpha-actinin stains and assay periodicity or calculating the autocorrelation coefficients. How about myosin stains?

      Our response: We quantified Figure 6C as suggested by the reviewer. We also performed myosin staining. The results were similar to those obtained with the α-actinin antibody (see revised Figure 6-Fig supplement 1B).
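
      The autocorrelation approach suggested by the reviewer can be sketched as below. This is an illustration only: the authors report using line scans in ImageJ, whereas this example computes a normalized autocorrelation of a synthetic intensity profile and reads the periodicity from the first positive peak. The function names, the synthetic profile, and the pixel size of 0.1 µm are hypothetical.

```python
import math


def autocorrelation(profile):
    """Normalized (biased) autocorrelation of a 1D intensity profile."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]
    variance_sum = sum(c * c for c in centered)
    if variance_sum == 0:
        return [0.0] * n  # flat profile: no structure to correlate
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / variance_sum
        for lag in range(n)
    ]


def first_peak_lag(acf):
    """Lag (in pixels) of the first positive local maximum after lag 0."""
    for lag in range(1, len(acf) - 1):
        if acf[lag] > 0 and acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]:
            return lag
    return None  # no periodic peak detected


# Synthetic line scan: bands repeating every 20 pixels (assumed values).
pixel_size_um = 0.1  # hypothetical pixel size
line_scan = [math.cos(2 * math.pi * i / 20) for i in range(200)]

acf = autocorrelation(line_scan)
peak_lag = first_peak_lag(acf)
period_um = peak_lag * pixel_size_um
print("estimated periodicity (µm):", round(period_um, 2))
```

      The height of that first peak gives a simple measure of how regular the striation is, which would allow knockdown and control fibers to be compared across many line scans.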

      4) Do the authors see periodic NM patterns in developing mouse muscle fibers as indicated by the model in figure 7D? It is unclear if nonmuscle myosin is present in a PERIODIC pattern in early myofibrils. NM myosin periodic patterns that have been observed have a periodicity of only about 1 µm fitting the shorter length of the NM bipolar filaments (about 300 nm only, PMID 28114270).

      Our response: The reviewer raised a good point here. Ideally, we should examine developing mouse muscle fibers to prove that NM shows periodic patterns. However, due to the difficulty in catching myocytes undergoing sarcomere assembly, the majority of the studies involving NM in sarcomeres use cultured cardiomyocytes. Using TA muscles from P1 new-born mice, we failed to detect the presence of NM in sarcomeres (see Author response image 3 below). Actually, nearly all the myofibers showed a mature sarcomere pattern without the NM signal. More work is needed in the future to examine developing mouse fibers at different embryonic stages to look for the presence of NM in developing sarcomeres.

      Author response image 3.

      The TA muscles were collected from male and female P1 mice. The muscles were sectioned and co-stained for α-actinin (Actn) and Myh9. The majority of myofibrils are mature without the NM II signal. Scale bar, 10 µm.

      5) Recent work suggested that mechanical tension is key to assemble the first long periodic myofibril containing immature sarcomeres. Tension is likely produced by a combination of NM and Mhc in the assembling sarcomeres themselves. This could be included in the introduction or discussion (PMIDs 24631244, 29316444, 29702642, 35920628).

      Our response: We thank the reviewer for pointing us to additional relevant references. We have added them in the Introduction.

      6) I suggest replacing "sarcomeric muscles" with "striated muscles".

      Our response: We revised the term in the manuscript as suggested by the reviewer.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to the Referee Comments

      We would like to express our appreciation to the editor and the reviewers for their thoughtful comments and constructive suggestions on the manuscript. We agree with most of the comments and have carefully revised the manuscript accordingly. The revisions are highlighted in red font in the revised manuscript. Below are point-by-point responses to the referee’s comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Microglia are increasingly recognized as playing an important role in shaping the synaptic circuit and regulating neural dynamics in response to changes in their surrounding environment and in brain states. While numerous studies have suggested that microglia contribute to sleep regulation and are modulated by sleep, there has been little direct evidence that the morphological dynamics of microglia are modulated by the sleep/wake cycle. In this work, Gu et al. applied a recently developed miniature two-photon microscope in conjunction with EEG and EMG recording to monitor microglia surveillance in freely-moving mice over extended periods of time. They found that microglia surveillance depends on the brain state in the sleep/wake cycle (wake, non-REM, or REM sleep). Furthermore, they subjected the mouse to acute sleep deprivation, and found that microglia gradually assume an active state in response. Finally, they showed that the state-dependent morphological changes depend on norepinephrine (NE), as chemically ablating noradrenergic inputs from locus coeruleus abolished such changes; this is in agreement with previous publications. The authors also showed that the effect of NE is partially mediated by β2-adrenergic receptors, as shown with β2-adrenergic receptor knock-out mice. Overall, this study is a technical tour de force, and its data add valuable direct evidence to the ongoing investigations of microglial morphological dynamics and its relationship with sleep. However, there are a number of details that need to be clarified, and some conclusions need to be corroborated by more control experiments or more rigorous statistical analysis. Specifically:

      1. The number of branch points per microglia shown here (e.g., Fig. 2g) is much lower than the values of branch points in the literature, e.g., Liu T et al., Neurobiol. Stress 15: 100342, 2021 (mouse dmPFC, IHC); Liu YU et al., Nat. Neurosci. 22: 1771-81, 2019 (mouse S1, in vivo 2P imaging). The authors need to discuss the possible source of such discrepancy.

      Thank you for raising this important point. Two reasons may account for this difference. First, the software packages define branch points differently. Liu YU et al. used the Sholl analysis in ImageJ to count microglial branch points; Sholl analysis defines the number of branch points as the number of crossings between branches and concentric circles of increasing radii. We reconstructed microglia morphology using Imaris, which defines branch points as bifurcation points, so the number of bifurcations represents the number of microglia branch points. Second, this and previous studies found that more branch points are present under anesthesia. The morphological characteristics of microglia in head-fixed mice under anesthesia were reported by Liu T et al., and the microglia reconstructions presented by those authors are indeed more complex than ours. In short, this is an aspect that we have been paying attention to, and the main reasons for this difference may lie in the definition of branch points, the analysis methods, and the related choice of thresholds. True differences in brain states and the heterogeneity of microglia in different brain regions may also contribute to the apparent discrepancy.
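
To make the difference between the two definitions concrete, the sketch below counts branch points both ways on the same toy skeleton. It is an illustration under stated assumptions, not a reimplementation of ImageJ or Imaris: the skeleton, the radii, the function names, the exclusion of the soma (root) node from the bifurcation count, and the treatment of each segment as a straight line are all hypothetical choices.

```python
import math

# Toy 2D skeleton (assumption): node id -> (x, y, parent id); node 0 is the soma.
SKELETON = {
    0: (0.0, 0.0, None),
    1: (10.0, 0.0, 0),
    2: (20.0, 5.0, 1),
    3: (20.0, -5.0, 1),
    4: (30.0, 8.0, 2),
    5: (30.0, 2.0, 2),
    6: (-18.0, 0.0, 0),
}


def count_bifurcations(skeleton, root=0):
    """Count non-root nodes that have two or more child segments."""
    n_children = {}
    for _node, (_x, _y, parent) in skeleton.items():
        if parent is not None:
            n_children[parent] = n_children.get(parent, 0) + 1
    return sum(1 for node, n in n_children.items() if node != root and n >= 2)


def sholl_crossings(skeleton, radii, root=0):
    """Count, per radius, the segments that cross a circle centered on the root."""
    cx, cy, _ = skeleton[root]

    def dist(node):
        x, y, _ = skeleton[node]
        return math.hypot(x - cx, y - cy)

    counts = {}
    for r in radii:
        crossings = 0
        for node, (_x, _y, parent) in skeleton.items():
            if parent is None:
                continue
            near, far = sorted((dist(parent), dist(node)))
            # A straight segment crosses the circle if its ends straddle radius r.
            if near < r <= far:
                crossings += 1
        counts[r] = crossings
    return counts


bifurcations = count_bifurcations(SKELETON)
crossings = sholl_crossings(SKELETON, radii=[5, 15, 25])
print(bifurcations, crossings, sum(crossings.values()))
```

On this toy cell the bifurcation count is 2 while the Sholl crossings sum to 7, so the two measures are not directly comparable even for identical morphology.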

      1. Microglia process end-point speed (Fig. 2h, o): here the authors show that the speed is highest in the wake state and lowest in NREM, which agrees with the measurement on microglia motility during wakefulness vs NREM in a recent publication (Hristovska I et al., Nat. Commun. 13: 6273, 2022). However, Hristovska et al. also reported lower microglia complexity in NREM vs wake state, which seems to be the opposite of the finding in this paper. The authors need to discuss the possible source of such differences.

      This is also an important point. Hristovska et al. reported the morphodynamic characteristics of microglia during wakefulness and NREM sleep. It is worth noting that the sleep state of the mice in their experiments was unnatural due to head fixation and body restraint, and the duration of NREM sleep (sleep stability) was quite different from that of the NREM sleep analyzed under natural sleep. The limitations of this approach are also discussed by Hristovska et al.: “Even though sleep episodes were, as anticipated, shorter than those observed in freely moving animals, changes in neuronal activity characteristic of NREM sleep were monitored by EEG recordings, and changes in morphodynamics were observed during single episodes. Several episodes of REM sleep were detected, but they were too short and rare to be analyzed reliably.” The unnatural sleep state would lead to an increase in microarousals, and ultimately a change in the structure of the sleep state, which may be the main reason for the difference in microglia behavior from that seen in our natural sleep recordings. We have discussed this in the revised manuscript. Please see lines 292-298.

      1. Fig. 3: the authors used single-plane images to analyze the morphological changes over 3 or 6 hours of SD, which raises the concern that the processes imaged at the baseline may drift out of focus, leading to the dramatic reduction in process lengths, surveillance area, and number of branch points. In fact, a previous study (Bellesi M et al., J. Neurosci. 37(21): 5263-73, 2017) shows that after 8 h SD, the number of microglia process endpoints per cell and the summed process length per cell do not change significantly (although there is a trend to decline). The authors may confirm their findings by either 3D imaging in vivo, or 3D imaging in fixed tissue.

      Three lines of evidence indicate that the microglia morphology changes in Fig. 3 are due to SD rather than variations in the focal plane. First, our single-plane images were quite stable over 3 or 6 hours of SD, though occasional reversible drifts might happen due to sudden motions. Second, per your suggestion, further experiments and analysis of 3D imaging were performed to monitor microglia dynamics during sleep deprivation. The new result is shown in revised Fig. S3 C-D: the length of microglia branches and the number of branching points were significantly reduced after SD, in agreement with the results of single-plane imaging. Third, we detected no significant difference in microglia branching characteristics during 6 h of sleep deprivation in β2AR KO mice (Fig. S4), which indirectly affirms that single-plane imaging is stable enough for detecting true changes in branching during SD.

      1. Fig. 4b: the EEG and EMG signals look significantly different from the example given in Fig. 2a. In particular, the EMG signal appears completely flat except for the first segment of wake state; the EEG power spectrum for REM appears dark; and the wake state corresponds to stronger low frequency components (below ~ 4 Hz) compared to NREM, which is the opposite of Fig. 2a. This raises the concern whether the classification of sleep stage is correct here.

      Thank you for the insightful comments. We carefully examined the behavioral video for Figure 4b. There were occasional microarousal events, indicated by slow head rotation during NREM sleep, while the accompanying EMG signals were completely flat, which is atypical during the sleep-wake cycle. The microarousal events were not excluded from sleep, which makes this set of data unrepresentative and contrary to Fig.4b. In our revised manuscript, we replaced it with more representative data that can clearly and consistently distinguish between different brain states in mice on EMG and EEG. Please see revised Fig.2a, page 34; revised Fig.4b, page 37.

      1. Fig. 4 NE dynamics. • How long is a single continuous imaging session for NE? • When monitoring microglia surveillance, the authors were able to identify wake or NREM states longer than 15 min, and REM states longer than 5 min. Here the authors selected wake/NREM states longer than 1 min and REM states longer than 30 s. What makes such a big difference in the time duration selected for analysis? • Also, the definition of F0 is a bit unclear. Is the same F0 used throughout the entire imaging session, or is it defined with a moving window?

      A single continuous session of NE imaging usually took about 1 hour. Subsequent analysis was performed on imaging data from each recording that included wake, NREM sleep, and REM sleep. Because microglia morphological dynamics (relatively slow) and NE signals (fast) operate on different time scales, we used different time windows in the analysis in the previous version of the manuscript.

      Per your suggestion, we have now set the same time window selection criteria for both microglia morphological and NE dynamic analysis: for wake and NREM sleep durations longer than 1 minute, and REM sleep durations longer than 30 seconds. We updated the Methods and all statistics in related figures, please see line 151-154, 481-485, 490-492; Fig. 2e-g and 2l-n, page 34. F0 definition is now explained in the Methods section. Please see line 521-522.
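
The unified selection criteria stated above (wake and NREM episodes longer than 1 minute, REM episodes longer than 30 seconds) can be expressed as a small filter. This is a sketch only: the episode representation, the state labels, the function name, and the example episodes are assumptions for illustration and do not come from the manuscript's analysis code.

```python
# Minimum episode duration per state, in seconds, as stated in the response.
MIN_DURATION_S = {"wake": 60.0, "nrem": 60.0, "rem": 30.0}


def select_episodes(episodes, min_duration_s=MIN_DURATION_S):
    """Keep episodes strictly longer than the threshold for their state.

    episodes: list of (state, start_s, end_s) tuples from sleep scoring.
    """
    return [
        (state, start, end)
        for state, start, end in episodes
        if (end - start) > min_duration_s[state]
    ]


# Hypothetical scored episodes (state, start in s, end in s).
scored = [
    ("wake", 0, 90),     # 90 s  -> kept
    ("nrem", 90, 140),   # 50 s  -> dropped
    ("rem", 140, 175),   # 35 s  -> kept
    ("wake", 175, 200),  # 25 s  -> dropped
    ("nrem", 200, 500),  # 300 s -> kept
    ("rem", 500, 525),   # 25 s  -> dropped
]

selected = select_episodes(scored)
print(selected)
```

Using one threshold table for both the microglia and the NE analyses keeps the two datasets comparable, which is the point of the change described above.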

      1. Fig. 5b: how does the microglia morphology in LC axon ablation mice compare with wild type mice under the wake state? The text mentioned "more contracted" morphology but didn't give any quantification. Also, the morphology of microglia in the wake state (Fig. 5b) appears very different from that shown in Fig. S3C1 (baseline). What is the reason?

      The morphology of microglia is indeed heterogeneous and variable, affected by factors including brain state, brain region, and microenvironmental changes, along with animal-to-animal differences. We did not compare microglia morphology between the LC axon ablation mice and wild type mice and, in view of this, we removed the description of “more contracted morphology” from the main text. It should also be noted that, because we primarily focused on changes of a given microglial cell across states over time by self-comparison, we minimized possible effects of heterogeneity in microglia morphology on our conclusions.

      1. The relationship between NE level and microglia dynamics. Fig. 4C shows that the extracellular NE level is the highest in the wake state and the lowest in REM. Previous studies (Liu YU et al., Nat. Neurosci. 22(11):1771-1781, 2019; Stowell RD et al., Nat. Neurosci. 22(11): 1782-1792, 2019) suggest that high NE tone corresponds to reduced microglia complexity and surveillance. Hence, it would be expected that microglia process length, branch point number, and area/volume are higher in REM than in NREM. However, Fig. 2l-n show the opposite. How should we understand this ?

      Your point is well-taken. On the one hand, our data clearly showed that NE is critically involved in the brain state-dependent microglia dynamic surveillance, with evidence from the ablation of the LC-NE projection and from the β2AR knockout animal model.

      On the other hand, we also understand that NE is not the sole determinant, so the relationship between the NE level and the complexity and surveillance may not be unique.

      In this regard, other potential modulators are also dynamic during the sleep-wake cycle and may partake in the regulation of microglia dynamic surveillance. Previous studies (Liu YU et al., 2019; Stowell RD et al., 2019) have shown that microglia can be jointly affected by surrounding neuronal activity and NE level during wake. It has been reported that LC firing stops (Aston-Jones et al., 1981; Rasmussen et al., 1986), while inhibitory neurons, such as PV neurons and VIP neurons, become relatively active during REM sleep (Brécier et al., 2022). ATP level in basal forebrain is shown to be higher in REM than NREM (Peng et al., 2023). In addition, our own preliminary result (Author response image 1) also showed a higher adenosine level in REM than NREM in somatosensory cortex. Last but not least, we found that β2AR knockout failed to abolish microglial responses to sleep state switch and SD stress altogether.

      In brief, microglia are highly sensitive to varied changes in the surrounding environment, and many a modulator may participate in the microglia dynamic during sleep state. This may underlie the microglia complexity difference between REM and NREM. Future investigations are warranted to delineate the signal-integrative role of microglia in physiology and under stress. We have discussed the pertinent points in the revised manuscript. Please see line 343-354.

      Author response image 1.

      Extracellular adenosine levels in somatosensory cortex in different brain states. AAV2/9-hSyn-GRABAdo1.0 (Peng W. et al., Science. 2020) was injected into the somatosensory cortex (A/P, -1 mm; M/L, +2 mm; D/V, -0.3 mm). Data from the same recording are connected by lines. n = 9 from 3 mice.

      Reviewer #2 (Public Review):

      The manuscript describes an approach to monitor microglial structural dynamics and correlate it to ongoing changes in brain state during sleep-wake cycles. The main novelty here is the use of miniaturized 2p microscopy, which allows tracking microglia surveillance over long periods of hours, while the mice are allowed to freely behave. Accordingly, this experimental setup would permit to explore long-lasting changes in microglia in a more naturalistic environment, which were previously not possible to identify otherwise. The findings could provide key advances to the research of microglia during natural sleep and wakefulness, as opposed to anesthesia. The main findings of the paper are that microglia increase their process motility and surveillance during REM and NREM sleep as compared to the awake state. The authors further show that sleep deprivation induces opposite changes in microglia dynamics- limiting their surveillance and size. The authors then demonstrate potential causal role for norepinephrine secretion from the locus coeruleus (LC) which is driven by beta 2 adrenergic receptors (b2AR) on microglia. However, there are several methodological and experimental concerns which should be addressed.

      The major comments are summarized below:

      1. The main technological advantage of the 2p miniaturized microscope is the ability to track single cells over sleep cycles. A main question that is unclear from the analysis and the way the data is presented is: are the structural changes in microglia reversible? Meaning, could the authors provide evidence that the same cell can dynamically change in sleep state and then return to similar size in wakefulness? The same question arises again with the data which is presented for anesthesia, is this change reversible?

      As revealed by long-term free behavioral mTPM imaging, the brain-state-dependent morphological changes in microglia were reproducible and reversible. Author response image 2 shows that microglia displayed reversible dynamic changes during multiple rounds of sleep-wake transition. Author response image 3 shows that microglia dynamics induced by anesthesia also exhibited reversibility.

      Author response image 2.

      Long-term tracking of microglia process area in different brain states. The analysis used 8 cells. A total of 31 time points were selected from the in vivo imaging data to characterize the morphological changes of microglia over a continuous 7-hour period.

      Author response image 3.

      Reversible changes of microglial process length, area, number of branch points under anesthesia. Wake group: 30 minute-accommodation to new environment; Isoflurane group: 1.5% in air applied at a flow rate of 0.4 L/min for 30 minutes; Recovery group: 30 minutes after recovery from anesthesia. n = 9 cells from 3 mice for each group.

      1. The binary comparison between brain states is misleading; shouldn't the changes in structural dynamics be compared to the baseline at state onset? The authors' method describes analysis of the last 5 minutes in each sleep/wake state. However, these transitions are directional: for instance, REM usually follows NREM, so the description of a decrease in length during REM sleep could be inaccurate.

      As you know, the time scale of microglia morphological dynamic is relatively slow, so we analyzed the microglia morphological dynamic of the last part (30s in the revised manuscript) of each state instead of the state onset, allowing time for stabilization of the microglia response to inter-state transition.

      Further, we compared microglia dynamic between two NREM groups transiting to different subsequent states: group1 (NREM to REM) vs group2 (NREM to Wake). This precaution was to exclude the directional effect of state transitions. Our results showed that there was no difference in microglial length, area, number of branching points between the two NREM groups (Author response image 4), indicating that the last 30s of each NREM was not affected by its following state and that it’s reasonable to perform binary comparison.

      Author response image 4.

      Microglial morphological length, area change, and number of branch points of the last 30s of NREM sleep followed by REM or Wake. n = 9 cells from 3 mice for each group.
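
The control described above, taking the last 30 s of each NREM episode and grouping it by the state that follows, can be sketched as below. The episode format, state labels, function names, and example times are hypothetical; the sketch only returns the analysis windows and does not perform the morphological measurements or the statistical comparison.

```python
WINDOW_S = 30.0  # length of the end-of-state analysis window, per the response


def last_window(start_s, end_s, window_s=WINDOW_S):
    """Return (window_start, window_end) covering the final window_s of an episode."""
    return (max(start_s, end_s - window_s), end_s)


def nrem_windows_by_next_state(episodes, window_s=WINDOW_S):
    """Group the final window of each NREM episode by the state that follows it.

    episodes: chronologically ordered list of (state, start_s, end_s).
    """
    groups = {}
    for (state, start, end), (next_state, _s, _e) in zip(episodes, episodes[1:]):
        if state == "nrem":
            groups.setdefault(next_state, []).append(last_window(start, end, window_s))
    return groups


# Hypothetical scored recording (state, start in s, end in s).
recording = [
    ("wake", 0, 120),
    ("nrem", 120, 400),   # followed by REM
    ("rem", 400, 460),
    ("wake", 460, 600),
    ("nrem", 600, 900),   # followed by wake
    ("wake", 900, 1000),
]

groups = nrem_windows_by_next_state(recording)
print(groups)
```

Each group's windows would then be measured and compared; a lack of difference between the groups is what supports treating end-of-state NREM windows as independent of the following state.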

      1. Sleep deprivation- again, it is unclear whether these structural changes are reversible. This point is straightforward to address using this methodology by measuring sleep following SD. In addition, the authors chose a method to induce sleep deprivation that is rather harsh. It is unclear if the effect shown is the result of stress or perhaps an excess of motor activity.

      We adopted the method of forced exercise as it has been commonly used for sleep deprivation (Pandi-Perumal et al., 2007; Nollet M et al., 2020), though it does have the potential limitation of excess of motor activity.

      In light of your comments and suggestion, we presented new data demonstrating that the sleep duration of the mice, mostly NREM sleep, showed a compensatory increase (ZT9-10) after the 6-hour sleep deprivation (ZT2-8) (revised Fig. S3B). This result shows that sleep deprivation indeed increased sleep pressure in the mice. As the sleep pressure was eased during recovery sleep, morphological changes of microglia were reversed over a timescale of several hours (revised Fig. S3 E-J).

      1. The authors perform measurements of norepinephrine with a recently developed GRAB sensor. These experiments are performed to causally link microglia surveillance during sleep to norepinephrine secretion. They perform 2p imaging and collect data points which are single neurons, and it is unclear why the normalization and analysis is performed for bulk fluorescence similar to data obtained with photometry.

      We did not perform single-neuron analysis for two reasons. First, our experimental conditions, e.g., the expression of the NE indicator and the control of imaging laser intensity, did not yield sufficient signal-to-noise to clearly discriminate individual neurons with two-photon imaging. Second, NE signal may play a modulatory role, and fluorescence changes appeared to be global, rather than local or cell-specific. Therefore, we analyzed fluorescence changes in different brain states over the whole field-of-view in Fig. 4, rather than at the subregional or single-cell level.

      1. The experiments involving b2AR KO mice are difficult to interpret and do not provide substantial mechanistic insight. Since b2AR are expressed throughout numerous cell types in the brain and in the periphery, it is entirely not clear whether the effects on microglia dynamics are direct. The conclusion and the statement regarding the expression of b2AR in microglia is not supported by the references the authors present, which simply demonstrate the existence and function of b2AR in microglia. In addition, these mice show significant changes in sleep pattern and increased REM sleep. This could account for reasons for the changes in microglia structure rather than the interpretation that these are direct effects.

      To summarize, the main conclusions of the paper require further support with analysis of existing data and experimental validation.

      Previous studies have revealed that norepinephrine (NE) has a modulating effect on microglial dynamics through β2AR pathway (Stowell RD et al., 2019; Liu YU et al., 2019). Stowell et al. and Liu et al. use in vivo two-photon imaging to demonstrate that microglia dynamics differ between awake and anesthetized mice and to highlight the roles of NE and β2AR in these states (Gyoneva S et al., 2013; Stowell RD et al., 2019; Liu YU et al., 2019). To evaluate the direct effect of β2AR on microglial dynamics, Stowell et al. administered the β2AR agonist clenbuterol to anesthetized mice and found that this decreased the motility, arbor complexity, and process coverage of microglia in the parenchyma (Stowell RD et al., 2019). Inhibition of β2AR by antagonist ICI-118,551 in awake mice recapitulated the effects of anesthesia by enhancing microglial arborization and surveillance (Stowell RD et al., 2019). In addition, it has been shown microglia expressed higher numbers of β2ARs than any other cells in the brain (Zhang et al., 2014).

      To this end, our current work provided new evidence to support the involvement of the LC-NE-β2AR axis in modulating microglia dynamics both during the natural sleep-wake cycle and under SD stress. While we were aware of the limitation of using a pan-tissue β2AR knockout model, which precluded us from pinpointing the role of microglial β2AR, it is safe to state that β2-adrenergic receptor signaling plays a significant role in the sleep-state dependent microglia dynamic surveillance, based on the present and previous data.

      We have discussed this in the revised manuscript. Please see line 324-354. As you suggested, we added references to support the statement regarding the expression of β2AR in microglia (please see line 333).

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Some technical details need to be clarified. Also, please double-check for typos.

      1. In vivo imaging preparation: how long is the recovery time between window/EEG implantation surgery and imaging/recording?

      Imaging data were collected one month after the surgery. We have added descriptions to the methods section of the revised manuscript. Please see line 419.

      1. Statistical analysis: the authors used t-test or ANOVA without first checking whether the data pass the normality test. If the data does not follow a normal distribution, nonparametric tests would be more appropriate.

      Per your suggestion, we tested statistical significance using a parametric test (ANOVA) if the data passed the normality test, or a non-parametric test (Friedman) for non-normal data. Please see line 533-535.
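
For readers unfamiliar with the non-parametric branch, the Friedman statistic for repeated measures can be computed as sketched below. This is an assumption-laden illustration: the numbers are made up, the function names are hypothetical, no tie correction is applied, the normality check is omitted because the response does not name the test used, and the closed-form p-value holds only for three conditions (chi-square approximation with 2 degrees of freedom).

```python
import math


def rank_with_ties(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Friedman chi-square statistic; rows are subjects, columns are conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(rank_with_ties(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(q):
    """Survival function of the chi-square distribution with 2 degrees of freedom."""
    return math.exp(-q / 2.0)


# Made-up measurements for 5 cells in three states: (wake, NREM, REM).
cells = [
    [100, 150, 120],
    [90, 140, 130],
    [110, 160, 125],
    [95, 155, 135],
    [105, 145, 115],
]

q = friedman_statistic(cells)
p_approx = chi2_sf_df2(q)
print(round(q, 3), round(p_approx, 4))
```

With so few subjects the chi-square approximation is rough, so exact tables or a statistics package would normally be preferred for reporting.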

      1. Fig. 1b needs a minor change. In the figure, the EMG electrodes appear to be connected to the brain as well.

      We have corrected this oversight. Thank you.

      1. Fig. 1c: it would be helpful to give examples of raw EEG and EMG traces for REM and NREM separately.

      Raw traces are now shown as suggested. Please see Fig. 1c, page 32.

      1. Fig. 1h: is each data point one microglia or one end-point?

      In Fig. 1h, each data point represents the average speed of all branches of one microglial cell, not one end-point.

      1. Sleep deprivation starts at 9 am. What time corresponds to Zeitgeber Time 0 (ZT0, the beginning of the light phase)?

      We now clarified that 9 am corresponds to Zeitgeber time 2. Please see line 196.
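
The stated correspondence (9 am is ZT2) implies lights-on at 7 am. Under that inference, and assuming a 24-hour cycle, clock time and Zeitgeber time convert as in the sketch below; the constant and function names are hypothetical.

```python
LIGHTS_ON_HOUR = 7  # inferred from "9 am corresponds to Zeitgeber time 2"


def clock_to_zt(clock_hour, lights_on_hour=LIGHTS_ON_HOUR):
    """Convert a 24-hour clock time to Zeitgeber time (hours since lights-on)."""
    return (clock_hour - lights_on_hour) % 24


def zt_to_clock(zt_hour, lights_on_hour=LIGHTS_ON_HOUR):
    """Convert Zeitgeber time back to a 24-hour clock time."""
    return (zt_hour + lights_on_hour) % 24


sd_start_zt = clock_to_zt(9)   # start of sleep deprivation
sd_end_clock = zt_to_clock(8)  # end of the ZT2-8 deprivation window
print(sd_start_zt, sd_end_clock)
```

On this reading, the 6-hour deprivation window ZT2-8 runs from 9:00 to 15:00 clock time.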

      1. Line 61: the authors referred to Ramon y Cajal's original suggestion that microglia dynamics are coupled to the sleep-wake cycle. However, the cited paper only indicates that Cajal suggested a role of astrocytes in the sleep-wake cycle, not microglia. In addition, there is a typo in the line: there should be a space between "Ramon" and "y" in Cajal's name.

      We have updated the statement and reference literature to point out the microglia’s involvement in the sleep-wake cycle. The typo was corrected. Please see line 64-65.

      1. Fig. S3B: As each group has only 3 mice, it is unclear how t-test can yield p < 0.01 or even 0.001.

      We checked the original data again and they were correct. These small p-values may be due to the small intra-group difference of the control group.

      1. Line 251-253, "Figure 4h-n" should be "Figure 5h-n"?

      We have revised it. Please see line 265-266.

      1. Fig. 5h: the receptor should be "adrenergic receptor", not "adrenal receptor".

      We changed the term to “adrenergic receptor”. Please see Fig 5h.

      1. Fig. 5g, n: the number of data points is apparently less than the sample size given in the figure legend. Perhaps some data points have exactly the same value so they overlap? The authors may consider plotting identical values with a slight shift so that the number of data points shown matches the actual sample size, to avoid confusion.

      Yes, we have added small jitters so different data points can be seen to avoid confusion. Please see Fig. 5n.

      1. There are some typos (e.g., Line 217, "he" should be "the") and some incomplete references (e.g., [13], [22], [34], [35] lack volume and page number, [15] and [39] lack publisher information). Some references have inconsistent formats (e.g., "Journal of Neuroscience" is sometimes abbreviated and sometimes not). Please correct these.

      We have corrected these oversights. Please see references, page 27.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      1. Re-analyze the data in a manner that allows to follow and compare the same cells over different state transitions. This is necessary to evaluate the reversibility of microglia structure. In addition, consider analysis of the change from the beginning to the end of each state.

      As shown in Author response image 2, microglia dynamics were reversible during multiple rounds of sleep-wake transition.

      1. It would be nice to see the raw data obtained over time, at least for Figure 1, before offline correction of movement to evaluate the imaging quality and level of drift during imaging.

      We agree with this suggestion. Please see the supporting material video.

      1. It would be helpful to add an analysis of the percent time spent in each state for the 10 hour recordings.

      We have adopted this advice. Please see revised Fig. S4C.

      1. In Figure 2 the results are from 15 cells from several animals. How much do the results vary between mice? It will be helpful to show if this varies between different mice by labeling cells from each mouse differently.

      In Author response image 5, in which we have labeled the distribution of data points from seven mice, there was mixed distribution of data from different animals at each brain state, but no clear animal-to-animal difference.

      Author response image 5.

      Quantitative analysis of microglial length based on multi-plane microglial imaging. n = 17 cells from 7 mice for each group. In right panel, each color codes data from the same animal.

      1. SD- please add some quantification for sleep and EEG to show that the manipulation really caused sleep deprivation. To address the confound of forced movement and stress, it might be helpful to add quantification of movement compared to an undisturbed wakefulness.

      We have added related data (revised Fig. S3B), as suggested. Please see line 196-197.

      1. The DSP4 application should also be performed with NE measurements to verify the specificity of the NE signal measured as well as of the DSP4 toxin.

      Following your suggestion, we have added DSP4 data in revised Fig. S4B.

      1. Some suggested refined experiments for the b2AR KO are: a-A conditional b2AR KO in microglia, as cited in the work. b- Local application of a b2 blocker during SD. c- Imaging of NE dynamics in the b2 animals. If NE dynamics during natural sleep cycle are perturbed, then this suggests upstream mechanisms rather than direct microglia effects as suggested by the authors.

      We agree that the current study cannot pinpoint a direct effect of microglia harbored β2AR. We have discussed this limitation in the revised manuscript.

      Please see line 324-354.

      Minor:

      1. Typo on page 4 (microcopy instead of microscopy).

      It was corrected. Please see line 87.

      1. Typo page 11- 'and he largest changes in NE' - supposed to be 'the'.

      We have corrected these mistakes. Please see line 228.

      1. Fig. 4- there are several units missing in the figure in panel b: the top is Hz, but what does the color bar indicate exactly? 2 what? both for theta/delta and for NE.

      We have modified this figure and legend for clarity. Please see Fig. 4, page 37.

      2. Bottom of page 12- referring to figure 4 but talking about figure 5.

      The typo was corrected. Please see line 265-266.

      Reference

      1. Aston-Jones G, Bloom FE. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci. 1, 876–886 (1981).

      2. Bellesi M, de Vivo L, Chini M, Gilli F, Tononi G, Cirelli C. Sleep loss promotes astrocytic phagocytosis and microglial activation in mouse cerebral cortex. J Neurosci. 37, 5263–5273 (2017).

      3. Brécier A, Borel M, Urbain N, Gentet LJ. Vigilance and behavioral state-dependent modulation of cortical neuronal activity throughout the sleep/wake cycle. J Neurosci. 42, 4852–66 (2022).

      4. Dworak M, McCarley RW, Kim T, Kalinchuk AV, Basheer R. Sleep and brain energy levels: ATP changes during sleep. J Neurosci. 30, 9007-16 (2010).

      5. Gyoneva S, Traynelis SF. Norepinephrine modulates the motility of resting and activated microglia via different adrenergic receptors. J Biol Chem. 288, 15291–15302 (2013).

      6. Kjaerby C, Andersen M, Hauglund N, Untiet V, Dall C, Sigurdsson B, Ding F, Feng J, Li Y, Weikop P, Hirase H, Nedergaard M. Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci. 25, 1059–1070 (2022).

      7. Liu T, Lu J, Lukasiewicz K, Pan B, Zuo Y. Stress induces microglia-associated synaptic circuit alterations in the dorsomedial prefrontal cortex. Neurobiology of Stress. 15, 100342 (2021).

      8. Liu YU, Ying Y, Li Y, Eyo UB, Chen T, Zheng J, Umpierre AD, Zhu J, Bosco DB, Dong H, Wu LJ. Neuronal network activity controls microglial process surveillance in awake mice via norepinephrine signaling. Nat Neurosci. 22, 1771–1781 (2019).

      9. Nollet M, Wisden W, Franks NP. Sleep deprivation and stress: a reciprocal relationship. Interface Focus. 10, 20190092 (2020).

      10. Pandi-Perumal SR, Cardinali DP, Chrousos GP. 2007. Neuroimmunology of sleep. New York, NY: Springer.

      11. Peng W, Liu X, Ma G, Wu Z, Wang Z, Fei X, Qin M, Wang L, Li Y, Zhang S, Xu M. Adenosine-independent regulation of the sleep-wake cycle by astrocyte activity. Cell Discov. 9, 16 (2023).

      12. Peng W, Wu Z, Song K, Zhang S, Li Y, Xu M. Regulation of sleep homeostasis mediator adenosine by basal forebrain glutamatergic neurons. Science. 369, 6508 (2020).

      13. Rasmussen K, Morilak DA, Jacobs BL. Single unit activity of locus coeruleus neurons in the freely moving cat: I. During naturalistic behaviors and in response to simple and complex stimuli. Brain Research. 371, 324–334 (1986).

      14. Stowell RD, Sipe GO, Dawes RP, Batchelor HN, Lordy KA, Whitelaw BS, Stoessel MB, Bidlack JM, Brown E, Sur M, Majewska AK. Noradrenergic signaling in the wakeful state inhibits microglial surveillance and synaptic plasticity in the mouse visual cortex. Nat Neurosci. 22, 1782-1792 (2019).

      15. Umpierre AD, Bystrom LL, Ying Y, Liu YU, Worrell G, Wu LJ. Microglial calcium signaling is attuned to neuronal activity in awake mice. Elife. 27, e56502 (2020).

      16. Wang Z, Fei X, Liu X, Wang Y, Hu Y, Peng W, Wang YW, Zhang S, Xu M. REM sleep is associated with distinct global cortical dynamics and controlled by occipital cortex. Nat Commun. 13, 6896 (2022).

      17. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP, Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R, Maniatis T, Barres BA, Wu JQ. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 34, 11929–11947 (2014).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Ritvo and colleagues present an impressive suite of simulations that can account for three findings of differentiation in the literature. This is important because differentiation (in which items that have some features in common, or share a common associate, are less similar to one another than are unrelated items) is difficult to explain with classic supervised learning models, as these predict the opposite (i.e., an increase in similarity). A few of their key findings are that differentiation requires a high learning rate and low inhibitory oscillations, and is virtually always asymmetric in nature.

      This paper was very clear and thoughtful, an absolute joy to read. The model is simple and elegant, and powerful enough to re-create many aspects of existing differentiation findings. The interrogation of the model and presentation of the findings were both extremely thorough. The potential for this model to be used to drive future work is huge. I have only a few comments for the authors, all of which are relatively minor.

      (1) I was struck by the fact that the "zone" of repulsion is quite narrow, compared with the zone of attraction. This was most notable in the modeling of Chanales et al. (i.e., just one of the six similarity levels yielded differentiation). Do the authors think this is a generalizable property of the model or phenomenon, or something idiosyncratic to do with the current investigation? It seems curious that differentiation findings (e.g., in hippocampus) are so robustly observed in the literature despite the mechanism seemingly requiring a very particular set of circumstances. I wonder if the authors could speculate on this point a bit; for example, might the differentiation zone be wider when competitor "pop up" is low (i.e., low inhibitory oscillations), which could help explain why it's often observed in hippocampus? This seems related a bit to the question about what makes something "moderately" active, or how could one ensure "moderate" activation if they were, say, designing an experiment looking at differentiation.

      We thank the reviewer for this comment. In the previous version of the manuscript, in the section entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activation Dynamics”, we discussed some reasons why differentiation may be more likely to be found in the hippocampus – namely, the high learning rate of the hippocampus and the sparsity of hippocampal activation patterns (pp. 27-28):

      “These results have implications for where to look for differentiation in the brain. Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being ‘propped up’ by top-down feedback from differentiated representations in the hippocampus. This explanation implies that disruptions of hippocampal processing (e.g., lesions, stimulation) will eliminate these neocortical differentiation effects; we plan to test this prediction in future work.

      Additionally, the simulations where we adjusted the oscillation amount (using our model of Schlichting et al., 2015) imply that differentiation will be most evident in brain regions where it is relatively hard to activate competitors. Given the U shape of the NMPH learning rule, limiting competitor activity makes it less likely that plasticity will ‘cross over’ from weakening (and differentiation) to strengthening (and integration). Thus, within the hippocampus, subregions with sparser activity (e.g., dentate gyrus, and to a lesser extent, CA3; Barnes et al., 1990; GoodSmith et al., 2017; West et al., 1991) will be more prone to differentiation. There is strong empirical support for this prediction. For example, Wammes et al. (2022) manipulated the similarity of stimuli in a statistical learning experiment and found that moderate levels of visual similarity were associated with significant differentiation in the dentate gyrus but not other subregions. Also, numerous studies have found greater differentiation in dentate gyrus / CA3 than in CA1 (e.g., Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Molitor et al., 2021; Kim et al., 2017; but see Zheng et al., 2021).”
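      The "cross over" idea in the passage above can be illustrated with a minimal sketch of a U-shaped plasticity function. This is a piecewise-linear toy written for illustration only: the function name, the threshold values (`low_thr`, `dip_at`, `high_thr`), and the magnitudes (`dip_mag`, `max_mag`) are all assumptions and are not taken from the model's actual parameterization.

```python
def nmph_weight_change(coactivity, low_thr=0.2, dip_at=0.4, high_thr=0.6,
                       dip_mag=-1.0, max_mag=1.0):
    """Toy U-shaped learning rule (illustrative; all parameters hypothetical).

    coactivity is assumed to lie in [0, 1]:
      - at or below low_thr: no weight change
      - between low_thr and high_thr: weakening, deepest at dip_at
      - above high_thr: strengthening, rising to max_mag at coactivity 1.0
    """
    if coactivity <= low_thr:
        return 0.0
    if coactivity <= dip_at:
        # Descend linearly from 0 (at low_thr) to dip_mag (at dip_at).
        return dip_mag * (coactivity - low_thr) / (dip_at - low_thr)
    if coactivity <= high_thr:
        # Climb linearly from dip_mag (at dip_at) back to 0 (at high_thr).
        return dip_mag * (high_thr - coactivity) / (high_thr - dip_at)
    # Strengthening region: 0 at high_thr, max_mag at coactivity 1.0.
    return max_mag * (coactivity - high_thr) / (1.0 - high_thr)


# Low, moderate, and high competitor activity fall in different regions.
for c in (0.1, 0.4, 0.9):
    print(c, nmph_weight_change(c))
```

      Under these assumed thresholds, a region that caps competitor activity below `high_thr` can only produce no change or weakening, which is the sense in which sparse activity would favor differentiation over integration.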

      In the revised draft we have supplemented this discussion with a new section entitled “Reconciling the Prevalence of Differentiation in the Model and in the Data” (pp. 30-31):

      “A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      (2) With real fMRI data we know that the actual correlation value doesn't matter all that much, and anti-correlations can be induced by things like preprocessing decisions. I am wondering if the important criterion in the model is that the correlations (e.g., as shown in Figure 6) go down from pre to post, versus that they are negative in sign during the post learning period. I would think that here, similar to in neural data, a decrease in correlation would be sufficient to conclude differentiation, but would love the authors' thoughts on that.

      We thank the reviewer for bringing this up. In the paper, we define differentiation as the moving apart of representations – so we agree with the reviewer that it would be appropriate to conclude that differentiation is taking place when correlations go down from pre to post.

      In addition to the definitional question (“what counts as differentiation”), one can also ask the mechanistic question of what is happening in the model at the (simulated) neuronal level in conditions where differentiation (i.e., an average decrease in similarity from pre to post) occurs. Here, the model’s answer is clear: When the similarity of two pairmates decreases, it is because the pairmates have acquired anticorrelated representations at the (simulated) neuronal level. When similarity decreases on average from pre to post, but the average “post” similarity value is not negative, this is because there is a mix of outcomes across runs of the model (due to variance in the initial, random model weights and also variance in the order in which items are presented across training epochs) – some runs lead to differentiation (manifested as anticorrelated pairmate representations) whereas others lead to no change or integration. The average pre-to-post change depends on the relative frequencies with which these different outcomes occur.
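      The arithmetic behind this point can be sketched as follows. The proportions and similarity values are made-up numbers chosen only to show how a mixture of outcomes across runs can produce an average decrease in similarity while the average "post" value stays positive; they are not results from the model.

```python
def mean_similarity_change(outcomes):
    """Average pre-to-post change across runs.

    outcomes: list of (proportion_of_runs, pre_to_post_change) pairs.
    """
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * delta for p, delta in outcomes)


pre = 0.25  # hypothetical pre-learning pairmate similarity

# Hypothetical mix of outcomes across model runs.
outcomes = [
    (0.4, -0.40),  # differentiation: these runs end anticorrelated (0.25 -> -0.15)
    (0.4, 0.00),   # no change
    (0.2, 0.30),   # integration
]

change = mean_similarity_change(outcomes)
post = pre + change
print(change, post)  # mean change is negative, mean post similarity stays positive
```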

      We have made several edits to the paper to clarify this point.

      We added a new section under “Results” in our simulation of Chanales et al. (2021) entitled, “Pairs of Items that Differentiate Show Anticorrelated Representations” (p. 15):

      “Figure 6B also highlights that, for learning rates where robust differentiation effects occur in aggregate (i.e., there is a reduction in mean pattern similarity, averaging across model runs), these aggregate effects involve a bimodal distribution across model runs: For some model runs, learning processes give rise to anticorrelated representations, and for other model runs the model shows integration; this variance across model runs is attributable to random differences in the initial weight configuration of the model. The aggregate differentiation effect is therefore a function of the proportion of model runs showing differentiation (here, anticorrelation) and the proportion of model runs showing integration. The fact that differentiation shows up as anticorrelation in the model's hidden layer relates to the learning effects discussed earlier:

      Unique competitor units are sheared away from (formerly) shared units, so the competitor ends up not having any overlap with the target representation (i.e., the level of overlap is less than you would expect due to chance, which mathematically translates into anticorrelation). We return to this point and discuss how to test for anticorrelation in the Discussion section.”
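      The claim that zero overlap translates into anticorrelation can be checked with a small calculation. The sketch below assumes binary activity patterns with 6 active units each and a hidden layer of 50 units; the layer size is a hypothetical value chosen for illustration and is not taken from the model.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def pattern(active, n_units):
    """Binary activity vector with 1.0 at the indices in `active`."""
    return [1.0 if i in active else 0.0 for i in range(n_units)]


N_UNITS = 50  # hypothetical hidden-layer size

target = pattern(range(0, 6), N_UNITS)        # units 0-5 active
overlapping = pattern(range(4, 10), N_UNITS)  # shares units 4 and 5 with target
disjoint = pattern(range(6, 12), N_UNITS)     # shares no units with target

print(pearson(target, overlapping))  # positive
print(pearson(target, disjoint))     # negative
```

      With these assumed sizes, two random 6-unit patterns would share 6 × 6 / 50 = 0.72 units on average, so sharing zero units is below chance and the correlation comes out negative.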

      We added new text to the “Take-Home Lessons” section in the Chanales et al. (2021) simulation (p. 17):

      “In particular, the simulations expose some important boundary conditions for when representational change can occur according to the NMPH (e.g., that differentiation depends on a large learning rate, but integration does not), and the simulations provide a more nuanced account of exactly how representations change (e.g., that differentiation driven by the NMPH is always asymmetric, whereas integration is sometimes asymmetric and sometimes symmetric; and that, when differentiation occurs on a particular model run, it tends to give rise to anticorrelated representations in the model's hidden layer).”

      We added new text to the “Nature of Representational Change” section in the Favila et al. (2016) simulation (p. 21):

      “Figure 8 - Supplement 1 also indicates that, as in our simulation of Chanales et al. (2021), individual model runs where differentiation occurs show anticorrelation between the pairmate representations, and gradations in the aggregate level of differentiation that is observed across conditions reflect differences in the proportion of trials showing this anticorrelation effect.”

      We added new text to the “Take-Home Lessons” section in the Favila et al. (2016) simulation (p. 21):

      “As in our simulation of Chanales et al. (2021), we found that the NMPH-mediated differentiation was asymmetric, manifested as anticorrelation between pairmate representations on individual model runs, and required a high learning rate, leading to abrupt representational change.”

      We added new text to the “Nature of Representational Change” section in the Schlichting et al. (2015) simulation (p. 26):

      “Also, as in our other simulations, when differentiation occurs on a particular model run it tends to give rise to anticorrelated representations (results not shown).”

      We added new text to the “Take-Home Lessons” section in the Schlichting et al. (2015) simulation (pp. 26-27):

      “As in the other versions of our model, differentiation requires a high learning rate, and – on model runs when it occurs – it is asymmetric and gives rise to anticorrelated representations.”

      We added new text at the start of the Discussion (p. 27):

      “In addition to qualitatively replicating the results from the studies we simulated, our model gives rise to several novel predictions – most notably, that differentiation driven by the NMPH requires a rapid learning rate and, when it occurs for a particular pair of items, it is asymmetric and gives rise to anticorrelated representations.”

      We also added a new section in the Discussion entitled “Testing the Model's Prediction about Anticorrelation”, which (among other things) highlights the reviewer’s point that fMRI pattern similarity values can be affected by preprocessing choices (p. 30):

      “Even though we operationally define differentiation as a reduction in similarity with learning, the way that it actually shows up on individual model runs is as anticorrelation between pairmates; in the model, the size of the aggregate differentiation effect is determined by the proportion of model runs that show this anticorrelation effect (vs. no change or integration). This implies that, if we could get a clean measurement of the similarity of pairmates in an experiment, we might see a multimodal distribution, with some pairmates showing anticorrelation, and others showing increased correlation (integration) or no change in similarity. This kind of clean readout of the similarity of individual pairs might be difficult to obtain with fMRI; it is more feasible that this could be obtained with electrophysiology. Another challenge with using fMRI to test this prediction is that anticorrelation at the individual-neuron level might not scale up to yield anticorrelation at the level of the BOLD response; also, fMRI pattern similarity values can be strongly affected by preprocessing choices – so a negative pattern similarity value does not necessarily reflect anticorrelation at the individual-neuron level. A final caveat is that, while we predict that differentiation will show up as anticorrelation in the brain region that gives rise to the differentiation effect, this might not translate into anticorrelation in areas that are downstream of this region (e.g., if the hippocampus is the source of the differentiation effect, we would expect anticorrelation there, but not necessarily in neocortical regions that receive input from the hippocampus; we revisit this point later in the discussion, when we address limitations and open questions).”

      We added new text in the Discussion, under “Limitations and Open Questions” (p. 31):

      “Importantly, while hippocampus can boost the representation of unique features in neocortex, we expect that neocortex will continue to represent shared perceptual features (e.g., in Favila et al., 2016, the fact that both pairmates are photos of barns). For this reason, in paradigms like the one used by Favila et al. (2016), the predicted effect of hippocampal differentiation on neocortical representations will be a reduction in pattern similarity (due to upregulation in the representation of unique pairmate features) but neocortex should not cross over into anticorrelation in these paradigms (due to its continued representation of shared perceptual features). Indeed, this is exactly the pattern that Wanjia et al. (2021) observed in their study, which used similar stimuli to those used in Favila et al. (2016).”

      Lastly, we updated the Abstract (p. 1):

      “What determines when neural representations of memories move together (integrate) or apart (differentiate)? Classic supervised learning models posit that, when two stimuli predict similar outcomes, their representations should integrate. However, these models have recently been challenged by studies showing that pairing two stimuli with a shared associate can sometimes cause differentiation, depending on the parameters of the study and the brain region being examined. Here, we provide a purely unsupervised neural network model that can explain these and other related findings. The model can exhibit integration or differentiation depending on the amount of activity allowed to spread to competitors – inactive memories are not modified, connections to moderately active competitors are weakened (leading to differentiation), and connections to highly active competitors are strengthened (leading to integration). The model also makes several novel predictions – most importantly, that when differentiation occurs as a result of this unsupervised learning mechanism, it will be rapid and asymmetric, and it will give rise to anticorrelated representations in the region of the brain that is the source of the differentiation. Overall, these modeling results provide a computational explanation for a diverse set of seemingly contradictory empirical findings in the memory literature, as well as new insights into the dynamics at play during learning.”

      (3) For the modeling of the Favila et al. study, the authors state that a high learning rate is required for differentiation of the same-face pairs. This made me wonder what happens in the low learning rate simulations. Does integration occur?

      For the same-face condition of the Favila simulation, lowering the learning rate does not result in an overall integration effect:

      Author response image 1.

      In other cases, we do see integration emerge at lower learning rates – e.g., in the Schlichting interleaved condition we see a small integration effect emerge for a learning rate value of 0.3:

      Author response image 2.

      Our view is that, while integration can emerge at low learning rates, it is not a reliable property of the model – in some cases, there is a “window” of learning rates where there is enough learning to drive integration but not enough to drive differentiation, and in other cases there is not. Given this lack of reliability across simulations, we would prefer not to discuss this in the paper.

      This paradigm has a lot of overlap with acquired equivalence, and so I am thinking about whether these are the sorts of small differences (e.g., same-category scenes and perhaps a high learning rate) that bias the system to differentiate instead of integrate.

      We agree that it would be very interesting to use the model to explore acquired equivalence and related phenomena, but we think it is out of scope of the current paper. We have added some text to the Discussion under “Limitations and Open Questions” (p. 32):

      “Another important future direction is to apply the model to a wider range of learning phenomena involving representational change – for example, acquired equivalence, which (like some of the studies modeled here) involves linking distinct stimuli to a shared associate (see, e.g., Honey and Hall, 1989; Shohamy and Wagner, 2008; Myers et al., 2003; Meeter et al., 2009; de Araujo Sanchez and Zeithamova, 2023). It is possible that some of these phenomena might be better explained by supervised learning, or a mixture of unsupervised and supervised learning, than by unsupervised learning alone.”

      (4) For the simulations of the Schlichting et al. study, the A and B appear to have overlap in the hidden layer based on Figure 9, despite there being no similarity between the A and B items in the study (in contrast to Favila et al., in which they were similar kinds of scenes, and Chanales et al., in which they were similar colors). Why was this decision made? Do the effects depend on some overlap within the hidden layer? (This doesn't seem to be explained in the paper that I saw though, so maybe it's just a visualization error?)

      Overlap in the pretrained hidden representations of A and B is not strictly necessary for these effects – it would be possible to reconfigure other parameters to get high levels of competition even if there were no overlap (e.g., by upregulating the strengths of connections from shared input features). Having said that, it is definitely true that overlap between the pretrained hidden representations boosts competition, and we think it is justified to posit this in the Schlichting simulation. We have now added an explanation for this in the paper (p. 23):

      New text in the Schlichting simulation, under “Knowledge Built into the Network”:

      “Matching the previous two simulations, we pretrained the weights so the hidden representations of the stimuli initially had 2/6 units in common. Even though the A and B stimuli used in the actual experiment did not have obvious feature overlap (they were randomly selected novel objects), it is important to note that the hidden layer is not simply a representation of the sensory features of the A and B stimuli; the hidden layer also receives input from the output layer, which represents the shared associate of A and B (X). We think that the presence of this shared associate justifies our use of initially-overlapping hidden representations.”

      (5) It seems as though there were no conditions under which the simulations produced differentiation in both the blocked and intermixed conditions, which Schlichting et al. observed in many regions (as the present authors note). Is there any way to reconcile this difference?

      We thank the reviewer for bringing this up. If we set the connection strength between X (in the output layer) and A (in the hidden layer) in the blocked condition to .9 instead of .999 (keeping this connection strength at .8 for the interleaved condition) and we set Osc to .0615, we observe differentiation in both conditions.

      Rather than replacing the original results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 10 - Supplement 1), which is included on p. 46.

      We also added the following to the Results section of the Schlichting simulation in the main text (p. 26):

      “Figure 10 - Supplement 1 shows results from an alternative parameterization where, in the low-oscillation-amplitude condition, differentiation is observed in both the blocked and interleaved conditions (mirroring results from Schlichting et al., 2015, who found differentiation in both conditions in several regions of interest, including parts of the hippocampus and medial prefrontal cortex).”

      (6) A general question about differentiation/repulsion and how it affects the hidden layer representation in the model: Is it the case that the representation is actually "shifted" or repelled over so it is no longer overlapping? Or do the shared connections just get pruned, such that the item that has more "movement" in representational space is represented by fewer units on the hidden layer (i.e., is reduced in size)? I think, if I understand correctly, that whether it gets shifted vs. reduced would depend on the strength of connections along the hidden layer, which would in turn depend on whether it represents some meaningful continuous dimension (like color) or not. But, if the connections within the hidden layer are relatively weak and it is the case that representations become reduced in size, would there be any anticipated consequences of this (e.g., cognitively/behaviorally)?

      The representations are shifted – this is discussed in the Chanales results section:

      “Because the activity ‘set point’ for the hidden layer (determined by the kWTA algorithm) involves having 6 units active, and the unique parts of the competitor only take up 4 of these 6 units, this leaves room for activity to spread to additional units. Given the topographic projections in the output layer, the model is biased to ‘pick up’ units that are adjacent in color space to the currently active units; because activity cannot flow easily from the competitor back to the target (as a result of the aforementioned severing of connections), it flows instead *away* from the target, activating two additional units, which are then incorporated into the competitor representation. This sequence of events (first a severing of the shared units, then a shift away from the target) completes the process of neural differentiation, and is what leads to the behavioral repulsion effect in color recall (because the center-of-mass of the color representation has now shifted away from the target).”
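      The sequence described in this passage can be sketched with a hard top-k selection, used here as a simplified stand-in for the kWTA algorithm (the model's actual kWTA implementation may differ). The unit layout and net-input values are hypothetical and chosen only to mirror the description: 4 unique competitor units with strong input, 2 formerly shared units whose input has been weakened, and 2 adjacent units on the far side receiving moderate input.

```python
def kwta(net_input, k):
    """Return the sorted indices of the k units with the largest net input.

    Hard top-k selection; a simplification for illustration only.
    """
    ranked = sorted(range(len(net_input)), key=lambda i: net_input[i],
                    reverse=True)
    return sorted(ranked[:k])


# Hypothetical net input to 12 hidden units arranged along a color dimension.
net_input = (
    [0.0] * 4      # units 0-3: target-only units, no input from the competitor cue
    + [0.1, 0.1]   # units 4-5: formerly shared units, connections now weakened
    + [1.0] * 4    # units 6-9: unique competitor units, strong input
    + [0.4, 0.3]   # units 10-11: adjacent units on the side away from the target
)

active = kwta(net_input, k=6)
print(active)  # → [6, 7, 8, 9, 10, 11]
```

      Because the set point requires 6 active units and only 4 unique units remain strongly driven, the two remaining slots go to the adjacent units rather than to the severed shared units, shifting the competitor's representation away from the target.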

      Reviewer #2 (Public Review):

      This paper addresses an important computational problem in learning and memory. Why do related memory representations sometimes become more similar to each other (integration) and sometimes more distinct (differentiation)? Classic supervised learning models predict that shared associations should cause memories to integrate, but these models have recently been challenged by empirical data showing that shared associations can sometimes cause differentiation. The authors have previously proposed that unsupervised learning may account for these unintuitive data. Here, they follow up on this idea by actually implementing an unsupervised neural network model that updates the connections between memories based on the amount of coactivity between them. The goal of the authors' paper is to assess whether such a model can account for recent empirical data at odds with supervised learning accounts. For each empirical finding they wish to explain, the authors built a neural network model with a very simple architecture (two input layers, one hidden layer, and one output layer) and with prewired stimulus representations and associations. On each trial, a stimulus is presented to the model, and inhibitory oscillations allow competing memories to pop up. Pre-specified u-shaped learning rules are used to update the weights in the model, such that low coactivity leaves model connections unchanged, moderate coactivity weakens connections, and high coactivity strengthens connections. In each of the three models, the authors manipulate stimulus similarity (following Chanales et al), shared vs distinct associations (following Favila et al), or learning strength (a stand in for blocked versus interleaved learning schedule; following Schlichting et al) and evaluate how the model representations evolve over trials.

      As a proof of principle, the authors succeed in demonstrating that unsupervised learning with a simple u-shaped rule can produce qualitative results in line with the empirical reports. For instance, they show that pairing two stimuli with a common associate (as in Favila et al) can lead to *differentiation* of the model representations. Demonstrating these effects isn't trivial and a formal modeling framework for doing so is a valuable contribution. Overall, the authors do a good job of both formally describing their model and giving readers a high level sense of how their critical model components work, though there are some places where the robustness of the model to different parameter choices is unclear. In some cases, the authors are very clear about this (e.g. the fast learning rate required to observe differentiation). However, in other instances, the paper would be strengthened by a clearer reporting of the critical parameter ranges.

      We thank the reviewer for raising this point. The interdependence of parameters in our model makes it infeasible to identify critical parameter ranges. We have added a paragraph to the “Approach to Parameterization and Data Fitting” section in the Methods to address this point (p. 33):

      “The overall goal of this modeling work is to account for key empirical regularities regarding differentiation and integration and to establish boundary conditions on these regularities. As such, the modeling work described below focuses more on qualitative fits to general properties of the data space than on quantitative fits to results from specific studies. Automatic parameter optimization is not feasible for this kind of model, given the large number of model parameters and the highly interactive, nonlinear nature of competitive dynamics in the model; consequently, model fitting was done by hand.

      These complex interactions between parameters also make it infeasible to list “critical parameter ranges” for generating particular model outcomes. Our experience in working with the model has been that activation dynamics are what matter most for learning, and that disparate parameter sets can give rise to the same activation dynamics and -- through this -- the same learning effects; likewise, similar parameter sets can give rise to different activation dynamics and different learning outcomes. Consequently, in this paper we have focused on characterizing the dynamics that give rise to different learning effects (and how they can be affected by local parameter perturbations, e.g., relating to learning rate and oscillation size), rather than the – impossible, we believe – task of enumerating the full set of parameter configurations that give rise to a particular result.”

      For instance, it's clear from the manipulation of oscillation strength in the model of Schlichting et al that this parameter can dramatically change the direction of the results. The authors do report the oscillation strength parameter values that they used in the other two models, but it is not clear how sensitive these models are to small changes in this value.

      In some cases, the effects of oscillation strength are relatively smooth. For example, in the Favila simulation, increasing the oscillation amplitude Osc effectively recapitulates the U-shaped curve (i.e., higher levels of Osc lead to more competitor activation, which initially leads to weakening / differentiation but then gives way to strengthening / integration), as is shown for the Favila Different Face condition in this plot:

      Author response image 3.

      In the Chanales 2/6 overlap condition, the effects of varying Osc are more nonlinear:

      Author response image 4.

      We think this is attributable to the increased “all-or-none” recurrent dynamics in this simulation (due to the recurrent projections within the output layer), which make it more difficult to evoke moderate (vs. high) levels of activation. This difficulty in reliably obtaining graded activation dynamics is likely a consequence of the small-scale (“toy”) nature of the model and the simple inhibitory mechanisms employed here, as opposed to being a generalizable property of the brain – presumably, the actual brain employs more nuanced and effective means of controlling activation. Furthermore, we don’t think that the high prevalence of integration in the model’s parameter space necessarily translates into a prediction that integration should be more prevalent overall – see the new “Reconciling the Prevalence of Differentiation in the Model and in the Data” section described in response to one of the reviewer’s other points below. Due to the paper already being quite long, we have opted not to include the above plots / discussion in the paper.

      Similarly, it's not clear whether the 2/6 hidden layer overlap (only explicitly manipulated in the model of Chanales et al) is required for the other two models to work.

      When we were parameterizing the model, we opted to keep the 2/6 level of overlap for all of the simulations and we adjusted other parameters to fit the data; in part, this was because overlap can only be adjusted in discrete jumps, whereas other influential parameters in the model can be adjusted in a more graded, real-valued way. Our use of 2/6 overlap (as opposed to, say, 1/6 or 3/6 overlap) for the Favila and Schlichting models was done out of convenience, and should not be interpreted as a strong statement that this particular level of overlap is necessary for obtaining differentiation; we could easily get the model to show differentiation given other overlap levels by adjusting other parameters.

      Finally, though the u-shaped learning rule is essential to this framework, the paper does little formal investigation of this learning rule. It seems obvious that allowing the u-shape to collapse too much toward a horizontal line would reduce the model's ability to account for empirical results, but there may be other more interesting features of the learning rule parameterization that are essential for the model to function properly.

      Given that the paper is already quite long, we have opted not to include further exploration of the parameters of the U-shaped learning rule in the paper. However, for the reviewer’s information, we report the effects of a few illustrative manipulations of these parameters below. As a general principle, the effects of these manipulations make sense in light of the theoretical framework described in the paper.

      For example, the parameter “DRevMag” controls the size of the negative “dip” in the U-shaped curve (more negative values = a larger dip). Given that this negative dip is essential for severing weights to competitors and causing differentiation, shifting DRevMag upwards towards zero should shift the balance of the model away from differentiation and towards integration. This is indeed what we observe, as shown in this parameter sweep from the Chanales simulation:

      Author response image 5.

      As another example: The “DRev” parameter controls where the U-shaped curve transitions from negative weight change to positive weight change. Lower values of DRev mean that the region of coactivity values leading to negative weight change will be smaller, and the region of coactivity values leading to positive weight change will be larger. As such, we would expect that lower values of DRev would bias the model toward integration. That is indeed the case, as shown in this parameter sweep from the Schlichting Blocked simulation:

      Author response image 6.
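
      To make the two manipulations above concrete, here is a minimal sketch of a piecewise-linear U-shaped curve. It is not the equations from the paper's Methods: the breakpoints, the default values, the placement of the trough midway between `d_thr` and `d_rev`, and the names `d_thr` and `d_max_mag` are all assumptions made for illustration. Only `d_rev` (taken here, following the description above, as the point where weight change switches from negative to positive) and `d_rev_mag` (the depth of the dip) are modeled on parameters named in the text.

```python
def nmph_weight_change(coactivity, d_thr=0.2, d_rev=0.6,
                       d_rev_mag=-1.0, d_max_mag=1.0):
    """Illustrative piecewise-linear U-shaped plasticity curve.

    All defaults are hypothetical. Coactivity is assumed to lie in [0, 1].
    """
    if not 0.0 <= coactivity <= 1.0:
        raise ValueError("coactivity is assumed to lie in [0, 1]")
    # Assumption: the dip bottoms out midway between d_thr and d_rev.
    trough = (d_thr + d_rev) / 2.0
    if coactivity <= d_thr:
        return 0.0                      # low coactivity: no change
    if coactivity <= trough:
        # descending limb of the dip (weakening grows)
        return d_rev_mag * (coactivity - d_thr) / (trough - d_thr)
    if coactivity <= d_rev:
        # ascending limb of the dip (weakening shrinks back to zero)
        return d_rev_mag * (d_rev - coactivity) / (d_rev - trough)
    # beyond d_rev: strengthening, rising linearly to d_max_mag at 1.0
    return d_max_mag * (coactivity - d_rev) / (1.0 - d_rev)


if __name__ == "__main__":
    for x in (0.1, 0.4, 0.5, 0.8):
        print(
            x,
            nmph_weight_change(x),                  # baseline curve
            nmph_weight_change(x, d_rev_mag=-0.2),  # shallower dip
            nmph_weight_change(x, d_rev=0.4),       # earlier sign change
        )
```

      In this sketch, moving `d_rev_mag` toward zero shrinks the weakening applied at moderate coactivity, and lowering `d_rev` narrows the range of coactivity values that produce weakening while widening the range that produces strengthening. Both changes favor integration over differentiation, in line with the sweeps described above.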

      There are a few other points that may limit the model's ability to clearly map onto or make predictions about empirical data. The model(s) seem very keen to integrate, and to do so more completely than the available empirical data suggest. For instance, there is a complete collapse of representations in half of the simulations in the Chanales et al model, and the blocked simulation in the Schlichting et al model also seems to produce nearly complete integration. Even if the Chanales et al paper had observed some modest behavioral attraction effects, this model would seem to over-predict integration. The authors somewhat implicitly acknowledge this when they discuss the difficulty of producing differentiation ("Practical Advice for Getting the Model to Show Differentiation") and not of producing integration, but don't address it head on.

      We thank the reviewer for this comment – R1 had a similar comment. We have added a new section to the Discussion to address this point (p. 30):

      “Reconciling the Prevalence of Differentiation in the Model and in the Data.

      A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      Second, the authors' choice of strongly prewiring associations in the Chanales and Favila models makes it difficult to think about how their model maps onto experimental contexts where competition is presumably occurring while associations are only weakly learned. In the Chanales et al paper, for example, the object-face associations are not well learned in initial rounds of the color memory test. While the authors do justify their modeling choice and their reasons have merit, the manipulation of AX association strength in the Schlichting et al model also makes it clear that the association strength has a substantial effect on the model output. Given the effect of this manipulation, more clarity around this assumption for the other two models is needed.

      We thank the reviewer for bringing this up. We have edited the section entitled “A Note on Prewiring Representations” in the Methods to further justify our choice to prewire associations in the Chanales and Favila models (p. 37):

      “In our model, our practice of “prewiring” memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a “blank brain”; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning.
      The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Overall, this is strong and clearly described work that is likely to have a positive impact on computational and empirical work in learning and memory. While the authors have written about some of the ideas discussed in this paper previously, a fully implemented and openly available model is a clear advance that will benefit the field. It is not easy to translate a high-level description of a learning rule into a model that actually runs and behaves as expected. The fact that the authors have made all their code available makes it likely that other researchers will extend the model in numerous interesting ways, many of which the authors have discussed and highlighted in their paper.

      Reviewer #3 (Public Review):

      This paper proposes a computational account for the phenomenon of pattern differentiation (i.e., items having distinct neural representations when they are similar). The computational model relies on a learning mechanism of the nonmonotonic plasticity hypothesis, fast learning rate and inhibitory oscillations. The relatively simple architecture of the model makes its dynamics accessible to the human mind. Furthermore, using similar model parameters, this model produces simulated data consistent with empirical data of pattern differentiation. The authors also provide insightful discussion on the factors contributing to differentiation as opposed to integration. The authors may consider the following to further strengthen this paper:

      The model compares different levels of overlap at the hidden layer and reveals that partial overlap seems necessary to lead to differentiation. While I understand this approach from the perspective of modeling, I have concerns about whether this is how the human brain achieves differentiation. Specifically, if we view the hidden layer activation as a conjunctive representation of a pair that is the outcome of encoding, differentiation should precede the formation of the hidden layer activation pattern of the second pairmate. Instead, the model assumes such pattern already exists before differentiation. Maybe the authors indeed argue that mechanistically differentiation follows initial encoding that does not consider similarity with other memory traces?

      Related to the point above, because the simulation setup is different from how differentiation actually occurs, I wonder how valid the prediction of asymmetric reconfiguration of hidden layer connectivity pattern is.

      We thank the reviewer for this comment. In the revised manuscript, we have edited the “Note on Prewiring Representations” in the Methods to clarify how our assumptions about prewiring relate to what we really think is happening in the brain (p. 37):

      “In our model, our practice of “prewiring” memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a “blank brain”; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning.
      The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Although, as the authors mentioned, there haven't been formal empirical tests of the relationship between learning speed and differentiation/integration, I am also wondering to what degree the prediction that fast learning is necessary for differentiation is consistent with current data. According to Figure 6, the learning rates that lead to differentiation in the 2/6 condition achieved differentiation after just one shot most of the time. On the other hand, Guo et al (2021), for example, showed that humans may need a few blocks of training and test to start showing differentiation.

      We thank the reviewer for mentioning this. We have added a paragraph to the “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” section of the Discussion that addresses this point (pp. 28-29):

      “Although the results from Wanjia et al. (2021) provide strong support for the model's prediction that differentiation will be abrupt, they raise another question: What explains variance across items in when this abrupt change takes place? The answer to this question remains to be seen, but one possibility is encoding variability: If we assume that participants stochastically sample (i.e., attend to) the features of the scene pairmates, it is possible that participants might initially fail to sample the features that distinguish the scene pairmates, which can be quite subtle – and if the distinguishing features of the pairmates are not represented in high-level visual regions (i.e., the pairmates are represented in these regions as having the same features), this could delay the onset of differentiation until the point at which the distinguishing features happen (by chance) to be sampled.”

      Related to the point above, the high learning rate prediction also seems to be at odds with the finding that the cortex, which has slow learning (according to the theory of complementary learning systems), also shows differentiation in Wammes et al (2022).

      We now address this point in the section of the Discussion entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” (p. 27):

      “Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being “propped up” by top-down feedback from differentiated representations in the hippocampus.”

      More details about the learning dynamics would be helpful. For example, equation(s) showing how activation, learning rate and the NMPH function work together to change the weight of connections may be added. Without the information, it is unclear how each connection changes its value after each time point.

      We thank the reviewer for this comment. We have made two major changes to address this concern. First, we have edited the “Learning” section within “Basic Network Properties” in the main text (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”
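
      The end-of-trial update described in this passage can be sketched as follows. This is a simplified illustration rather than the model's actual code: the coarse three-zone rule stands in for the full U-shaped function, and the function names, the thresholds, the learning-rate scaling, and the hard clipping of weights to [0, 1] are all assumptions. The only element taken directly from the text is that coactivity is the product of the two units' activations on the trial.

```python
def three_zone_rule(coactivity, low=0.3, high=0.7):
    """Coarse stand-in for the U-shaped curve (hypothetical thresholds)."""
    if coactivity < low:
        return 0.0    # low coactivity: no change
    if coactivity < high:
        return -1.0   # moderate coactivity: weakening
    return 1.0        # high coactivity: strengthening


def end_of_trial_update(weights, pre_acts, post_acts, rule=three_zone_rule,
                        lrate=0.1, w_min=0.0, w_max=1.0):
    """Return new weights after one trial; weights[i][j] links pre i to post j."""
    updated = []
    for i, pre in enumerate(pre_acts):
        row = []
        for j, post in enumerate(post_acts):
            coactivity = pre * post                # product of activations
            w = weights[i][j] + lrate * rule(coactivity)
            row.append(min(w_max, max(w_min, w)))  # assumed hard bounds
        updated.append(row)
    return updated


if __name__ == "__main__":
    weights = [[0.95, 0.5], [0.5, 0.05], [0.5, 0.5]]
    new_weights = end_of_trial_update(weights, [1.0, 0.5, 0.0], [1.0, 0.8])
    for row in new_weights:
        print(row)
```

      Passing the rule in as an argument mirrors the statement that the learning function is specified separately for each projection: a different curve could be supplied per projection without changing the update itself.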

      Second, we have added the requested equations to the “Learning” part of the Methods (pp. 37-38):

      On the right side of the function, strong activation leads to strengthening of the connection, which I assume will lead to stronger activation at the next time point. The model has an upper limit on connection strength to prevent connections from strengthening too much. The same idea can be applied to the left side of the function: instead of having two turning points, it can be a linear function such that low activation keeps weakening the connection until the lower limit is reached. This way the NMPH function can take a simpler form (e.g., two line segments, if you think weakening and strengthening occur at different rates) and may still simulate the data.

      We thank the reviewer for mentioning this. We have added a new paragraph in the “Learning” section of the Methods to justify the particular shape of the learning curve (pp. 38-39):

      “Evidence for the U-shaped plasticity function used here (where low activation leads to no change, moderate activation leads to weakening, and higher levels of activation lead to strengthening) was previously reviewed in Ritvo et al. (2019). In brief, there are three lines of work that support the U shape: First, multiple neurophysiological studies have found that moderate postsynaptic depolarization leads to synaptic weakening and higher levels of depolarization lead to synaptic strengthening (e.g., Artola et al., 1990; Hansel et al., 1996). Second, human neuroscience studies have used pattern classifiers, applied to fMRI and EEG data, to measure memory activation, and have related this measure to subsequent memory accessibility; several studies using this approach have found that low levels of activation lead to no change in memory strength, moderate levels of activation lead to impaired subsequent memory, and higher levels of activation lead to increased subsequent memory (e.g., Newman and Norman, 2010; Detre et al., 2013; Kim et al., 2014; for related findings, see Lewis-Peacock and Norman, 2014; Wang et al., 2019). Third, a recent human fMRI study by Wammes et al. (2022) manipulated memory activation by varying the visual similarity of pairmates and observed a U-shaped function relating visual similarity to representational change in the hippocampus, whereby low levels of pairmate similarity were associated with no change, moderate levels of similarity were associated with differentiation, and the differentiation effect went away at higher levels of similarity.”

      We have also included a pointer to this new paragraph in the “Nonmonotonic Plasticity Hypothesis” section of Introduction (p. 2):

      “(for further discussion of the empirical justification for the NMPH, see the Learning subsection in the Methods)”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few additional minor things about data presentation and the like:

      (1) Figure 1 legend - a more general description of how to interpret the figure might be helpful for more naive readers (e.g., explaining how one can visualize in the schematic that there is overlap in the hidden layer between A and B). Also, from the Figure 1 depiction, it's not clear what is different about the setup from the initial left hand side panels in A, B, C, to make it such that activity spreads strongly to A in panel A, weakly in panel B, and not at all in panel C since the weights are the same. Is there a way to incorporate this into the graphic, or describe it in words?

      To address this point, we have added the following text to the Figure 1 caption (p. 3):

      “Note that the figure illustrates the consequences of differences in competitor activation for learning, without explaining why these differences would arise. For discussion of circumstances that could lead to varying levels of competitor activation, see the simulations described in the text.”

      (2) I believe not all of the papers cited on lines 193-195 actually have similarity manipulations in them. I'd recommend double checking this list and removing those less relevant to the statement.

      Thank you for pointing this out; we have removed the Ballard reference and we have clarified what we mean by similarity reversal (p. 7):

      “The study was inspired by recent neuroimaging studies showing “similarity reversals”, wherein stimuli that have more features in common (or share a common associate) show less hippocampal pattern similarity (Favila et al., 2016; Schlichting et al., 2015; Molitor et al., 2021; Chanales et al., 2017; Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Zeithamova et al., 2018; Jiang et al., 2020; Wammes et al., 2022).”

      (3) I wanted a bit more detail about how the parameters were set in the main paper, not just in the methods. Even something as brief as noting that model fitting was done by hand by tweaking parameters to re-create the empirical patterns (if I'm understanding correctly) would have been helpful for me.

      To address this point, we have added the following text under “Basic Network Properties” (p. 4):

      “Our goal was to qualitatively fit key patterns of results from each of the aforementioned studies. We fit the parameters of the model by hand as they are highly interdependent (see the Methods section for more details).”

      (4) In Figure 4E, it would be helpful to describe the x and y axes of the MDS plots in the legend.

      To address this point, we have added the following new text to the Figure 4 caption that clarifies how the MDS plots were generated (p. 11):

      “MDS plots were rotated, shifted, and scaled such that pairmate 1_before is located at (0,0), pairmate 2_before is located directly to the right of pairmate 1_before, and the distance between pairmate 1_before and pairmate 2_before is proportional to the baseline distance between the pairmates.”
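
      The alignment described in this caption amounts to a translation, a rotation, and a uniform scaling of the 2D coordinates. The sketch below shows one way to do that. It is not the authors' plotting code: the function name, the dictionary layout, the point labels, and the `target_distance` argument (standing in for whatever value is proportional to the baseline distance) are hypothetical. Reflections are not handled, so points above or below the horizontal axis keep their handedness.

```python
import math


def align_mds(points, anchor, partner, target_distance):
    """Shift, rotate, and scale 2D points.

    After alignment, `anchor` sits at (0, 0) and `partner` sits directly to
    its right at (target_distance, 0). `points` maps labels to (x, y).
    """
    ax, ay = points[anchor]
    px, py = points[partner]
    dx, dy = px - ax, py - ay
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("anchor and partner coincide; rotation is undefined")
    scale = target_distance / dist
    # Cosine and sine of the angle of the anchor-to-partner vector; rotating
    # by the negative of that angle lays the vector along the positive x axis.
    cos_t, sin_t = dx / dist, dy / dist
    aligned = {}
    for label, (x, y) in points.items():
        sx, sy = x - ax, y - ay                      # shift
        rx = (sx * cos_t + sy * sin_t) * scale       # rotate, then scale
        ry = (-sx * sin_t + sy * cos_t) * scale
        aligned[label] = (rx, ry)
    return aligned


if __name__ == "__main__":
    raw = {
        "p1_before": (1.0, 1.0),
        "p2_before": (1.0, 3.0),
        "p1_after": (2.0, 1.0),
        "p2_after": (0.0, 3.0),
    }
    for label, xy in align_mds(raw, "p1_before", "p2_before", 1.0).items():
        print(label, xy)
```

      Because the same rigid transform and scale factor are applied to every point, relative distances among the before and after positions are preserved up to the common scale.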

      (5) Figure 6 - at first I thought the thicker line was some sort of baseline, but I think it is just many traces on top of one another. If other readers may be similarly confused, perhaps this could be stated.

      Thanks for this comment. We have updated Figure 6 (p. 16).

      We have also updated the caption.

      (6) I am having a lot of difficulty understanding the terms "competitor-to-competitor," "competitor-to-target/shared," and "target/shared-to-target/shared," and therefore I don't fully get Figure 5. I think it might be helpful to expand the description of these terms where they are first introduced in the paper (p. 13?). I think I am missing something crucial here, and I am not quite sure what that is, which I know is not very helpful! But, to narrate my confusion a bit, I thought that these terms would somehow relate to connections between different connections of the network. For example, is competitor-to-competitor within the hidden layer? Or is this somehow combining across relevant connections that might span different pairs of layers in the model? And, I really have no idea why it is "target/shared."

      Thank you for these comments. We have updated Figure 5 and we have also made several changes to the main text and the figure caption to address these points.

      Changes to the main text (p. 13):

      “Whether symmetric or asymmetric integration occurs depends on the relative strengths of connections between pairs of unique competitor units (competitor-competitor connections) compared to connections between unique competitor units and shared units (competitor-shared connections) after the first trial (Figure 5; note that the figure focuses on connections between hidden units, but the principle also applies to connections that span across layers). Generally, coactivity between unique competitor units (competitor-competitor coactivity) is less than coactivity between unique competitor units and shared units (competitor-shared coactivity), which is less than coactivity between unique target units and shared units (target-shared coactivity).”
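
      The ordering of coactivities stated in this passage follows from the product definition of coactivity whenever the unique competitor units are less active than the shared and unique target units. The short example below illustrates this with made-up activation values; the function name and the numbers are hypothetical and are not taken from the simulations.

```python
def coactivity_by_pair_type(target, shared, competitor):
    """Coactivity (product of activations) for three types of unit pairs."""
    return {
        "competitor-competitor": competitor * competitor,
        "competitor-shared": competitor * shared,
        "target-shared": target * shared,
    }


if __name__ == "__main__":
    # Hypothetical activations: target and shared units strongly active,
    # unique competitor units only moderately active.
    pairs = coactivity_by_pair_type(target=0.9, shared=0.9, competitor=0.4)
    for pair_type, value in pairs.items():
        print(pair_type, round(value, 2))
```

      With these values the three coactivities come out in increasing order, so under a U-shaped rule the different connection types can land on different parts of the curve on the same trial.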

      (7) Relatedly, in Figure 13, I understand how some competitor-to-target/shared connections could be spared in the bottom instance given panel B. However, I'm struggling to understand how that relates to the values in the corresponding chart in panel A. What about panel A, bottom (vs. the top) means lower coactivities between some competitor-to-target/shared? Is it because if the noise level is higher, the "true" activation of competitor-to-target/shared connections is weaker? Again, I think I'm missing something critical here, and I wonder if other readers may be in the same situation. (I know the authors described this also on p. 36, but I'm still confused!)

      We have updated Figure 13 to clarify these points.

      (8) In Figure 9, I believe there is no caption for panel D. Also, it looks as though the item unit active for A and B is the same. I wonder if this is an error?

      Thank you for catching these errors! They have both been fixed.

      Reviewer #2 (Recommendations For The Authors):

      -Perhaps I missed it, but I think defining coactivity (how it is computed) in the main text would be useful for readers, as this is critical for understanding the model. I did find it in the methods.

      We thank the reviewer for this suggestion. We have updated the “Learning” section within “Basic Network Properties” in the main text to address this point (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”

      -The modeling results in the different face condition are at odds with the data for the Favila et al model (they observe some differentiation in the paper and the model predicts no change). This could be due to a number of unmodeled factors, but it is perhaps worth noting.

      Thank you for pointing this out. It is possible to better capture the pattern of results observed by Favila et al. in their paper (with some differentiation in the different-face condition and even more differentiation in the same-face condition) by slightly adjusting the model parameters (specifically, by setting the oscillation amplitude Osc for the hidden layer to .1 instead of .067).

      Rather than replacing the old (Osc = .067) results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 8 - Supplement 1; see p. 45):

      We also added new text to the Favila Results, under “Differentiation and Integration” (p. 20):

      “Note also that the exact levels of differentiation that are observed in the different-face and same-face conditions are parameter dependent; for an alternative set of results showing some differentiation in the different-face condition (but still less than is observed in the same-face condition), see Figure 8 - Supplement 1.”

      -Related to my comment in the public review about pre-wiring associations, in the caption for Figure 9 (Schlichting model), the authors report "In both conditions, the pre-wired connection linking the "item B" hidden units to the "item X" output unit is set to .7. In the interleaved condition, the connection linking the "item A" hidden units to the "item X" output unit is set to .8, to reflect some amount of initial AX learning. In the blocked condition, the connection linking the "item A" hidden units to the "item X" output unit is set a higher value (.999), to reflect extra AX learning." What are the equivalent values for the other models, especially the Favila model since the structure is the same as Schlichting? I understood all the "strong" connections to be .99 unless otherwise stated. If that's the case, I don't understand why the blocked Schlichting model and the Favila model produce opposite effects. More clarity would be useful here.

      We have added a new paragraph to the results section for the Schlichting model (under “Differentiation and Integration”) to clarify why the blocked Schlichting model and the Favila model show different results (p. 24):

      “Note that the key feature driving integration in the blocked condition of this simulation is not the high strength of the connection from X to A on its own – rather, it is the asymmetry in the pretrained connection strengths from X to A (.999) and from X to B (.7). This asymmetry, which is meant to reflect the extensive training on A-X that occurred before the initial presentation of B-X, results in the A-X hidden representation decisively winning the competition during B-X presentation, which then leads to the B input also being linked to this representation (i.e., integration). It is instructive to compare this to the same-face condition from our simulation of Favila et al. (2016): In that simulation, the two pairmates are also linked strongly (.99 initial connection strength) to a shared associate, but in that case the connections are equally strong, so there is more balanced competition – in this case, the competitor representation only comes to mind moderately (instead of displacing the target representation), so the result is differentiation instead of integration.”

      -The meaning of the different colored dots in Figure 5 is a bit hard to keep track of, even given the legend labels. The figure might benefit from a model sketch highlighting each of the different coactivity types. The left side of Fig 13 was useful, but again, somehow mapping on the colors would help further. Another note on these figures: what does having two dots of each color mean? Is it just an illustration of the variance? There would be more dots if there was one dot per coactivity value.

      We have updated Figure 5 and Figure 13 to clarify these points (including a clarification that the dots only represent a subset of the possible pairings between units).

      -While I appreciate the goal of the paper is to account for these three studies, readers who aren't familiar with or specifically interested in these studies may appreciate a small amount of intuition on why formalizing unsupervised learning models may be broadly important for computational investigations of learning/memory/cognition.

      We have added the following text under “Basic Network Properties” in the Introduction to address this point (p. 4):

      “Achieving a better understanding of unsupervised learning is an important goal for computational neuroscience, given that learning agents have vastly more opportunities to learn in an unsupervised fashion than from direct supervision (for additional discussion of this point, see, e.g., Zhuang et al., 2021).”

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by Neininger-Castro and colleagues presents a novel automatic image analysis method for assessing sarcomeres, the basic units of myofibrils, and validates this tool in a couple of experimental approaches that interfere with sarcomere assembly in iPSC-cardiomyocytes (iPSC-CM).

      Automatic quantification of sarcomeres is definitely something that is useful to the field. I am surprised that there is no reference in the manuscript to SarcTrack, published by Toepfer and colleagues in 2019 (PMID 30700234), which has exactly the same purpose. The advantage of the image analysis software presented in the current manuscript appears to me to be that it can cover both mature sarcomeres and nascent sarcomeres in premyofibrils effectively.

      We whole-heartedly disagree that SarcTrack has the exact same purpose as sarcApp. sarcApp measures more than the frequency of actinin2 images, and can measure real-space quantifications of actinin, myomesin, and titin, which has not been done before in this way. However, SarcTrack is an interesting method that we hope many researchers find helpful in their research. SarcTrack is a particle tracker that outputs the dimensions of the objects found, but does not distinguish between Z-Lines and other actinin2-positive structures (Z-Bodies, adhesions). It also does not group these structures into higher order structures such as myofibrils and muscle stress fibers.

      When going through the manuscript there were a few issues that should be addressed in a revised version of the manuscript:

      1) I am a bit puzzled that they took 1.4 um length as a cutoff length for a mature A-band in their quantifications, since the consensus in the field for thick filament length seems to be 1.6 um?

      We use 1.4 µm as a cutoff length for the length of a Z-Line rather than the A-Band. We believe the reviewer is referring to the width of the A-Band perpendicular to the Z-lines, which is indeed 1.6 µm. However, we are referring to the length of the Z-Lines, which can span anywhere from 1.4 µm to up to 10 or more µm. Thank you for allowing us to make the clarification.
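
      The length cutoff described above can be illustrated with a minimal sketch. It assumes that length is the only criterion separating Z-Lines from Z-Bodies, which is a simplification; the function and constant names are hypothetical and are not taken from the sarcApp code.

```python
# Hypothetical illustration of a length-based cutoff; not the sarcApp implementation.
Z_LINE_MIN_LENGTH_UM = 1.4  # cutoff quoted in the response; treated here as adjustable


def classify_structure(length_um, cutoff_um=Z_LINE_MIN_LENGTH_UM):
    """Label an actinin2-positive structure by its length in micrometers."""
    return "Z-Line" if length_um >= cutoff_um else "Z-Body"


def count_structures(lengths_um, cutoff_um=Z_LINE_MIN_LENGTH_UM):
    """Count how many measured structures fall on each side of the cutoff."""
    counts = {"Z-Line": 0, "Z-Body": 0}
    for length in lengths_um:
        counts[classify_structure(length, cutoff_um)] += 1
    return counts
```

      Passing a different `cutoff_um` mimics a user choosing their own threshold.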

      2) When doing the knockdown for alpha and beta-myosin heavy chain, respectively, why did they not also do a Western blot for the "other" isoform as well (Figure 7)? We know that iPSC-CM express a mixture, so the relatively mild phenotype that they observe in single knockdown experiments may well be due to concomitant upregulation of the expression of the other isoform. In my point of view this should be checked.

      It is likely that in the single knockdown experiments the other isoform is upregulated, which is why we were careful in stating that neither muscle myosin alone is required for sarcomere formation. We do agree this would be an interesting experiment to check beyond the scope of this manuscript.

      3) There seems to be a disconnect between the images for myomesin knockdown shown in Figure 8H and the quantification shown in Figure 8I, which makes me wonder whether the image shown in H middle (MYOM1 (1) KD), where the beta-myosin doublets do not seem to be much affected is really representative?

      The image shown in the middle of H is representative of the mean length of beta-myosin doublets in MYOM1 (1) KD hiCMs. While the beta-myosin doublets are still present and organized, they are significantly shorter. In the zoomed out image, you can appreciate much shorter arrays of beta-myosin doublets that, while extending across the entire cell, are thinner than control cells.

      Reviewer #2 (Public Review):

      Neininger-Castro et al report on their original study entitled "Independent regulation of Z-lines and M-lines during sarcomere assembly in cardiac myocytes revealed by the automatic image analysis software sarcApp". In this study, the research team developed two software tools, yoU-Net and sarcApp, that provide new binarization and sarcomere quantification methods. The authors further utilized human induced pluripotent stem cell-derived cardiomyocytes (hiCMs) as their model to verify their software by staining multiple sarcomeric components with and without the treatment of Blebbistatin, a known myosin II activity inhibitor. With the treatment of different Blebbistatin concentrations, the morphology of sarcomeric proteins was disturbed. These disrupted sarcomeric structures were further quantified using sarcApp and the quantification data supported the phenotype. The authors further investigated the roles of muscle myosins in sarcomere assembly by knocking down MYH6, MYH7, or MYOM in hiCMs. The knockdown of these genes did not affect Z-line assembly yet the knockdown of MYOM affected M-line assembly. The authors demonstrated that different muscle myosins participate in sarcomere assembly in different manners.

      Reviewer #3 (Public Review):

      Neininger-Castro and colleagues developed software tools for the quantification of sarcomeres and sarcomere-precursor features in immunostained human induced pluripotent stem cell-derived cardiac myocytes (hiCMs). In the first part they used a deep-learning-based model called a U-Net to construct and train a network for binarization of immunostained cardiomyocyte images. They also wrote graphical user interface (GUI) software that will assist other labs in using this approach and made it publicly available. They did not compare their approach to existing ones, but an example from one image suggests their binarization tool outperforms Otsu thresholding binarization.

      In the second part they developed a software tool called sarcApp that classifies sarcomere structures in the binarized image as a Z-Line or Z-Body and assigns each to either a myofibril or to stress fibers. The tools can then automatically count and measure multiple features (33 per cell and 24 per myofibril) and report them on a per-cell, per-myofibril, and per-stress fiber basis.

      To test the tools they used Blebbistatin to inhibit sarcomere assembly and showed that the sarcApp tool could capture changes in multiple features such as fewer myofibrils, fewer Z-Lines, decreased myofibril persistence, decreased Z-Line length and altered myofibril orientation in the Blebbistatin treated cells. With some changes the tool was also shown to quantify sarcomeres in titin and myomesin stained cardiomyocytes.

      Finally they used sarcApp to quantify the changes in sarcomere assembly after siRNA-mediated knockdown of MYH6, MYH7, or MYOM. The analysis indicates that neither MYH6 nor MYH7 knockdown perturbed the assembly of Z- or M-lines, and that knockdown of MYOM perturbed the A-band/M-Line but not the Z-Line assembly according to features captured by the sarcApp tool.

      Overall the authors developed and made publicly available an excellent software tool that will be very useful for labs that are interested in studying sarcomere assembly. Multiple features that are difficult to measure or count manually can be automatically measured by the software quickly and accurately.

      There are however some remaining questions about these tools:

      1) The binarization tool which is tailored to sarcomere image binarization appears promising but was not systematically compared with existing approaches.

      We compared it with the existing approach we used previously in the lab, which was Otsu’s method for binarization. We are not aware of several other binarization approaches to compare to, other than using other machine learning techniques that are less advanced than a U-Net, the current standard in image-to-image translation.
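
      For readers unfamiliar with the baseline mentioned above, Otsu's method picks the intensity threshold that maximizes the between-class variance of the resulting foreground and background. The following is a minimal pure-Python sketch of that classical algorithm, assuming 8-bit integer pixel values supplied as a flat list; it is an illustration only and is unrelated to the yoU-Net code.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t are treated as background, pixels > t as foreground.
    Assumes integer intensities in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    weight_bg = 0  # number of background pixels so far
    sum_bg = 0     # sum of background intensities so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is background from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Return a 0/1 mask: 1 where the pixel is brighter than the threshold."""
    return [1 if p > threshold else 0 for p in pixels]
```

      Because the threshold is global, dim structures next to bright ones can be lost, which is one reason a learned, image-to-image approach may do better on sarcomere images.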

      2) How robust is the tool? The tool was tested on images from one type of cardiomyocytes (hiCMs) taken from one lab using Nikon Spinning Disk confocal microscope equipped with Apo TIRF Oil 100X 1.49 NA objective or instant Structured Illumination Microscopy (iSIM), using deconvolution (Microvolution software) and in a specific magnification. It remains to be seen whether the tool would be equally effective with images taken with other microscopy systems, with other cardiomyocytes (chick or neonatal rat), with different magnifications, live imaging, etc.

      We tested the software with several magnifications, with live imaging, and with other tissues. We did not include the information in the manuscript because the data we tested the software with is for future manuscripts studying different aspects of sarcomere formation and maintenance. sarcApp reliably identifies Z-Lines and sarcomeres with deconvolved widefield fluorescence images of hiCMs and frozen human tissue, and we are currently using it to measure zebrafish data for another study. Further, it works for live imaging with an actinin2-GFP (or similar) label. For the titin quantification, we would recommend using only 60-100X magnification, as the titin structures (doublets and rings) are not resolvable at lower magnifications.

      3) The tool was developed for evaluation of sarcomere assembly. The authors show that for this application it can detect the perturbation by Blebbistatin, or knockdown of sarcomeric genes. It remains to be seen if this tool is also useful for assessment of sarcomere structure for other questions beside sarcomere assembly and in other sarcomere pathologies.

      While this is beyond the scope of this specific methods paper, we welcome other researchers to use our software for other questions in other pathologies. We are currently doing the same for other manuscripts from our lab.

      Reviewer #1 (Recommendations For The Authors):

      1) "alpha-actinin..., which border the sarcomeric contractile machinery (thin and thick filaments)"; Z-lines do NOT border thick filaments in a relaxed sarcomere.

      We have removed “(thin and thick filaments)” from the text.

      2) myomesin targeting siRNAs (gene name MYOM): there are actually three genes encoding for myomesin family members, specify, which one was targeted (I am assuming MYOM1).

      Thank you for the clarification: we do target MYOM1.

      3) I am not surprised that they found not many mature Z-lines in the absence of both sarcomeric myosins; a similar codependence of assembly of mature Z-discs and the presence of functional thick filaments was previously shown by Geach and colleagues in 2015 (PMID 25845369)

      Thank you for sharing this manuscript: we have added a reference to it in our study.

      Reviewer #2 (Recommendations For The Authors):

      This work offers the possibility to gain more insights into the process of sarcomere assembly through the advancement in sarcomeric or myofibril structure analyses. However, some clarifications are needed from the authors, please see below for the comments.

      1) It is recommended that the authors include the time points for replating and harvesting hiCMs. After replating, the cardiomyocytes require at least three to four days for sarcomeric structures to reform. If the hiCMs were fixed before sarcomere assembly had completed, the staining of sarcomeric proteins including ACTN2 and titin could be compromised and it is difficult to tell if the phenotypes observed were consequences of drug treatments or knockdown of sarcomeric genes or simply because the replating hiCMs were fixed before their sarcomeric structures had fully regrown. It is also recommended that the authors replate hiCMs at a fixed time point to avoid discrepancies in the data.

      Cardiomyocytes do not require three to four days for sarcomeric structures to re-form, and indeed only require 24 hours, with the first sarcomeres typically appearing at ~6 hours. We and others have published several studies demonstrating this (Fenix et al., eLife 2018, Taneja, Neininger and Burnette MBoC 2020, Chen et al. Nature Methods, 2022). While sarcomeres continue to develop and turn over after this time, our lab is interested in the beginning steps of sarcomerogenesis rather than the turnover of mature structures.

      2) The sarcApp automatically identifies Z-lines and Z-bodies; however, is there an option for the users to set their own thresholds? Some users may select different criteria when quantifying sarcomeres. Moreover, the Z-lines and Z-bodies identified by the software are not always accurate. Can the users modify the list manually in an unbiased way? If this function is not available, the authors may consider adding this function to their software. sarcApp measures Z-line and Z-body length but does not measure Z-line and Z-body width, but sometimes it is also necessary to measure the width.

      Absolutely, users can modify the thresholds to identify Z-Lines and Z-Bodies. There is not a way for users to modify the list in an unbiased way per se, as editing the list of Z-Lines and Z-Bodies based on non-mathematical measurements is inherently biased, but the user is free to add in other Z-Lines and Z-Bodies as they wish. In this context, “manually” and “unbiased” are mutually exclusive.

      3) It is recommended that the authors include the original images beside the sarcomeric structures identified by sarcApp (Figure 2A, 2C, 4C-F and more). It would be easier to compare the original Z-lines and Z-bodies with those identified by the software.

      We have added these in Author response image 1.

      Author response image 1.

      Uncropped images and merges from Figures 2, 4 and 6, respectively.

      4) The M-line length quantification data in Figure 3G, 5F, and 6H showed different colored-dots labeling n1 to n3, but the authors did not discuss the significance of these symbols.

      We are not sure what the reviewer means by this statement: there is no significance of the different colored dots other than to mark the biological replicate shown. These graphs were created using SuperPlots, which was not stated in the original methods. It has now been added to the Statistical Analysis section.
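
      To make the role of the colored dots concrete: in a SuperPlot, individual measurements are color-coded by biological replicate, one summary value per replicate is overlaid, and statistics are computed on those per-replicate values. The sketch below shows only the grouping step, using the standard library; the replicate labels mirror n1 to n3 from the figures, and the function name is hypothetical.

```python
from statistics import mean


def replicate_means(measurements_by_replicate):
    """Return one mean per biological replicate.

    `measurements_by_replicate` maps a replicate label (e.g. "n1") to the
    list of per-cell measurements collected in that replicate.
    """
    return {
        label: mean(values)
        for label, values in measurements_by_replicate.items()
        if values  # skip replicates with no measurements
    }
```

      The returned per-replicate means are the values a SuperPlot overlays on the pooled, color-coded data points, so that the sample size is the number of replicates rather than the number of cells.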

      5) Can the authors elaborate more on the reasons why they treated Blebbistatin at concentrations of 50µM and 100µM. Previous studies showed that 25µM of Blebbistatin was sufficient to delay the transformation of cardiomyocytes (PMID 27072942). Can the authors also comment on why they selected 6 hours, 12 hours, and 24 hours post replating for drug treatment. Moreover, the drug treatment at different time points was only done on ACTN2 but not titin or myomesin.

      We selected 6, 12, and 24 hours for actinin2 to show the time course of sarcomere formation and to show that sarcomeres are developed by 24 hours, as also mentioned above. We are interested in future studies of the time course of titin and myomesin over time, and are working on it in the lab.

      We chose 50 and 100 µM Blebbistatin as these completely blocked sarcomere assembly whereas treatment with 25 µM did not. This manuscript is a methods paper that aims to validate sarcApp and show how it could be used. We did not intend for it to be a comprehensive study of how different concentrations of blebbistatin affect sarcomere assembly.

      We are also unsure what the reviewer means by “transformation of cardiomyocytes”. The manuscript with the PMID of 27072942 does not address this issue; it describes itself as a study to “review and analyze readmission data for patients who received a continuous flow left ventricular assist device (LVAD)”. We assume the reviewer is referring to differentiation. The model system we developed and published in eLife in 2018 does not use differentiating iPSC cardiac myocytes. The hiCMs we use are terminally differentiated but still immature, as they are more transcriptionally similar to primary fetal myocytes. As such, they do not maintain their sarcomeres when they are removed from the 96-well plate and plated onto a glass coverslip for high-resolution microscopy. These assemble sarcomeres within 24 hours, with the sarcomeres forming close to the dorsal membrane and then rearranging over time (e.g., moving from the top of the cell to the bottom) (Fenix et al., eLife 2018). With that said, we do agree with the reviewer that a study of sarcomere assembly in the context of cardiac myocyte differentiation would be a fascinating direction for future studies, and we think sarcApp could facilitate such studies.

      6) The authors mentioned that the myofibrils of Z-line, titin, and M-line were randomly oriented after Blebbistatin treatments. The myofibrils were randomly oriented for titin and M-line. However, the orientation of Z-line after 50µM Blebbistatin treatment was not necessarily random, only the orientation after 100µM Blebbistatin treatment was randomized. The authors might consider changing bar graph to other types of charts if the orientation was really randomized after quantification.

      We find that the bar chart is the most informative to us, but users can consider other types of charts in their analyses.

      7) It is recommended that the authors include images staining ACTN2 at lower magnifications (Figure 1A, 1C). With current images, it is true that yoU-Net can separate Z-lines from Z-bodies yet it is difficult to tell if yoU-Net can still distinguish Z-lines from Z-bodies with larger images or it only applies to a small portion of the image.

      The yoU-Net can distinguish Z-Lines from Z-Bodies with images of any size, as image size (height vs. width in pixels) does not affect how binarization occurs. During binarization, the only pixel requirement is that the width and height are divisible by 8 (for downsampling purposes). Usually this is not the case with raw images, so the image borders are slightly cropped to make them usable. In terms of resolution, we recommend using 60X-100X objectives on confocal or superresolution data for the clearest results. We have, however, successfully binarized deconvolved widefield images at 100X as well.
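
      The divisibility requirement described above can be sketched as follows. This is a minimal illustration that assumes the crop is centered and that the image is a list of equal-length pixel rows; how yoU-Net actually trims the borders is not stated here, and the function names are hypothetical.

```python
def crop_bounds(height, width, multiple=8):
    """Return (top, bottom, left, right) for a centered crop whose height
    and width are both divisible by `multiple`."""
    new_h = height - height % multiple
    new_w = width - width % multiple
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    return top, top + new_h, left, left + new_w


def crop_image(rows, multiple=8):
    """Crop a 2D image (list of equal-length rows) to divisible dimensions."""
    top, bottom, left, right = crop_bounds(len(rows), len(rows[0]), multiple)
    return [row[left:right] for row in rows[top:bottom]]
```

      At most `multiple - 1` pixels are removed along each axis, so the loss at the borders is small relative to a typical micrograph.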

      8) The authors mentioned that the knockdown of MYH7 did not affect Z-lines and M-lines; however, the structures of ACTN2, myomesin, and titin appeared more organized as compared to those in control.

      We agree that the sarcomeres and myofibrils look slightly more organized, and did mean to state that the knockdown did not negatively affect Z-Lines and M-Lines and have updated the manuscript to be more accurate.

      9) Please provide the merge images for Fig. 4D, 4E, 6B

      The merge images for Fig. 4D, 4E, and 6B are included with the original images requested above (point 3)

      10) In the text, they described: "antibodies to the titin I-band localize to both MSFs and sarcomeres in hiCMs (Figure 4A). Titin forms ring-like structures around the Z-Bodies of MSFs that are closer to the apparent sarcomere transition point (Figure 4A)". However, based on the antibody information they provided, it is not explicitly recognized for N- or C-terminus TITIN. Please provide TTN N-terminus or TTN C-terminus co-stainings with ACTN2 antibody to understand which part of TTN together with ACTN2 forms a Z-Body.

      The TTN antibody is an N-terminal antibody localizing to the I-Band region of sarcomeres. We agree with the reviewer that a more thorough study of titin will be of interest and we are currently undertaking such a study. However, this is a methods paper presenting a tool. While some of the data we present does point to mechanistic hypotheses, it is beyond the scope of this study to fully characterize titin during sarcomere assembly.

      11) TITIN doublet was used to indicate a sarcomere in Fig. 4C-D. Moreover, they also used another combination (myomesin and F-ACTIN) to label a sarcomere in Fig. 6D. Can they compare the difference between these two methods or by using these two methods (TITIN doublet) and (myomesin and F-ACTIN), how is the average length of sarcomere? Will the sarcomere length be the same?

      We noted in the manuscript that due to the organization of titin doublets (wrapping around the ends of Z-Lines) the average titin doublet will be approximately 0.3 µm longer than the Z-Line. We did not expect to see a difference in lengths of myomesin M-Lines and mature actinin2 Z-Lines and indeed do not see major differences in the average lengths (between 2.0 and 2.5 µm in 24 hour control cells).

      12) They used an siRNA method to knock down MYH6, MYH7 and MYOM and concluded that the knockdown of these genes did not affect the Z-line assembly. Even though they showed very nice knockdown efficiency of these proteins, they should (1) co-stain MYH6/TITIN/actinin2 and MYH6/myomesin/actinin2 for Fig. 7C; (2) MYH7/TITIN/actinin2 and MYH7/myomesin/actinin2 for Fig. 7I; (3) MYOM1/TITIN/actinin2 and MYOM2/TITIN/actinin2 for Fig. 8A; (4) MYH7/MYOM1 and MYH7/MYOM2 for Fig. 8H to make sure the cells they measured were truly knockdown-positive cells.

      The antibodies for alpha and beta myosin are not very efficient for immunofluorescence, and work best for western blots. We decided also to choose a random subset of the cells on the dish to be sure to eliminate any risk of cherry-picking. While imaging cells on the dish, we looked only at the DAPI nuclear channel and selected 50 cells minimum per dish with only this channel, then imaged the other channels.

      Minor comments:

      1) Well-organized sarcomere structure on DMSO treated cells in Fig. 5A and Fig. 6A, but it was in disarray in Fig. S3M. Why?

      Figure S3 shows hiCMs that have only been allowed to spread for 6 hours, which have not formed mature sarcomeres yet, hence the disarray.

      2) Fig 1A, Fig2B: please label the name of the antibody, not the actin filament

      We used phalloidin labelling here, which marks actin filaments. We have updated the figure legends to be more clear. Thank you!

      3) Fig. 7I: actinin2 instead of actinin

      Thank you for catching this! We have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      Testing the app using images shot by other microscopy systems, magnifications, and cardiomyocytes from other species, as noted in the public review above, should make the app even more widely useful.

      A more formal head-to-head comparison with other approaches will be more convincing in showing the new tool is superior.

      I also think that a more detailed protocol for using the app will help other investigators.

      The app counts and measures many features, but it is not always clear how and using what algorithm these are measured. Including these details in a protocol or even as comments in the code will be very helpful for others.

      The protocol found on the public GitHub for the app will help other investigators to download, use, and understand the application. We have received contact from researchers who have been able to use the application without assistance from us, which is a good sign that the application is user-friendly and that the online protocol is sufficient.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      We have addressed these concerns in the revised version and mentioned these efforts in our previous response letter. We’re briefly mentioning them here again. We attempted measuring HGSNAT catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA (gray) upon the addition of HGSNAT (red) (Rebuttal figure 1).

      Author response image 1.

      Acetyl-CoA levels in absence and presence of HGSNAT purified in digitonin. Decrease in the levels of 10 µM acetyl-CoA was measured in presence of 10 µM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active. In addition, we have shown by cryo-EM that GFP-tagged HGSNAT that we purified in detergent was already bound to the endogenous substrate ACO, an observation also made by Xu et al. Finally, we performed LC-MS on GFP-tagged HGSNAT purified in detergent to detect bound ACO, which could be further removed by dialysis. These results have been included in Figure S9. The endogenous binding of ACO to HGSNAT in detergent suggests that neither the tag nor the detergent is detrimental to the function.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      We have already changed this figure in our latest submission. Perhaps the changes made were not obvious while reviewing. We agreed with this reviewer that the enzyme could likely achieve catalysis by simple side chain movements without undergoing extensive isomerization steps, as depicted in Figure 5. In the absence of data supporting large movements during the acetyl transfer reaction, old Figure 5 appeared speculative. Hence, we have edited Figure 5 in the revised version of the manuscript based on the observations we made in this study, and different states shown in the figure do not show any conformational changes and only depict acetyl transfer.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles and shows a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC on the structure and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The paper describes a structure of HGSNAT, a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is important for our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer, and define the architecture of the alpha- and beta-GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3Å facilitating building of a high-confidence atomic model. Importantly, the density for the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      Weaknesses:

      While the structural data for the state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      A weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      In the response to reviewers, the authors mention revised coordinates, but the revised coordinates provided to this reviewer do not reflect the stated changes (I assume a technical error somewhere)

      Perhaps the old coordinates in the deposition system were resubmitted with the revised draft. Nevertheless, we have made the changes to the structure that this reviewer suggested in the previous round and have released the new coordinates (PDB ID: 8TU9).

      Is there any evidence for the interprotomer disulfide except for the map? e.g. if it is a disulfide-linked dimer, one should see a shift in mobility on non-reducing vs reducing SDS-PAGE. Without this, the evidence from the map is not conclusive - while the symmetry-related cysteines are nearby to one another, based on the map I could argue that they could just as well be modeled with the cys sidechains reduced and pointing away from one another.

      In addition to building the model based on cryo-EM maps, we have performed FSEC-based thermal melt analysis of the Ala mutation of C334, which is involved in the disulfide at the dimer interface. C334A is still expressed as a dimer, suggesting that the C334 disulfide is not the only interaction stabilizing the dimer. Upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A shows a monomeric HGSNAT (Figure 4-Figure supplement 1 in main manuscript). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric. We have also performed PAGE analysis as suggested by this reviewer and noticed that reducing conditions result in a monomeric protein band (Rebuttal figure 2). While we were revising this manuscript, two other groups published structures of HGSNAT (Xu et al., 2024, Nat Struct Mol Biol, and Zhao et al., 2024, Nat Commun). These groups have also identified this disulfide at the dimer interface in their HGSNAT structures. Zhao et al. showed that this disulfide is not crucial for dimerization and also suggested that it can break depending on the conformation of HGSNAT. Our FSEC results agree with this observation.

      Author response image 2.

      Comparison of purified HGSNAT on native and reducing SDS-PAGE. The arrows on both the gels indicate N-GFP-HGSNAT. The two bands on the SDS PAGE are, perhaps, two differentially glycosylated forms of HGSNAT.


      The following is the authors’ response to the original reviews.

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function. The authors would need to establish an in vitro assay using purified protein and assess the level of Acetyl-CoA in the reaction (there are commercial kits and a long list of literature showing how to measure this). They could also follow the HS acetylation reaction by e.g. HPLC-MS or NMR (among other methods).

      The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript. However, we see that acetyl-CoA was intrinsically bound to the protein, indicating that GFP-tagged HGSNAT can bind the ligand. Upon dialysis, we see release of acetyl-CoA from the protein, which we have confirmed by LC-MS analysis (Fig S9). We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al. (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability or with the ability to bind the co-substrate. As demonstrated in the existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally. The binding of acetyl-CoA happens towards the cytosol and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs towards the luminal side and is optimal at pH 5.0-6.0. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA in the presence of HGSNAT-ACO complex (blue) and apo HGSNAT (red); however, the difference compared to the ACO standard (gray) was not significant. While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active.

      Author response image 3.

      Acetyl-CoA levels in the absence and presence of HGSNAT purified in digitonin. The decrease in the level of 10 mM acetyl-CoA was measured in the presence of 10 mM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer? The speculative nature of this assumption needs to be clearly acknowledged throughout the manuscript and discussed in more detail. The authors could use HDX-MS or introduce cysteine residues in the hypothetical inward- and outward-facing cavities and test accessibility by incubating the purified protein with maleimides or other agents reacting with free cysteine.

      We thank the reviewers for this insightful critique. Yes, the enzyme could likely achieve catalysis by simple side chain movements without undergoing the extensive isomerization steps depicted in Figure 5. We also agree with the reviewer that HDX-MS could be the best way to monitor the substrate-induced conformational dynamics within HGSNAT experimentally. In the absence of data supporting large movements during the acetyl transfer reaction, Figure 5 is speculative. We have now edited Figure 5 in the revised version of the manuscript based on the observations we made in this study.

      (3) The acetyl-CoA-bound state is described as the open-to-lumen state. Indeed, from Figure 1C, the lumen opening appears much larger than the cytosol opening. Is there any small tunnel that connects the substrate site to the cytosol? In other words, is this state accessible to both the lumen and the cytosol, albeit with a larger opening toward the lumen? This question arises because, in Figure S5, the tunnel calculated by MOLE seems to also connect to the cytosol.

      Yes, it is likely that the ACOS is accessible via the lumen and the cytosol to varying degrees, as evidenced by the MOLE prediction. However, binding of the bulky nucleoside head group of acetyl-CoA at the ACOS blocks the cytosolic entrance in the conformation discussed in this manuscript. The MOLE prediction was performed on a structure devoid of acetyl-CoA, and it is possible that the protein does not necessarily undergo isomerization between open-to-lumen and open-to-cytosol conformations during acetyl transfer. Likely, the ACOS is always accessible from both the lumen and cytosol, but depending on the substrates or products bound, the accessibility could be limited to either the lysosomal lumen or cytosol. We have rewritten all the statements mentioning an open-to-lumen conformation to reflect this argument.

      (4) The authors state, "Interestingly, in most of the detergent conditions we tested, HGSNAT was predominantly dimeric (Fig S1C-H)," and also mention, "In all the detergents we tested, HGSNAT eluted as a dimer, a testament to the extensive side-chain interaction network." The dimerization is said to be mediated by a disulfide bond. I would be surprised if the detergents the authors tested could break a disulfide bond. Therefore, can this observation truly serve as a testament to an "extensive" side-chain interaction network?

      We agree with the reviewer that detergents are unlikely to break a disulfide bond. To address this comment, we generated a C334A mutant of HGSNAT and extracted it from cells in 1% digitonin. It is still expressed as a dimer (Fig S8E). However, upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A shows a monomeric HGSNAT (Fig S8I and S8K). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric.

      (5) Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution. This uncertainty needs to be clearly described in the manuscript text. Performing additional mutagenesis experiments to test key hypotheses, or further discussing relevant data from the literature, would strengthen the manuscript.

      We agree with the reviewer on the lack of supporting evidence for the mechanistic models proposed in Fig 5. They were made based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best. We have edited Figure 5 in the revised version of the manuscript. In addition, we have performed mutagenesis analysis to study the stability of mutants (Fig S8) and LC-MS analysis to identify endogenously bound acetyl-CoA (Fig S9) to strengthen parts of the manuscript. We have discussed our findings in the results and modified the discussion according to these suggestions.

      (6) It is discussed that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case. Providing more information, ideally involving additional experimental work, would strengthen this aspect of the mechanism that is proposed. This would require establishing an in vitro assay, as described in 1).

      H269 was proposed as a crucial catalytic residue by Bame, K. J. and Rome, L. H. (1986), who monitored the effect of chemical modifications of amino acids on acetylation of HGSNAT membranes. We generated N258I and H269A mutants of HGSNAT and analyzed their stability. We noticed a greater destabilization in N258I compared to H269A (Fig S8). We believe this is because of the loss of ability to bind acetyl-CoA, as the TMs around the catalytic core of the protein in our cryo-EM structure were stabilized by interactions with acetyl-CoA. Recently, Xu et al. (2024, Nat Struct Mol Biol) reported that they do not observe acetylated histidine in their structure. However, our structure and that reported by Xu et al. (2024) were obtained at cytosolic pH. Perhaps acetylation of H269 occurs at acidic lysosomal pH. Extensive structural and catalytic investigation of HGSNAT at low pH is required to rule out H269 acetylation as a step in the HGSNAT-catalyzed reaction.

      (7) In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other. The authors could explore if an in vitro experimental measurement of protein activity would provide any information in this regard.

      We agree with the reviewer that a more detailed kinetic analysis is necessary to define the bisubstrate reaction mechanism of HGSNAT. All the existing structural data on two isoforms of HGSNAT were obtained at basic pH. As a result, the existing structures do not unambiguously demonstrate the bisubstrate mechanism of HGSNAT. We believe low pH structural characterization and a detailed kinetic and structural characterization of HGSNAT in membrane mimetics like nanodiscs could provide more insights into the mechanism. However, these studies are a future undertaking and are not a part of this manuscript.

      (8) Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions. It would be ideal if an additional test for these predictions were included in the manuscript. The authors could follow the unfolding of purified mutants by SEC, FSEC, or changes in intrinsic fluorescence to assess protein stability.

      As suggested here, we prepared HGSNAT MPSIIIC variants and tested their expression and stability (please see Fig S8). These results have been included in the revised version of the manuscript.

      (9) Some sidechains that have quite strong sidechain density are missing atoms. I would be particularly careful with omitting sidechains that pack in the hydrophobic core, as this can tend to artificially reduce the clash score. Check F81, L62, P91 and V87, for example.

      We have revisited the modeling of these regions and deposited new coordinates.

      (10) W316 seems to have the wrong rotamer.

      This has been corrected in the new coordinate file that has been released.

      (11) N134 and N433 seem to have extra density. Are these known glycosylation sites?

      As per Hrebicek M. et al., 2006 and Feldhammer M. et al., 2009, there are five predicted glycosylation sites: N66, N114, N134, N433, and N602. However, we see evidence for NAG density at N114, N134, and N433. These have now been modeled in the structure.

      (12) At the C-terminal residue (Ile-635), the very C-terminal carboxylate is modeled pointing to a hydrophobic environment. It seems more likely to me that the Ile sidechain is packing here, with the C-terminal carboxylate facing the solvent.

      Thank you for pointing this out. We have edited the orientation of the Ile sidechain accordingly.

      Presentation and wording of results/methods:

      - Figure S3 legend "At places with missing density, the side chains were trimmed to C-alpha" - this is incorrect, I think the authors mean C-beta.

      We have corrected this error in the revised version of the manuscript.

      - Figure S3 legend - the authors refer to a gray mesh, where a transparent surface is displayed.

      Thanks for pointing this error out. We have corrected this in the revised version.

      - Some colloquial/vague wording in the main text (a lot of sentences starting with "Interestingly, ..."). Making the wording more specific would help the reader, I think.

      We have edited out ‘interestingly’ from the document and have re-written parts of the manuscript, per reviewers’ suggestion, for brevity.

      - Figure S2 legend, "throughout the processing workflow the resolution of luminal domain was used as a guidepost" - it is not entirely clear to me what this means in this context, perhaps revise the wording?

      We have rephrased this line in the revised draft of the manuscript.

      - Figure S2 and methods, Local refinements of LD and TMD are mentioned, but not indicated on the processing workflow.

      We have included a new Fig S2 & edited the legend, including these changes, per the reviewers’ suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors in this paper investigate the nature of the activity in the rodent EPN during a simple freely moving cue-reward association task. Given that primate literature suggests movement coding whereas other primate and rodent studies suggest mainly reward outcome coding in the EPNs, it is important to try to tease apart the two views. Through careful analysis of behavior kinematics, position, and neural activity in the EPNs, the authors reveal an interesting and complex relationship between the EPN and mouse behavior.

      Strengths:

      (1) The authors use a novel freely moving task to study EPN activity, which displays rich movement trajectories and kinematics. Given that previous studies have mostly looked at reward coding during head-fixed behavior, this study adds a valuable dataset to the literature. (2) The neural analysis is rich and thorough. Both single neuron level and population level (i.e. PCA) analysis are employed to reveal what EPN encodes.

      Thank you very much for this appreciation.

      Weaknesses:

      (1) One major weakness in this paper is the way the authors define the EPN neurons. Without a clear method of delineating EPN vs other surrounding regions, it is not convincing enough to call these neurons EPNs solely from looking at the electrode cannula track from Figure 2B. Indeed, EPN is a very small nucleus and previous studies like Stephenson-Jones et al (2016) have used opto-tagging of Vglut2 neurons to precisely label EPN single neurons. Wallace et al (2017) have also shown the existence of SOM and PV-positive neurons in the EPN. By not using transgenic lines and cell-type specific approaches to label these EPN neurons, the authors miss the opportunity to claim that the neurons recorded in this study do indeed come from EPN. The authors should at least consider showing an analysis of neurons slightly above or below EPN and show that these neurons display different waveforms or firing patterns.

      We thank the reviewer for their comment and appreciate the opportunity to explain and expand on the inclusion criteria for the units studied.

      As part of another study, we performed experiments recording in the EPN with optrodes and photoidentification in PV-Cre animals. We found optoidentified units both in animals with correct placement (within the EPN) and in those with off-target placement (within the thalamus or medial to the EPN). Thus, despite the use of Cre animals, we relied on histology to ensure correct EPN recording. We believe that optotagging based purely on neural markers such as PV, SOM, VGLUT, or VGAT would not provide a better anatomical delineation of the EPN, since adjacent structures are rich in those same markers. The thalamic reticular nucleus is just dorsal to the EPN and has been shown to express both SOM and PV (Martinez-Garcia et al., 2020).

      On the other hand, the lateral hypothalamus (just medial to the EPN) also expresses vGlut2 and SOM. Stephenson-Jones (2016), Extended Data Figure 1, panel g, shows vGluT2 and somatostatin labeling of neurons, with important expression of neurons dorsal, ventral and medial to the EPN. Thus, we believe that viral strategies relying on single neuronal markers still depend on careful histological analysis of recording sites.

      A combination of neural markers or more complex viral strategies might be more suitable to delineate the EPN. As an example, for anatomical tracing Stephenson-Jones et al. 2016 performed a rabies-virus based approach involving retrogradely transported virus, making use of projection sites through two injections. Two-step viral approaches were also performed in Wallace, M. et al. 2017. We attempted a two-step viral approach, using an anterogradely transported Cre-expressing virus (AAV1.hSyn.Cre.WPRE.hGH) injected into the striatum and a second Cre-dependent ChR2 virus injected into the EPN. However, our preliminary experiments showed that this double viral approach markedly decreased the performance of animals during the task (we attempted re-training 2-3 weeks after viral infections and animals failed to turn to the side contralateral to the injections). We believe that this approach might have had a toxic effect (Zingg et al., 2017).

      To this point, a recent paper (Lazaridis et al., 2019) repeated an optogenetic experiment performed in the Stephenson-Jones et al. study, using a set of different viral approaches and concluded that increasing the activity of GPi-LHb is not aversive, as it had been previously reported. Thus, future studies attempting to increase anatomical specificity are a must, but they will require using viral approaches amenable to the behavioral paradigm.

      We attempted to find properties regarding waveforms, firing rate, and firing patterns that distinguish units above or below the EPN; however, we did not find a marker that could generate a clear demarcation. We show here a figure comparing the units included in this study with the excluded ones, which shows that the two groups clearly overlap.

      Author response image 1.

      Finally, we completely agree with the reviewer in that there is still room for improvement. We have further expanded the Methods section to explain better our efforts to include units recorded within the EPN. Further, we have added a paragraph within the Discussion section to point out this limitation (lines 871-876).

      Methods (lines 116-131):

      “Recordings. Movable microwire bundles (16 microwires, 32 micrometers in diameter, held inside a cannula, Innovative Neurophysiology, Durham, NC) were stereotaxically implanted just above the entopeduncular nucleus (-0.8 AP, 1.7 ML, 3.9 DV). Post-surgical care included antibiotic, analgesic and anti-inflammatory pharmacological treatment. After 5 days of recovery, animals were retrained for 1-2 weeks. Unitary activity was recorded for 2-6 days at each dorsoventral electrode position and the session with the best electrophysiological [signal to noise ratio (>2), stability across time] and behavioral [performance, number of trials (>220)] quality was selected. Microwire electrodes were advanced in 50 micrometer dorsoventral steps for 500 micrometers in total. After experiment completion, animals were perfused with a 4% paraformaldehyde solution. Brains were extracted, dehydrated with a 30% sucrose solution and sectioned in a cryostat into 30-micron-thick slices. Slices were mounted and photographed using a light microscope. Microwire tracks of the 16-microwire bundle were analyzed (Fig. 2A-B) and only animals with tracks traversing the EPN were selected (6 out of 10). Finally, we located the final position of microwire tips and inferred the dorsoventral recording position of each of the recording sessions. Only units recorded within the EPN were included.”
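      The depth inference described in the Methods text above can be sketched as simple arithmetic. This is only an illustration, assuming each session is labeled by the number of 50 micrometer steps made before it; the function names, the example final tip depth, and the dorsal/ventral bounds are hypothetical and are not taken from the manuscript.

```python
STEP_UM = 50  # dorsoventral advance per step, as stated in the Methods text


def session_depths_um(final_tip_um, steps_before_session, total_steps):
    """Back-calculate the tip depth of each session from the final tip depth.

    final_tip_um: final tip depth from histology (hypothetical value in the example).
    steps_before_session: number of 50 um steps already made when each session ran.
    total_steps: number of steps made by the end of the experiment.
    """
    return [final_tip_um - (total_steps - k) * STEP_UM for k in steps_before_session]


def sessions_in_target(depths_um, dorsal_um, ventral_um):
    """Flag sessions whose inferred depth lies inside hypothetical target bounds."""
    return [dorsal_um <= d <= ventral_um for d in depths_um]


# Example with made-up numbers: 10 steps of 50 um give 500 um of total travel.
depths = session_depths_um(4400, [0, 5, 10], total_steps=10)
inside = sessions_in_target(depths, dorsal_um=4100, ventral_um=4400)
```

      Working in integer micrometers avoids floating-point rounding when comparing depths against the bounds.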

      Discussion (lines 871-876):

      “A weakness of the current study is the lack of characterization of neuronal subtypes. An area of opportunity for future research could be to perform photo-identification of neuronal subtypes within the EPN which could contribute to the overall description of the information representation. Further, detailed anatomical viral vector strategies could aid to improve anatomical localization of recordings, reduce reliance on histological examination, and solve some current controversies (Lazaridis et al., 2019).” 

      (2) The authors fail to replicate the main finding about EPN neurons which is that they encode outcome in a negative manner. Both Stephenson-Jones et al (2016) and Hong and Hikosaka (2008) show a reward response during the outcome period where firing goes down during reward and up during neutral or aversive outcome. However, Figure 2 G top panel shows that the mean population is higher during correct trials and lower during incorrect trials. This could be interesting given that the authors might try recording from another part of EPN that has not been studied before. However, without convincing evidence that the neurons recorded are from EPN in the first place (point 1), it is hard to interpret these results and reconcile them with previous studies.

      We really thank the reviewer for pointing out that we need to better explain how EPN units encode outcome. We now provide an additional panel in Figure 4, its corresponding text in the results section (lines 544-562) and a new paragraph in the discussion related to this comment.

      We believe that we do indeed recapitulate the findings of both Stephenson-Jones et al (2016) and Hong and Hikosaka (2008). Both studies focus on a specific subpopulation of GPi/EPN neurons that project to the lateral habenula (LHb). Stephenson-Jones et al (2016) posit that GPi-LHb neurons (which they opto-tag as vGluT2) exhibit a decreased firing rate during rewarding outcomes. Hong and Hikosaka (2008) antidromically identified LHb-projecting neurons within the GPi and found reward-positive and reward-negative neurons, which were modulated by increasing or decreasing their firing rate, respectively, with a rewarding outcome (red and green dots on the x-axis of Figure 5A in their paper).

      As the reviewer pointed out, the mean zScore may be misleading. Therefore, in our study we also decomposed population activity along the reward axis through dPCA. When marginalizing for reward in Figure 3F, we find that the weights of individual units on this axis are centered around zero, with positive and negative values (Figure 3F, right panel). Thus, units can code a rewarding outcome as either an increase or a decrease of activity. We show example units of such modulation in Figure 3-1g and h.

      We had already segregated our analysis of spatio-temporal and kinematic coding according to the reward coding of units in Figure 4L-M. Following this comment, and to further clarify this segregation, we introduced panels with the mean zScore of units during outcome evaluation in Figure 4L.

      We amended the main text to better explain these findings (lines 544-562).

      “Previous reports suggest that EPN units that project to the lateral habenula encode reward as a decrease in firing rate. Thus, we wished to ask whether reward encoding units can code kinematic and spatio-temporal variables as well.

      To this end, we first segregated units by their reward coding properties: reward positive (which increased activity with reward) and reward negative units (which decreased activity with reward). We performed auROC on the 250 ms after head entry comparing rewarded trials and incorrect trials (p<0.001, permutation test). Mean activity of reward insensitive, positive and negative units is shown in Fig. 4L. Next, we performed a dimensionality reduction on the coefficients of the model that best explained both contexts (kinematic + spatio-temporal model on pooled data) using UMAP (McInnes et al., 2018). We observe a continuum rather than discrete clusters (Fig. 4L). Note that individual units are color coded according to their responsivity to reward. We did not find a clear clustering either.”
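      The unit classification described in the quoted passage (auROC between rewarded and incorrect trials, thresholded with a permutation test) could be sketched as follows. This is a minimal illustration and not the authors' code: the inputs are assumed to be per-trial spike counts from the 250 ms after head entry, the test is assumed to be two-sided on |auROC - 0.5|, and all function names and the number of permutations are hypothetical.

```python
import random


def auroc(pos, neg):
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_pvalue(pos, neg, n_perm=2000, seed=0):
    """Two-sided p-value for |auROC - 0.5| under shuffled trial labels."""
    rng = random.Random(seed)
    observed = abs(auroc(pos, neg) - 0.5)
    pooled = list(pos) + list(neg)
    n_pos = len(pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(auroc(pooled[:n_pos], pooled[n_pos:]) - 0.5) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def classify_unit(rewarded, incorrect, alpha=0.001, n_perm=2000):
    """Label a unit as reward 'positive', 'negative', or 'insensitive'."""
    auc = auroc(rewarded, incorrect)
    p = permutation_pvalue(rewarded, incorrect, n_perm=n_perm)
    if p >= alpha:
        return "insensitive", auc, p
    return ("positive" if auc > 0.5 else "negative"), auc, p
```

      With the add-one correction, the smallest attainable p-value is 1/(n_perm + 1), so at least 1000 permutations are needed before a threshold of p<0.001 can be met at all.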

      Paragraph added in the discussion (lines 749-755):

      “In this study, we found that rewarding outcomes can be represented by EPN units through either an increase or a decrease in firing rate (Fig. 3F, 3-1g-h, 4L). While Stephenson-Jones et al., 2016 found that lateral habenula (LHb)-projecting neurons within the EPN of mice primarily encoded rewarding outcomes by a decrease in firing rate, Hong and Hikosaka, 2008 observed that in primates, LHb-projecting units could encode reward through either a decrease or an increase in firing rate. Thus, our results align more closely with the latter study, which also employed an operant conditioning task.”

      (3) The authors say that: 'reward and kinematic coding are not mutually exclusive, challenging the notion of distinct pathways and movement processing'. However, it is not clear whether the data presented in this work supports this statement. First, the authors have not attempted to record from the entire EPN. Thus it is possible that the coding might be more segregated in other parts of EPN. Second, EPNs have previously been shown to display positive firing for negative outcomes and vice versa, something which the authors do not find here. It is possible that those neurons might not encode kinematic and movement variables. Thus, the authors should point out in the main text the possibility that the EPN activity recorded might be missing some parts of the whole EPN.

      We thank the reviewer for the opportunity to expand on this topic. We believe it is certainly possible that other not-recorded regions of the EPN might exhibit greater segregation of reward and kinematics. However, we considered it worthwhile pointing out that from the dataset collected in this study reward-sensitive units encode kinematics in a similar fashion to reward-insensitive ones (Fig. 4L,M). Moreover, we asked specifically whether reward-negative units (that decrease firing rate with rewarding outcomes, as previously reported) could encode kinematics and spatio-temporal variables with different strength than reward-insensitive ones and could not find significant differences (Fig. 4M).

      We did indeed find units that displayed decreased firing rate upon rewarding outcomes, as has been previously reported. We have addressed this fact more thoroughly in point (2). 

      Finally, we agree with the reviewer that the dataset collected in this study is by no means exhaustive of the entire EPN and have thus included a sentence pointing this out in the Discussion section (lines 805-806):

      “Given that we did not record from the entire EPN, it is still possible that another region of the nucleus might exhibit more segregation.”

      (4) The authors use an IR beam system to record licks and make a strong claim about the nature of lick encoding in the EPN. However, the authors should note that IR beam system is not the most accurate way of detecting licks given that any object blocking the path (paw or jaw-dropping) will be detected as lick events. Capacitance based, closed-loop detection, or video capturing is better suited to detect individual licks. Given that the authors are interested in kinematics of licking, this is important. The authors should either point this out in the main text or verify in the system if the IR beam is correctly detecting licks using a combination of those methods.

      We thank the reviewer for the opportunity to clarify the lick event acquisition. We have experience using electrical alternatives to IR-beam lickometers; however, we believe they were not best suited to this application. Closed-loop lickometers generally use a metallic grid upon which animals stand so that the loop can be closed; however, we wanted to have a transparent floor. We have found capacitance-based lickometers to be useful in head-fixed conditions but have noticed that they are very dependent on animal position and the proximity of other body parts such as limbs. Given the freely moving aspect of the task, this was difficult to control. Finally, both electrical alternatives are more prone to noise and may introduce electrical artifacts that might contaminate the spiking signal. This is why we opted to use an IR beam in combination with a slit that would only fit the tongue and that forced enough protrusion for individual licks to be monitored. Further, the slit could not fit other body parts like the paw or jaw. We have now included a video (Supp. Video 2) showing a closeup of this behavior that better conveys how the jaw and paw do not fit inside the slit. The following text has been added in the corresponding methods section (lines 97-98):

      “The lickometer slit was just wide enough to fit the tongue and deep enough to evoke a clear tongue protrusion.”

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should verify using opto-tagging of either Vglut2, SOM, or PV neurons whether they can see the same firing pattern. If not, the authors should address this weakness in the paper.

      We thank the reviewer for this important point; we have provided a more detailed reply above.

      (2) The way dPCA or PCA is applied to the data is not stated at all in the main text. Are all units from different mice combined? Or applied separately for each mouse? How does that affect the interpretation of the data? At least a brief text should be included in the main text to guide the readers.

      We thank the reviewer for pointing out this important omission. We have included an explanation in the Methods section and in the Main text.

      Methods (lines 182-184):

      “For all population level analyses individual units recorded from all sessions and all animals were pooled to construct pseudo-simultaneous population response of combined data mostly recorded separately.”

      Main text (lines 397-399):

      “For population level analyses throughout the study, we pooled recorded units from all animals to construct a pseudo-simultaneous population.”

      Discussion (lines 729-730):

      “…(from pooled units from all animals to construct a pseudo-simultaneous population, which assumes homogeneity across subjects)”
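
      The pooling step described in these passages can be sketched as follows. This is only an illustration under stated assumptions, not the authors’ pipeline: it assumes each unit’s response has already been trial-averaged and aligned to a common set of time bins, and the names `pool_pseudo_population`, `sessions`, and the example labels are hypothetical.

```python
from typing import Dict, List, Tuple


def pool_pseudo_population(
    sessions: Dict[str, Dict[str, List[float]]],
) -> Tuple[List[Tuple[str, str]], List[List[float]]]:
    """Stack units from all sessions/animals into one (units x time bins) matrix.

    `sessions` maps a session label (hypothetical, e.g. "mouse1_day1") to a
    dict of unit label -> trial-averaged firing rate per time bin.
    Each row is mean-centred, as is usual before PCA-type analyses.
    """
    labels: List[Tuple[str, str]] = []
    matrix: List[List[float]] = []
    n_bins = None
    for session_id, units in sessions.items():
        for unit_id, rate in units.items():
            if n_bins is None:
                n_bins = len(rate)
            if len(rate) != n_bins:
                # Units can only be pooled if they share the same time base.
                raise ValueError(f"{session_id}/{unit_id}: unexpected number of bins")
            mean_rate = sum(rate) / len(rate)
            labels.append((session_id, unit_id))
            matrix.append([r - mean_rate for r in rate])
    return labels, matrix


# Made-up example: three units recorded in two separate sessions.
example = {
    "mouse1_day1": {"u1": [1.0, 2.0, 3.0], "u2": [4.0, 4.0, 4.0]},
    "mouse2_day1": {"u1": [0.0, 6.0, 0.0]},
}
labels, matrix = pool_pseudo_population(example)
```

      Because the rows were mostly not recorded simultaneously, trial-to-trial correlations between units are lost in such a matrix, which is why the pooled population is treated as an approximation that assumes homogeneity across subjects.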

      (3) The authors argue that they do not find 'value coding' in this study. However, the authors never manipulate reward size or probability, but only the uncertainty or difficulty of the task. This might be better termed 'difficulty', and it is difficult to say whether this correlates with value in this task. For instance, mice might be very confident about the choice, even for an intermediate frequency sweep, if the mouse had waited long enough to hear the full sweep. In that case, the difficulty would not correlate with value, given that the mouse will think the value of the port it is going to is high. Thus, authors should avoid using the term value.

      We agree with the reviewer. We have modified the text to specify that difficulty was the variable being studied and added the following sentence in the Discussion (lines 747-748):

      “It is still possible that by modifying reward contingencies such as droplet size value coding could be evidenced.”

      (4) How have the authors obtained Figure 7D bottom panel? It is unclear at all what this correlation represents. Are the authors looking at a correlation between instantaneous firing rate and lick rate during a lick bout?

      We thank the reviewer for pointing out this omission. The panel indeed shows the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout. We have added labels to Figure 7D and stated this in the main text (lines 680-681):

      “Fig.7D, lower panel shows the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout for all units.”
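
      For readers who want a concrete picture of this measure, the sketch below computes a Pearson correlation coefficient between two equally sampled rate traces. It is an assumption that Pearson’s r is the coefficient used; the function name and the example values are hypothetical and not taken from the manuscript.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally sampled traces."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("traces must have the same length (at least 2 samples)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant trace has no defined correlation.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Made-up traces for one lick bout: lick rate (licks/s) and the firing
# rate (spikes/s) of two hypothetical units sampled at the same times.
lick_rate = [1.0, 2.0, 3.0]
firing_rates = {"unit_a": [2.0, 4.0, 6.0], "unit_b": [6.0, 4.0, 2.0]}
per_unit_r = {unit: pearson_r(fr, lick_rate) for unit, fr in firing_rates.items()}
```

      Repeating this for every unit gives one coefficient per unit, which is the kind of distribution a panel such as the lower part of Fig. 7D would summarize.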

      Reviewer #2 (Public Review):

      This paper examined how the activity of neurons in the entopeduncular nucleus (EPN) of mice relates to kinematics, value, and reward. The authors recorded neural activity during an auditory-cued two-alternative choice task, allowing them to examine how neuronal firing relates to specific movements like licking or paw movements, as well as how contextual factors like task stage or proximity to a goal influence the coding of kinematic and spatiotemporal features. The data shows that the firing of individual neurons is linked to kinematic features such as lick or step cycles. However, the majority of neurons exhibited activity related to both movement types, suggesting that EPN neuronal activity does not merely reflect muscle-level representations. This contradicts what would be expected from traditional action selection or action specification models of the basal ganglia.

      The authors also show that spatiotemporal variables account for more variability compared to kinematic features alone. Using demixed Principal Component Analysis, they reveal that at the population level, the three principal components explaining the most variance were related to specific temporal or spatial features of the task, such as ramping activity as mice approached reward ports, rather than trial outcome or specific actions. Notably, this activity was present in neurons whose firing was also modulated by kinematic features, demonstrating that individual EPN neurons integrate multiple features. A weakness is that what the spatiotemporal activity reflects is not well specified. The authors suggest some may relate to action value due to greater modulation when approaching a reward port, but acknowledge action value is not well parametrized or separated from variables like reward expectation.

      We thank the reviewer for the comment. We indeed believe that further exploring these spatiotemporal signals is important and will be the subject of future studies.

      A key goal was to determine whether activity related to expected value and reward delivery arose from a distinct population of EPN neurons or was also present in neurons modulated by kinematic and spatiotemporal features. In contrast to previous studies (Hong & Hikosaka 2008 and Stephenson-Jones et al., 2016), the current data reveals that individual neurons can exhibit modulation by both reward and kinematic parameters. Two potential differences may explain this discrepancy: First, the previous studies used head-fixed recordings, where it may have been easier to isolate movement versus reward-related responses. Second, those studies observed prominent phasic responses to the delivery or omission of expected rewards - responses largely absent in the current paper. This absence suggests a possibility that neurons exhibiting such phasic "reward" responses were not sampled, which is plausible since in both primates and rodents, these neurons tend to be located in restricted topographic regions. Alternatively, in the head-fixed recordings, kinematic/spatial coding may have gone undetected due to the forced immobility.

      Thank you for raising this point. We note that some phasic activity associated with reward responses is present in our data, as can be seen in the new panel in Figure 4L.

      Overall, this paper offers needed insight into how the basal ganglia output encodes behavior. The EPN recordings from freely moving mice clearly demonstrate that individual neurons integrate reward, kinematic, and spatiotemporal features, challenging traditional models. However, the specific relationship between spatiotemporal activity and factors like action value remains unclear.

      We sincerely thank the reviewer for their valuable comments.

      Reviewer #2 (Recommendations For The Authors):

      One small suggestion is to make sure that all the panels in the figures are well annotated. I struggled in places to know what certain alignments or groupings meant because they were not labelled. An example would be what do the lines correspond to in the lower panels of Figure 2D and E. I could figure it out from other panels but it would have helped if each panel had better labelling.

      Thank you for pointing this out. We have improved the labelling across the figures and corrected the specific example you mentioned.

      The paper is very nice though. Congratulations!

      Thank you very much.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editor for the comment. A statistics table has been added.

      References:

      Lazaridis, I., Tzortzi, O., Weglage, M., Märtin, A., Xuan, Y., Parent, M., Johansson, Y., Fuzik, J., Fürth, D., Fenno, L. E., Ramakrishnan, C., Silberberg, G., Deisseroth, K., Carlén, M., & Meletis, K. (2019). A hypothalamus-habenula circuit controls aversion. Molecular Psychiatry, 24(9), 1351–1368. https://doi.org/10.1038/s41380-019-0369-5

      Martinez-Garcia, R. I., Voelcker, B., Zaltsman, J. B., Patrick, S. L., Stevens, T. R., Connors, B. W., & Cruikshank, S. J. (2020). Two dynamically distinct circuits drive inhibition in the sensory thalamus. Nature, 583(7818), 813–818. https://doi.org/10.1038/s41586-020-2512-5

      McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861

      Zingg, B., Chou, X.-L., Zhang, Z.-G., Mesik, L., Liang, F., Tao, H. W., & Zhang, L. I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron, 93(1), 33–47. https://doi.org/10.1016/j.neuron.2016.11.045

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for your time and consideration of our submission. We also thank the reviewers for their helpful comments. We have revised the introduction, results, and discussion sections of the manuscript in accordance with the reviewers’ suggestions, which has improved the clarity of our work. Specifically, we have clarified that the aim of the study is to report newly discovered sperm behaviours inside the uterus, observed via high-resolution deep-tissue live imaging, and to stimulate further studies and discussion in the field of postcopulatory sexual selection in mice. To the best of our knowledge, many of the specific sperm behaviours described in our manuscript are reported here for the first time, based on direct observation inside the living reproductive tract.

      We have also restructured our manuscript and moved the hypothetical interpretations of our experimental observations to the discussion section. We hope that these revisions clarify our claims and that the revised manuscript effectively communicates the importance of our findings and their value in prompting new questions and insights that encourage further studies. We believe that our work clearly demonstrates the importance of sperm/reproductive tract interaction, which cannot be adequately studied in artificial environments, and that it may become an important guideline for designing future experiments and studies.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors want to determine the role of the sperm hook of the house mouse sperm in movement through the uterus. The authors are trying to distinguish between two hypotheses put forward by others on the role of the sperm hook: (1) the sperm cooperation hypothesis (the sperm hook helps to form sperm trains) vs (2) the migration hypothesis (that the sperm hook is needed for sperm movement through the uterus). They use transgenic lines with fluorescent labels to sperm proteins, and they cross these males to C57BL/6 females in pathogen-free conditions. They use 2-photon microscopy on ex vivo uteri within 3 hours of mating and the appearance of a copulation plug. There are a total of 10 post-mating uteri that were imaged with 3 different males. They provide 10 supplementary movies that form the basis for some of the quantitative analysis in the main body figures. Their data suggest that the role of the sperm hook is to facilitate movement along the uterine wall. 

      We thank the reviewer for summarizing our work and for the critical review of our paper. As summarized, the sperm hook has been primarily associated with the sperm cooperation (sperm train) hypothesis and the migration hypothesis. However, we would like to emphasize that the aim of our work is neither to test one hypothesis against the other nor to disprove either of them, but rather to develop an experimental platform that enables detailed observation of sperm migration dynamics within the live reproductive tract. 

      Through live imaging, we observed both the formation of sperm trains and interactions between sperm and the epithelium of the female reproductive tract. However, in our observations, the rarely observed sperm trains showed no advantage in terms of faster movement. Because these events were infrequent in our experiments, we do not assert that the sperm train hypothesis is invalid; we simply report our observations as they are. 

      The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, these include tapping and associated guided movement along the uterus wall, anchoring and related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. We have extensively revised the manuscript structure to clarify our findings.

      Strengths: 

      Ex vivo live imaging of fluorescently labeled sperm with 2-photon microscopy is a powerful tool for studying the behavior of sperm. 

      Weaknesses: 

      The paper is descriptive and the data are correlations. 

      The data are not properly described in the figure legends. 

      When statistical analyses are performed, the authors do not comment on the trend that sperm from the three males behave differently from each other. This weakens confidence in the results. For example, in Figure 1 the sperm from male 3613 (blue squares) look different from male 838 (red circles), but all of these data are considered together. The authors should comment on why sperm across males are considered together when the individual data points appear to be different across males. 

      Thank you for your comments and suggestions. We have revisited all figure legends and made the necessary amendments (shown in the red-lined manuscript). Please note that, for a better flow of the paper, the previous Figure 1 has been changed to Figure 2 in the revised manuscript.

      Regarding the analysis using different males, we would like to explain the statistics used. We used generalized linear mixed models to test the effect of the Angle and the Distance to the wall on the migration kinetic parameters. The advantage of generalized linear mixed models is that they account for individual variation in the data as a separate error term, thereby controlling for it. 

      Two main factors contribute to individual variation. One is, as you pointed out, the difference between sperm from different males. We used genetically similar mice, so genetic variation should be minimal. Nonetheless, individual differences in age, stress level, and body condition are likely to cause some variation. As these factors cannot be controlled, we used the mixed model approach, in which individual variation is grouped within the individual. This approach enabled us to test the effect of each explanatory variable (Angle and Distance) within an individual. 

      The second factor that could cause variations is the female oestrous status. To avoid artifacts that could influence sperm behaviour, we did not use any invasive methods, such as hormone injections, to control or induce female oestrus. We controlled for this possible effect by including the mating date as a random effect. Since each female was used only once, the mating date reflects the variation caused by each female.

      To further verify that variation between individual males does not affect our results, we repeated the analysis separately for each male and for each mating date (that is, for each female). Sperm data points from individual males and females also show clear and consistent correlations with the distance from the uterus wall. As pointed out, the mean sperm speed may differ between individuals, but this is not the question addressed here; our interest is in the effect of the distance between the sperm and the uterine wall. Additionally, the variation between males is not always larger than the effect of the mating date (female), which together suggests that modelling male variation separately is not essential. We have added this information as a Supplementary Figure (Fig. S3) in the revised supplementary materials.

      We can also apply the same analysis to the effect of the distance from the wall on sperm SWR and LIN (linearity of forward progression), for which no statistical significance was found. As seen in the following figures, the regression lines drawn for each male and each mating date likewise show no statistically significant effect of the distance to the wall on SWR or LIN.

      In summary, the statistical approach we used here has successfully reflected variations in sperm kinetics from different males as well as the variance from different females. We hope that our explanations and additional analysis answer your concerns. 
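
      The per-individual check described above can be illustrated with a deliberately simple sketch. Note that this is not the generalized linear mixed model used in the manuscript: it only fits an ordinary least-squares line of a kinetic parameter against distance to the wall separately for each group (male or mating date), so that the sign and size of the slopes can be compared across groups. The function names, group labels, and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def slope_intercept(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    return slope, mean_y - slope * mean_x


def per_group_fits(
    records: Iterable[Tuple[str, float, float]],
) -> Dict[str, Tuple[float, float]]:
    """Fit one regression line per group from (group, distance, value) records."""
    xs: Dict[str, List[float]] = defaultdict(list)
    ys: Dict[str, List[float]] = defaultdict(list)
    for group, distance, value in records:
        xs[group].append(distance)
        ys[group].append(value)
    return {group: slope_intercept(xs[group], ys[group]) for group in xs}


# Made-up data: (male label, distance to wall, speed). The two males differ
# in mean speed, yet speed decreases with distance in both.
records = [
    ("male_A", 0.0, 10.0), ("male_A", 10.0, 8.0), ("male_A", 20.0, 6.0),
    ("male_B", 0.0, 6.0), ("male_B", 10.0, 5.0), ("male_B", 20.0, 4.0),
]
fits = per_group_fits(records)
```

      In this toy example the intercepts differ between the two males while both slopes are negative, which is the pattern the response describes: individual differences in mean speed do not by themselves alter the relationship with distance to the wall.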

      Movies S8-S10 are single data points and no statistical analyses are performed. Therefore, it is unclear how penetrant the sperm movements are. 

      With respect to Movie S8, Figure 4A and B (Figure 5A and B in the revised manuscript) depict the trajectories of accumulated spermatozoa (sperm trains) in the female uterus, as shown in Movie S8. We have added this information to the revised figure legend (L 293) for clarity. During more than 100 hours of observation and the collection of more than 10 TB of images, we did not observe sperm trains that moved faster than single spermatozoa. The three sperm trains presented in Fig. 5B were the ones that moved in the head-forward direction. Most other identifiable trains, or clusters, did not move or could not move forward because their heads were entangled randomly. We agree that a statistical test for Movie S8 (and Fig. 5B) would be desirable; however, because of the small number of sperm trains we found, we could not perform meaningful statistical tests. Instead, we provide all data in the box plots in Fig. 5C so that readers can evaluate our points for themselves. We believe that this is a more neutral way of presenting our data than reporting statistical significance.

      Regarding Movies S9 and S10, we are not entirely sure that we understood your comment. It would be very helpful if you could point to the relevant passages of the manuscript by line number, as we would like to address your concerns and believe that your input will improve our manuscript. We did not describe the penetration of sperm in these movies. Movies S9 and S10 show newly found sperm behaviours inside the UTJ and isthmus. We observed that sperm beating is influenced by the width of the luminal space as well as by internal flow, as seen in Movies S9 and S10. As our animal model expresses red fluorescence only in the midpiece, the beating frequency cannot be measured accurately. However, we can clearly observe that beating is not continuous and nearly comes to a halt depending on local variations in the reproductive tract. We have revised our description of the changes in beating speed in the revised manuscript (LL 305-335).

      Movies S1B - did the authors also track the movement of sperm located in the middle of the uterus (not close to the wall)? Without this measurement, they can't be certain that sperm close to the uterus wall travels faster. 

      We revised the new Movie S1B to include the videos that were used for the sperm migration kinetics analysis in Figure 2 (previously Figure 1). As the movies, the graph, and the statistical analysis show, there is a clear trend: spermatozoa migrate more slowly the farther they are from the uterus wall. Regarding your comment about the middle of the uterus (not close to the wall), we have added another movie (Movie S1C) that was acquired at different depths from the wall (going towards the centre of the uterus). As clearly seen in Movie S1C, the number of inactive or slow-moving spermatozoa increases when imaging deeper into the uterus. Since the diameter of the uterus easily exceeds 2 mm, we currently do not have optical access to the exact centre of the uterus, but at all observable depths, spermatozoa near the wall were clearly faster.

      Movie S5A - is of lower magnification (200 µm scale bar) while the others have 50 and 20 µm scale bars. Individual sperm movement can be observed at the 20 µm scale (Movie S5C). If the authors want to prove that there is no upsucking movement of sperm by the uterine contractions, they need to provide a high magnification image. 

      The main focus of Movie S5A is the intramural UTJ, where spermatozoa are located in rows within the narrow luminal space (see Author response image 1). If up-suck-like passive sperm carriage occurred, sperm would have to move from the uterus into the intramural UTJ, as in Author response image 1, left. However, no such sperm movement could be seen in our observations, as shown in Movie S5A. Importantly, as indicated by an arrow from 5 sec to 6 sec in Movie S5A, some spermatozoa move downward (see also Author response image 1, right). This is the opposite of the direction expected for up-suck-like sperm carriage. 

      Genetic evidence also supports the view that up-suck-like passive carriage does not account for sperm migration from the uterus to the UTJ. If such passive transfer played an important role, it would be unlikely for genetically modified spermatozoa to fail to pass the entrance of the intramural UTJ, as has been reported (Nakanishi et al., 2004, Biol. Reprod.; Li et al., 2013, J. Mol. Cell Biol.; Larasati et al., 2020, Biol. Reprod.; Qu et al., 2021, Protein Cell). 

      Author response image 1.

      The left image represents what is expected when up-suck like passive sperm carriage occurs. The right image represents what is actually experimentally observed in the intramural UTJ (see Movie S5A). The direction of the arrowheads indicates the direction of sperm movement.

      Movie S8 - if the authors want to make the case that clustered sperm do not move faster than unclustered sperm, then they need to show Movie S8 at higher magnification. They also need to quantify these data. 

      We understand your concern. As shown in Figure 5B, we included the kinetics data of every sperm train and of every unlinked spermatozoon around the trains as individual dots. The only analysis we did not conduct was a statistical test, as it could be misleading given the large difference in sample size (3 trains vs 181 unlinked spermatozoa). As the medians of the four sperm kinetic parameters are similar, except for SWR, we concluded that sperm trains are not necessarily faster than unlinked single spermatozoa. There is no known advantage in sperm competition for spermatozoa (including sperm trains) with intermediate speeds; in IVF, for example, the fertilization success rate is high when faster, active spermatozoa with normal shape are selected (Vaughan & Sakkas, 2019, Biol. Reprod.). It is therefore questionable whether forming sperm trains that are no faster than unlinked spermatozoa, as in our data, can confer an advantage.

      However, we do not agree with your comment regarding the need for higher magnification. Measuring sperm migration speeds (kinetic parameters) in this study does not require resolving exact tail movements. Only sperm heads were tracked to measure their trajectories, and such tracking is better done at low magnification. By analogy, measuring the speed of a car does not require enough magnification to visualize the rotation of its wheels. Additionally, including observation magnification as an effect in all four GLMM models for Figure 2 (Table S3) does not change the result, which shows that magnification does not influence our analysis. 
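
      To illustrate why head positions alone suffice for speed-type parameters, the sketch below derives three quantities from a tracked head trajectory using the conventional CASA definitions: curvilinear velocity (VCL, path length over time), straight-line velocity (VSL, net displacement over time), and linearity (LIN = VSL/VCL). It is an assumption that these conventional definitions match those used in the manuscript, and the function name, the frame interval `dt`, and the example track are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def head_track_kinetics(
    track: Sequence[Tuple[float, float]], dt: float
) -> Dict[str, float]:
    """Compute VCL, VSL and LIN from sperm head positions sampled every `dt`.

    `track` holds (x, y) head positions in consecutive frames; units of the
    velocities follow the units of the positions and of `dt`.
    """
    if len(track) < 2 or dt <= 0:
        raise ValueError("need at least two positions and a positive frame interval")
    duration = dt * (len(track) - 1)
    # Path length: sum of frame-to-frame head displacements.
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    # Net displacement: straight line from first to last position.
    net_displacement = math.dist(track[0], track[-1])
    vcl = path_length / duration
    vsl = net_displacement / duration
    lin = vsl / vcl if vcl > 0 else float("nan")
    return {"VCL": vcl, "VSL": vsl, "LIN": lin}


# Made-up zig-zag track (positions in µm, one frame per second).
kinetics = head_track_kinetics([(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)], dt=1.0)
```

      None of these quantities depends on resolving the flagellum, only on locating the head in each frame, which is consistent with the argument that tracking for this purpose can be done at low magnification.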

      Movie S9C - what is the evidence that these sperm are dead or damaged? 

      Thank you for your valid comment. We tracked sperm movements for at least 10 minutes, and such entangled spermatozoa in the UTJ never became active again. As you can see in the new Movie S9B, the entangled spermatozoa were also acrosome-reacted (the green acrosome signal in the head is gone), whereas active spermatozoa in the same video respond to peristaltic movement. However, as you pointed out, we did not measure their viability with appropriate dyes. Although we considered extracting these spermatozoa and performing viability tests, we could not find a way to extract specifically the spermatozoa that had been imaged. In light of your comments, we changed the term “damaged or dead” to “inactive” in the revised manuscript (LL 313-316, Legend Figure 6D. LL 380-384).

      Movie S10 - both slow- and fast-moving sperm are seen throughout the course of the movie, which does not support the authors' conclusion that sperm tails beat faster over time. 

      There must have been a misunderstanding. We did not indicate that sperm beating got faster over time anywhere in the main manuscript, including the figure legend and related movie captions. As correctly pointed out, the sperm beating speed changes over time (not getting faster over time) and shows a correlation with internal fluid flow and width of luminal space (LL 320-332). Please let us know if you meant something else. 

      Reviewer #2 (Public Review): 

      Summary: 

      The specific objective of this study was to determine the role of the large apical hook on the head of mouse sperm (Mus musculus) in sperm migration through the female reproductive tract. The authors used a custom-built two-photon microscope system to obtain digital videos of sperm moving within the female reproductive tract. They used sperm from genetically modified male mice that produce fluorescence in the sperm head and flagellar midpiece to enable visualization of sperm moving within the tract. Based on various observations, the authors concluded that the hook serves to facilitate sperm migration by hooking sperm onto the lining of the female reproductive tract, rather than by hooking sperm together to form a sperm train that would move them more quickly through the tract. The images and videos are excellent and inspirational to researchers in the field of mammalian sperm migration, but interpretations of the behaviors are highly speculative and not supported by controlled experimentation. 

      Thank you for your critical review and valuable comments on our manuscript. As pointed out, some of our findings and suggestions were largely observation-based. However, to the best of our knowledge, many of our observations are novel, particularly in the context of live imaging inside the female uterus and reproductive tract. We believe these observations open the door to many questions and follow-up studies, which is what drives science forward. 

      That being said, we entirely agree that many follow-up experiments need to be designed and performed, especially to validate the exact molecular mechanisms of the observed dynamics. We acknowledge that we currently lack the molecular experimental toolsets needed to perform further tests. We have removed much of the hypothetical discussion from the results section and moved it to the discussion section. We hope that our revision more clearly separates the observed experimental data from our interpretations.

      Strengths: 

      The microscope system developed by the authors could be of interest to others investigating sperm migration. 

      The new behaviors shown in the images and videos could be of interest to others in the field, in terms of stimulating the development of new hypotheses to investigate. 

      Weaknesses: 

      The authors stated several hypotheses about the functions of the sperm behaviors they saw, but the hypotheses were not clearly stated or tested experimentally. 

      The hypothesis statements were weakened by the use of hedge words, such as "may". 

      We appreciate your helpful comments and have revised our hypotheses and suggestions accordingly. We have removed instances of “may” or revised it to be more direct. We have also moved most of our interpretations and hypotheses from the results to the discussion section. 

      It is important to note that experimental approaches to test what we suggested from our findings in the current ex-vivo observation platform are not trivial and require extensive investigation of several unknown factors of the female reproductive tract. For instance, obtaining detailed information on the chemical characteristics and fluid dynamics in the female reproductive tract is essential to build a microfluidic channel that accurately resembles the uterus and oviduct, replicating what we found in an extracted living entire organ. This poses a significant challenge and requires collaborative expertise from many labs, which we hope to build in the near future. 

      Furthermore, our biggest concern is that, even if we were to construct the appropriate microfluidic channel to test sperm migration, it is very likely that the sperm behaviours that we observed under natural conditions may not be replicated in artificial environments. This raises questions about whether in-silico or in-vitro findings can truly resemble what we reported here using the ex-vivo observation inside a living organ.

      To share our experience of this difficulty: at the initial stage of our study, we attempted sperm injection combined with fluorescent beads to visualize the fluid flow, as well as dyeing the female reproductive tract and spermatozoa after mating. However, none of these approaches produced meaningful results. Another potential way to examine our claims is to use genetic engineering to indirectly confirm the influence of sperm hook morphology on sperm behaviour. However, such an approach would not demonstrate mechanically how the sperm hook interacts with the female reproductive tract. 

      It is unfortunate that the sperm behaviours that we found and reported here are considered as highly speculative. The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, these behaviours include tapping and associated guided movement along the uterus wall, anchoring and related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. 

      We have extensively revised the manuscript structure to clarify our findings and integrated our points into the introduction. Although we understand that our hypotheses may be considered speculative and that establishing a causal relationship between the sperm hook and its role in sperm migration requires further experiments, we believe that the image-based observations of the dynamic behaviours of spermatozoa are solid. We believe our findings will facilitate further studies and discussion on postcopulatory sexual selection in rodents.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The manuscript is written for an expert in a fairly small field. I recommend that the authors rewrite the manuscript to make it more accessible to people outside of the field. These suggestions include 

      (1) Provide a diagram of the female reproductive tract in Figure 1. 

      a. Indicate where sperm enter the tract and the location of the oocyte they are trying to reach. 

      b. Label all areas of the uterus that are mentioned in this study and be consistent about the label. 

      (2) All movies should have a diagram of the location of the uterus that is being imaged. 

      Thank you for the great suggestion. We have added a diagram of the female reproductive tract in the revised Figure 1A. In response to your comments 1a and b, we have indicated such information by including eggs in the ampulla and arrows that indicate sperm migration direction. We have also labelled the name of the specific areas that were studied in the manuscript.

      We are unsure how to integrate the diagram in all movies without reframing the videos, which could cause serious corruption of the files. More importantly, we think that adding the same diagram to all movies may complicate the visuals and obscure the annotations and subjects in the movies. Instead, we have referred to the common diagram (Figure 1A) in each movie caption, specifying where the video was taken. Thank you for the suggestion. With this information, we hope readers can now more easily understand where we made the observations.

      (3) The major questions in the field need to be better described in the introduction. 

      Thank you for your valuable suggestions and specific comments, which have greatly helped improve our manuscript. We have revised our introduction and discussion sections by adding more literature reviews and integrating studies across a wider range of postcopulatory sexual selection research, as per your suggestion (LL 34-57, LL 385-398).

      (4) The major question that the authors are trying to address should be described in the introduction. 

      Thank you for the helpful suggestion. We have clarified in the introduction that our aim was to contribute to the field of postcopulatory sexual selection in rodents by advancing methodological progress and to stimulate discussion and future research on the function of the sperm hook in murine rodents (LL 76-94) based on our observations.

      (5) A discussion of the sperm hook should be provided. How many species have this structure (or similar structure)? 

      We have integrated your point into the revised discussion section. Essentially, most murine rodent species have sperm hooks (although their exact shapes differ). However, as there are over 500 species and not all of them have been tested, we do not know exactly how many of them have this structure. Therefore, we included references that examined species variations in sperm hook characteristics and their possible correlation with sperm competition (LL 385-417) in the discussion. We also cited papers by Breed (2004) and by Roldan et al (1992) that investigated murine rodents with a sperm hook in the introduction section (LL 58-61).

      (6) The figure legends must describe everything in the figure or movie. 

      Thank you for the helpful suggestion. We previously thought that our figure legends may be too long. We have included further information in the figure legends and movie captions. We have also revised the movies by adding some clips following our revision (Movie S1).

      Reviewer #2 (Recommendations For The Authors): 

      Here are some specific concerns I had about the clarity of approach to experiments and interpretations of results. 

      In the Introduction, the authors stated that the study was intended to determine the function of the hooks on the mouse sperm heads. However, in the Results section, the authors did not explain the rationale for the first set of experiments with respect to the overall objective of the study. In this experiment, the authors measured the velocities of sperm swimming in the uterus and found that the sperm moved faster when closer to the uterine wall (VCL, VSL). They concluded that migration along the uterine wall "may" be an efficient strategy for reaching the entrance to the uterotubal junction (UTJ) and did not explain how this related to the function of the hooks. 

      Thank you for your critical comment and guidance. We have changed the order of Figure 1 and Figure 2 and revised the result section to integrate your points. At the initial stage of the study, we expected to find evidence of the function of sperm trains in aiding sperm migration in the female uterus (which has not been observed in the live uterus; previous works were done in vitro with sperm extracted from the epididymis or uterus after mating). However, what we found was unexpected: dynamic sperm hook related movements facilitating sperm migration inside the female uterus by playing a mechanical role in sperm interaction with the uterine wall. These results, which were presented in the previous Figure 2, have been reorganized as the new Figure 1.

      Based on this observation, our research later moved to clarify whether such sperm-epithelium interaction indeed helps sperm migration. This led us to measure sperm kinetics in relation to their distance and angle to the uterine wall. We have revised our introduction and result parts by integrating these points. We hope that our revision will answer your questions. We have also reduced the use of ‘may’ or ‘can’ in the results section. In the revised manuscript, we have moved such hypotheses to the discussion section and focused on what we observed in the results section.
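      For readers less familiar with the kinematic parameters mentioned here, the following is a minimal sketch of how VCL (curvilinear velocity) and VSL (straight-line velocity) are conventionally computed from a tracked sperm head trajectory. It is an illustration under stated assumptions, not the analysis code used in the study: the function name `sperm_kinetics`, the micrometre units, the frame interval, and the example track are all hypothetical.

```python
import math

def sperm_kinetics(track, dt):
    """Return (VCL, VSL) for one tracked sperm head.

    track: list of (x, y) positions, one per frame (assumed micrometres).
    dt:    time between consecutive frames in seconds (assumed constant).
    """
    if len(track) < 2:
        raise ValueError("need at least two tracked positions")
    duration = dt * (len(track) - 1)
    # VCL: total length of the frame-to-frame path divided by elapsed time.
    path_length = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    # VSL: straight-line distance from first to last position over the same time.
    net_displacement = math.dist(track[0], track[-1])
    return path_length / duration, net_displacement / duration

# Hypothetical zigzag track sampled every 0.1 s.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0), (9.0, 4.0), (12.0, 0.0)]
vcl, vsl = sperm_kinetics(track, dt=0.1)
```

      In this sketch VSL can never exceed VCL, so a track that hugs a straight wall would show VSL close to VCL, whereas a meandering track would show a much lower VSL.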

      The authors proposed that the sperm hook "may" play a crucial role in determining the direction of migration. When sperm encountered a uterine wall, significantly more changed migration direction toward the pro-hook direction than toward the anti-hook direction. In Figure 2B, sperm behavior is not visually understandable nor clearly explained. 

      Thank you for the helpful comments. We have removed “may” and “might” to make our claim clearer and more concise. We have also revised the previous Figure 2B by combining it with the previous Figure 2C (they have been combined into Figure 1C now). We have also revised Figure 1B by increasing the line thickness of the sperm trajectory of the pro-wall-hook direction and added the anti-wall-hook trajectory. We hope that these revisions make the figure easier to understand.

      In Figure 2E, are the authors showing that the tip of the hook is caught between two epithelial cells? Please clarify the meaning of this figure. 

      Please clarify the difference between "tapping" and "anchoring". 

      Thank you for the detailed comments. As you pointed out, we currently have no evidence that sperm can be caught in epithelial intercellular gaps. We have addressed this source of confusion by removing the gap in the revised figure (Figure 1E). We have also included the definitions of anchoring (LL 142-143) and tapping (LL 128-130). Anchoring facilitates the attachment of sperm to the uterine epithelia. Such anchoring also involves the catching of the sperm head in the inter-mucosal fold or gap, particularly at the entrance of the intramural UTJ at the end of the uterus. Tapping is the interaction between the head hook and epithelia in which the sperm hook taps (or pats) on the surface. Sperm tapping can be a byproduct of flagellar beating when spermatozoa migrate toward the pro-wall-hook direction along the uterine wall (epithelia), or it can play some role in sperm migration. As we currently cannot draw a conclusion, we did not integrate the possible function of tapping in the manuscript.

      The authors proposed that opposite sliding of neighboring mucosal folds lining the UTJ would cause small openings to form, through which only perhaps one sperm at a time could enter and pass through the UTJ into the uterus. This hypothesis was not actually tested. 

      Imaging inside deep tissue is challenging because light scatters as it penetrates biological tissue. While this is also true for the uterus, the intramural UTJ is especially difficult to image because the UTJ consists of several thick muscle and cell layers (see Movie S5A). Another challenge is that the peristaltic movement of the UTJ results in constant movement, making continuous tracking of single spermatozoa through the entirety of the UTJ impossible in our current experiments. We have moved this hypothesis to the discussion section and restated that this is a purely hypothetical model (LL 399-406). We hope that our model encourages the community to design or establish an improved ex-vivo observation system that may be able to test this hypothetical model in the near future.

      Next, the authors hypothesized that sperm that encounter the small openings in the UTJ may then be guided onward and the hooks could prevent backward slipping. This was also not tested. 

      As you’ve noted, the function of the sperm hook that aids in sliding and preventing backward slipping could not be tested directly in our ex-vivo observation platform that relies on natural movement of the living organ. However, we believe that these limitations also highlight the importance of continued research and the development of more advanced methodologies in this field.

      We would also like to note that we provide direct observations of spermatozoa resisting internal flow due to reproductive tract contractions in Movie S3A, B as well as Movie S5B. We referred to these movies and pointed out the role of anchoring (sperm attachment) in preventing sperm from being squeezed out (LL 140-149, LL 224-241). Unfortunately, we cannot conceive of how this behaviour could be tested further in any uterus-resembling microfluidic device or ex-vivo system. In line with your suggestion, we have rewritten the related result section and moved our related discussions from the results to the discussion section (LL 224-241, LL 399-417).

      The authors observed that large numbers of uterine sperm are attached to the entrance of the UTJ. Some sperm clustered and synchronized their flagellar beating. The authors speculated that this behavior served to push sperm in clusters onward through the UTJ. 

      We would like to note that we did not speculate that sperm clustering and synchronization could serve to push spermatozoa in a cluster onward through the UTJ. We only pointed out our observation in the recorded videos that the flow generated by the clustered spermatozoa pushed away other spermatozoa, as seen in Movie S7 (LL 261-264). Although such sperm cooperation is possible (blocking passage of later sperm), we cannot draw that conclusion from our observation. The possibility you pointed out (pushing sperm onward through the UTJ) was suggested by Qu et al in 2021 [Cooperation-based sperm clusters mediate sperm oviduct entry and fertilization, Protein & Cell] based on their observations on cleared dead reproductive tracts.

      The authors found only a few sperm trains in the uterus, UTJ, and oviduct, so they could not measure sufficient numbers of samples to test whether sperm trains swim faster than single sperm. Without sufficient data, they concluded that the "sperm trains did not move faster than unlinked single spermatozoa." 

      We would like to take this opportunity to clarify our claims. We do not claim that our current experiments can give the final verdict on whether the sperm train hypothesis for faster swimming is correct or not. The phrase “sperm trains did not move faster” was not intended to mean that the sperm train hypothesis is invalid.  We did not draw a conclusion but dryly described the experimental data that we observed (LL 279-286).  We would once again like to emphasize that the main claim of our manuscript is not to rule out the sperm train hypothesis, but to present the various dynamic interactions of the sperm head with the female reproductive tract. To make the statement more balanced, we revised the sentence as “observed sperm trains did not move faster or slower than unlinked single spermatozoa” (LL 281-282).

      The authors hypothesized that the dense sperm clusters at the entrance into the UTJ could prevent the rival's sperm from entering the UTJ (due to plugging entrance and/or creating an outward flow to sweep back the rival's sperm), but they did not test it. 

      We agree that we were not able to test this possible function of the sperm cluster at the UTJ entrance. Following your concerns, we revised the result part (LL 256-264) by removing most of our discussions related to the observed phenomena. We also moved some interpretation to the discussion section (LL 421-437) and suggested that future works using appropriate microfluidic channel designs or sequential double mating experiments may be performed for additional tests (LL 443-447). However, we would like to point out that Movie S7C clearly shows surrounding spermatozoa being swept away from the sperm clusters. Since the sperm density is high, this is almost equivalent to a particle image velocimetry experiment, and we can clearly see the effect of the outward flow generated by the sperm clusters.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Positive comments:

      We appreciate the positive comments of the editor and reviewers. The editor noted that the paper presents a “technological advance” that has enabled “important insights about the brain circuits through which the cerebellum could participate in social interactions.” Reviewer 1 thought this was a “timely and important study with solid evidence for correlative conclusions” and that the experiments were “technically challenging” and “well-performed”. Reviewer 2 stated that the finding of correlated activity between the regions is “interesting as non-motor functions of the cerebellum are relatively little explored.” They also thought “that the data are presented clearly, and the manuscript is well-written”. Reviewer 3 mentioned that “this approach can be useful for many neuroscientists”. We thank the editors and all the reviewers for their positive comments.

      Reviewer #1 (Public Review)

      While the novelty of the device is strongly emphasized, I find that its value is somewhat diminished by the wire-free device developed by the same group as it should thus be possible to perform calcium imaging wire-free and electrophysiological recording via a single conventional cable (or also via wireless headstages).

      While it would be potentially possible to use a wire-free Miniscope in parallel with a wired electrophysiology recording system, this would result in a larger footprint on the animal’s head, more than a gram in increased weight due to an added LiPo battery, a larger electrophysiology head-stage, and limited recording length due to a battery capacity of around 20 minutes. Our main goal for the development of the E-scope platform was to develop an expandable electrophysiology recording board that would work with all previously built UCLA Miniscopes while also streamlining the integration of power and data into the coaxial cable connection already familiar to hundreds of labs using Miniscopes. The vast majority of Miniscope experiments are done using wired systems and we aimed to support the expansion of those systems instead of requiring a more substantial switch to using wire-free Miniscopes.

      The role of the identified network activations in social interactions is not touched upon.

      We agree with the reviewer that we have not discovered a causal role for the co-modulated activity patterns we have observed. As these causal experiments will require the development of real-time techniques for blocking socially evoked changes in firing rate in cerebellum and ACC, we are currently planning experiments to address causality. These results will be described in a future publication.

      Reviewer #1 (Recommendations for the Authors):

      Please provide the number of recorded mice.

      The number is now provided in the revised manuscript.

      If the recorded areas (cerebellar cortex, DN, and ACC) are part of the same circuit regulating social interactions, it would be nice to get insights into the directionality of the circuit. The authors favor the possibility that during social behavior, cerebellar efferences indirectly influence ACC activities (as in Figure 4A), however, no evidence is presented to support this interpretation. ACC activities might also indirectly influence PC firing. It may be possible to get insights into this by comparing the timing of neuronal activity in the different areas with respect to social onset.

      For this study, we mainly focused on the output of the cerebellar circuit to the cortex, as previous work shows that the dentate nucleus projects to the thalamus, which in turn projects to ACC and other cortical regions (Badura et al., eLife, 2018; Kelly et al., Nat. Neurosci., 2020). The temporal resolution of calcium imaging is limited (the rise time of calcium events with genetically-encoded indicators is hundreds of milliseconds), so it is insufficient to precisely assess the relative onset timing of the two regions. Our work certainly does not rule out cortical influences on PC firing.

      Reviewer #2 (Public Review)

      However, the causal relationship is far from established with the methods used, leaving it unclear if these two brain regions are similarly engaged by the behavior or if they form a pathway/loop.

      As indicated in our response to Reviewer #1’s similar critique, the goal of the presented study is to demonstrate the feasibility and capabilities of this novel device. This new tool will allow us to conduct a comprehensive and rigorous study to assess the causal role of the interactions between the cerebellum and ACC in social behavior (as well as other behaviors). These experiments are being designed now.

      Reviewer #2 (Recommendations for the Authors):

      It is unclear what is entirely unique about the E-scope. It seems that its advance is simply a common cable that allows interfacing with both devices (lighter weight than two cables is stated in the Discussion). Is this really an advance? What are its limitations? E.g., how close can the recording sites be to one another? How can it be configured for any other extracellular recording approach (tetrodes, 64-channel arrays, or Neuropixels)?

      In our experience, multiple lines of wires tethered to different head-mounted devices on an animal significantly impact its behavior. Therefore, one of the major advantages of the UCLA Miniscope Platform is the use of a single, flexible coaxial cable to minimize the impact of tethering on behavior. The E-Scope platform builds on top of this work by incorporating electrophysiology recording capabilities into this single, flexible coaxial cable. Additionally, the electrophysiology recording hardware is backwards compatible with all previously built UCLA Miniscopes and can run through open-source and commercial commutators already used in Miniscope experiments.

      The available bandwidth within the shared single coaxial cable can handle megapixel Miniscope imaging along with the maximum data output of a 32-channel Intan Ephys IC. The E-Scope platform presented here runs the Intan Ephys IC at 20 kSps for all 32 channels instead of the maximum 30 kSps due to microcontroller speed limitations, but this could be overcome by using a faster microcontroller or clock, or by slightly reducing the total number of electrodes sampled. Finally, the E-Scope was designed to support any electrode types supported by the Intan Ephys IC. This includes up to 32 channels of passive probes such as single electrodes, tetrodes, silicon probes, and flexible multi-channel arrays, but does not include Neuropixels, as Neuropixels use custom active electronics on the probe to multiplex, sample, and serialize electrophysiology data.
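      To give a sense of scale for the sampling rates quoted above, the raw electrophysiology payload can be estimated with simple arithmetic. This is only a back-of-the-envelope sketch: the 16 bits per sample is an assumption not stated in this response, protocol and framing overhead are ignored, and the helper name `ephys_data_rate_mbps` is hypothetical.

```python
def ephys_data_rate_mbps(channels, sample_rate_sps, bits_per_sample=16):
    """Raw payload in megabits per second, ignoring any framing overhead.

    bits_per_sample=16 is an assumed ADC word size, not a figure from the text.
    """
    return channels * sample_rate_sps * bits_per_sample / 1e6

# 32 channels at the 20 kSps used here versus the 30 kSps maximum.
rate_used = ephys_data_rate_mbps(32, 20_000)
rate_max = ephys_data_rate_mbps(32, 30_000)
```

      Under these assumptions the payload is about 10.24 Mbit/s at 20 kSps and 15.36 Mbit/s at 30 kSps, so dropping to 20 kSps reduces the electrophysiology stream by one third.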

      The authors only analyzed simple spikes in PCs for social-related activity. What about complex spikes? Is this correlated with ACC activity?

      Complex spikes were detectable to the extent that we were able to define that the recorded cell was a PC, but because these cells were recorded in freely behaving mice, accurate complex spike detection was not reliable enough to be used for further correlational analyses.

      The data is sampled in the two regions (cerebellum and ACC) at very different rates (imaging is much slower than electrophysiology; ephys data was binned). How does this affect the correlation plots?

      We generated firing rate maps for the cerebellar neural activity using a bin size that matched the sampling frequency of calcium imaging (see Methods). As mentioned in our methods, to study the relationship between the electrophysiology and calcium imaging data we binned the spike trains using 33 ms bins to match the calcium imaging sampling rate (~30 Hz). This limits the temporal resolution available for calculating fine-scale correlations, but the correlations that we report are on a behaviorally relevant temporal scale. The fine temporal resolution of the electrophysiology data can nevertheless still be used to examine, at higher temporal resolution, the relationship between cerebellar output and specific social behavior epochs.
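      The binning step described above can be sketched as follows. This is a minimal illustration assuming spike times in seconds measured from the start of imaging and a fixed 33 ms bin width; the function name `bin_spike_train` and the example spike times are hypothetical and are not taken from the study's analysis code.

```python
def bin_spike_train(spike_times, n_bins, bin_width=0.033, t_start=0.0):
    """Count spikes in consecutive bins of bin_width seconds.

    spike_times: spike timestamps in seconds.
    n_bins:      number of bins, e.g. the number of calcium imaging frames.
    Spikes before t_start or after the last bin are ignored.
    """
    counts = [0] * n_bins
    for t in spike_times:
        index = int((t - t_start) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical spike times (seconds) binned into four imaging frames.
spikes = [0.001, 0.010, 0.040, 0.100]
counts = bin_spike_train(spikes, n_bins=4)
```

      Dividing each count by the bin width would convert the counts to firing rates in spikes per second, giving one value per imaging frame that can be compared directly with the calcium signal.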

      For the correlation analysis, over what time frame was the activity relationship examined? How was this duration determined?

      Author response image 1.

      The main criterion for the time frame used in the correlation analysis was the behavioral timescale of social interaction [see figure above for the number of social (red) and object (blue) interaction bouts (a), their duration (b) and coefficient of variation (CV) (c)]. Overall, the activity relationship time frame was based on the average duration of the social interactions (~3 sec). Periods of 3.8 sec before and 5.8 sec after interaction onset were used for the analysis. Accordingly, the cross-correlograms were constructed using a maximum lag length of 5 sec. In the article we reported correlation at lag 0.
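      As an illustration of what a cross-correlogram with a bounded lag looks like in practice, the sketch below computes a correlation coefficient between two binned signals at each lag. It assumes Pearson correlation and equal-length, equally binned inputs, neither of which is specified in this response; the function names and toy signals are hypothetical. With 33 ms bins, a 5 sec maximum lag would correspond to roughly 151 bins.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def cross_correlogram(x, y, max_lag_bins):
    """Map each lag (in bins) to the correlation of x[t] with y[t + lag]."""
    if len(x) != len(y):
        raise ValueError("signals must have the same number of bins")
    n = len(x)
    result = {}
    for lag in range(-max_lag_bins, max_lag_bins + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        result[lag] = pearson(a, b)
    return result

# Toy signals: y repeats x two bins later.
x = [0, 1, 0, 0, 2, 0, 0, 1, 0, 0]
y = [0, 0, 0, 1, 0, 0, 2, 0, 0, 1]
ccg = cross_correlogram(x, y, max_lag_bins=3)
peak_lag = max(ccg, key=ccg.get)
```

      In this toy case the correlogram peaks at a lag of two bins, while `ccg[0]` gives the zero-lag value, which is the quantity reported in the article.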

      The relationship between the cerebellum and ACC seems unconvincing. If two brain regions are similarly engaged by the behavior, wouldn't they have a high correlation? Is the activity in one region driving the other?

      We reference studies showing an anatomical and functional indirect connection between the cerebellum and the ACC or prefrontal cortex (Badura et al., eLife, 2018). Also, as stated in the introduction, the ACC is a recognized brain area for social behavioral studies. In the results, we stated that the increase in correlations in groups of neurons that are similarly engaged during a specific epoch in the social interaction was an expected finding. What was not expected was that there would be no difference in the distribution of correlations when the social epochs were removed, suggesting that intrinsic connectivity does not drive a difference in correlations.

      Nevertheless, since there is a cerebello-cortical loop, further study will be needed to understand which area initiates this type of activity during social behavior.

      • In the figures, the color-coded scale bars should be labeled as z-scores (confusing without them).

      • In Figure 4, the color differences for Soc-ACC, Soc+ACC and SocNS ACC should be more striking as it is hard to tell them apart because they are all similar shades of blue-gray.

      We thank the reviewer for their suggestions for improving the figures. We have incorporated these changes in Figures 2 and 3, along with their figure supplements. Graphs in Figure 4D-G have been edited to make the lines more visible to the reader.

      Reviewer #3 (Public Review)

      However, a mouse weighs between 20 and 40 g, so that an implant of 4.5 g is still quite considerable. It can be expected that this has an impact on the behavior and, possibly, the well-being of the animals. Whether this is the case or not, is not really addressed in this study.

      The weight of the E-Scope (4.5 g) is near the maximum that is tolerated by animals in our experience. We therefore acclimated the mice to the weight with dummy scopes of increasing weights over a 7-10 day period. During this period, we observed the animals to have normal exploratory behavior. Specifically, there is no change in the sociability of the animals (Figure 2A) and animals cover the large arena (48 x 48 cm, Figure 2H).

      Overall, the description of animal behavior is rather sparse. The methods state only that stranger age-matched mice were used, but do not state their gender. The nature of the social interactions was not described. Was there aggressive behavior, sexual approach and/or intercourse? Did the stranger mice attack/damage the E-Scope? Were the interactions comparable (using which parameters?) with and without E-Scope attached? It is not even described what the authors define as an "interaction bout" (Figure 2A). The number of interaction bouts is counted per 7 minutes, I presume? This is not specified explicitly.

      As mentioned in the methods section of the original version of our manuscript, all the target mice were age-matched “male” mice. As per the reviewer’s suggestion, we now have added in the manuscript that before any of our social interaction behavioral experiments, aggressive or agitated mice were removed after assessing their behavior in the arena during habituation. For all trials, all mice were introduced for the first time.

      We also mention in the methods section of our manuscript, that social behaviors were evaluated by proximity between the subject mouse and novel target mouse (2 cm from the body, head, or base of tail). From our recordings, we did not observe any aggressive, mounting, nor any other dominance behavior over the E-Scope subject mouse during the 7 minutes of social interaction assessment. Social interaction bouts in Figure 2A show the average number of social interaction bouts during the recording time. This has now been expanded upon in our revised manuscript.

      It would be very insightful if the authors would describe which events they considered to be action potentials, and which not. Similarly, the raw traces of Figure 1E are declared to be single-unit recordings of Purkinje cells. Partially due to the small size of the traces (invisible in print and pixelated in the digital version), I have a hard time recognizing complex spikes and simple spikes in these traces. This is a bit worrisome, as the authors declare the typical duration of the pause in simple spike firing after a complex spike to be 20-100 ms. In my experience, such long pauses are rare in this region, and definitely not typical. In the right panel of Figure 1A, an example of a complex spike-induced pause is shown. This pause is around 15 ms, so not typical according to the text, and starts only around 4 ms after the complex spike, which should not be the case and suggests either a misalignment of the figure or the detection of complex spike spikelets as simple spikes, while the abnormally long pause suggests that the authors fail to detect a lot of simple spikes. The authors could provide more confidence in their data by including more raw data, making explicit how they analyzed the signals, and by reporting basic statistics of firing properties (like rate, cv or cv2, pause duration). In this respect, Figure 2 - figure supplement 3 shows quite a large percentage of cells to have either a very low or a very high firing rate.

      We now provide a better example of simple spikes and complex spikes in Fig 1E and have corrected our comment in the body of the manuscript. As the reviewer mentions, the previous version of the SS x CS cross-correlation histogram in Figure 1G was not the best example because of the detected CS spikelets. However, the detection of CS spikelets has little impact on the interpretation of the results. We have replaced this figure with a better example of the SS x CS cross-correlation histogram.

      The number of Purkinje cells recorded during social interactions is quite low: only 11 cells showed a modulation in their spiking activity (unclear whether in complex spikes, simple spikes or both). During object interaction, only 4 cells showed a significant modulation. Unclear is whether the latter 4 are a subset of the former 11, or whether "social cells" and "object cells" are different categories. Having so few cells, and with these having different types of modulation, the group of cells for each type of modulation is really small, going down to 2 cells/group. It is doubtful whether meaningful interpretation is possible here.

      While the number of neurons is not as high as those reported for other regions, the number presented depicts the full range of responses to social behavior. It is extremely difficult to obtain stable neurons in freely behaving socially interacting animals and only a handful of neurons could be recorded in each animal. Among these recorded neurons only a subset responds to social interactions further reducing the numbers. The results however are consistent among cell types and the direction of modulation fits with the inhibitory connectivity between PCs and DN neurons. To our knowledge, we are the first group to publish neuronal activity of PC and DN neurons from freely behaving mice during social behavior.

      Neural activity patterns observed during social interaction do not necessarily relate specifically to social interaction, but can also occur in a non-social context. The authors control this by comparing social interactions with object interactions, but I miss a direct comparison between the two conditions, both in terms of behavior (now only the number of interactions is counted, not their duration or intensity), and in terms of neural activity. There is some analysis done on the interaction between movement and cerebellar activity (Figure 2 - figure supplement 4), but it is unclear to what extent social interactions and movements are separated here. It would already help to indicate the social interactions in the plots with trajectories (e.g., Fig. 2H), for example with social interaction-related movements in red and the rest of the trajectories in black.

      We have updated the social interaction plots in Figure 2H in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Increase the number of cerebellar neurons that are recorded.

      Due to the difficulty of the experiment and the low yield which we get for cerebellar recordings, substantially increasing the number of neurons will require many more experiments which are not feasible at this time.

      Include more raw data and make the analysis procedure more insightful with illustrations of intermediate steps.

      We have included a more thorough description of the analysis in the methods section of the revised manuscript.

      Provide a better description of the behavior.

      We have increased the level of detail regarding the mouse behavior in the Results and Methods sections. This includes a more detailed description of the parameters we used to analyze the social interaction.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Thank you for the helpful comments. Below, we have quoted the relevant sections from the revised manuscript as we respond to the reviewer’s comments item-by-item.

      Weaknesses:

      While the task design in this study is intentionally stimulus-rich and places a minimal constraint on the animal to preserve naturalistic behavior, this is, unfortunately, a double-edged sword, as it also introduces additional variables that confound some of the neural analysis. Because of this, a general weakness of the study is a lack of clear interpretability of the task variable neural correlates. This is a limitation of the task, which includes many naturally correlated variables - however, I think with some additional analyses, the authors could strengthen some of their core arguments and significantly improve clarity.

      We acknowledge the weakness and have included additional analyses to compensate for it. The details are as follows in our reply to the subsequent comments.  

      For example, the authors argue, based on an ANN decoding analysis (Figure 2b), that PFC neurons encode spatial information - but the spatial coordinate that they decode (the distance to the active foraging zone) is itself confounded by the fact that animals exhibit different behavior in different sections of the arena. From the way the data are presented, it is difficult to tell whether the decoder performance reflects a true neural correlate of distance, or whether it is driven by behavior-associated activity that is evoked by different behaviors in different parts of the arena. The author's claim that PFC neurons encode spatial information could be substantiated with a more careful analysis of single-neuron responses to supplement the decoder analysis. For example, 1) They could show examples of single neurons that are active at some constant distance away from the foraging site, regardless of animal behavior, and 2) They could quantify how many neurons are significantly spatially modulated, controlling for correlates of behavior events. One possible approach to disambiguate this confound could be to use regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both.

      First, we would like to point out that although the recordings were made during naturalistic foraging with minimal behavioral constraints, a well-trained rat displayed an almost fixed sequence of actions within each zone. The behavioral repertoires differed markedly between zones: exploratory behaviors in the N-zone, navigating back and forth in the F-zone, and licking sucrose while avoiding attacks in the E-zone. Therefore, the arena is divided not only by its geographical features but also by the distinct set of behaviors performed in each zone. This is evident in the data showing a higher decoding accuracy of spatial distance in the F-zone than in the N- or E-zone. In this sense, the heterogeneous encoding reflects the heterogeneous distribution of dominant behaviors (navigation in the F-zone and attack avoidance while foraging in the E-zone) and hence corroborates the reviewer’s comment at a macroscopic scale encompassing the entire arena.

      Having said that, the more critical question is whether the neural activity is more closely correlated with moment-to-moment microscopic behaviors than with the location decoded in the F-zone. As the reviewer suggested, the first step is to analyze single-neuron activity to determine whether direct neural correlates of location exist. To this end, traditional place maps were constructed for individual neurons. Most neurons did not show cohesive place fields across different regions, indicating little-to-no direct place coding by individual neurons. Only a few neurons displayed recognizable place fields in a consistent manner. However, even these place fields were irregular and patchy, and therefore not comparable to those of place cells or grid cells found in the hippocampus or entorhinal cortex. Some example firing maps have been added to Figure 2 and are characterized in the text as below.

      “To determine whether location-specific neural activity exists at the single-cell level in our mPFC data, a traditional place map was constructed for individual neurons. Although most neurons did not show cohesive place fields across different regions in the arena, a few neurons modulated their firing rates based on the rat’s current location. However, even these neurons were not comparable to place cells in the hippocampus (O’Keefe & Dostrovsky, 1971) or grid cells in the entorhinal cortex (Hafting et al., 2005) as the place fields were patchy and irregular in some cases (Figure 2B; Units 66 and 125) or too large, spanning the entire zone rather than a discrete location within it (Units 26 and 56). The latter type of neuron has been identified in other studies (e.g., Kaefer et al., 2020).”

      Next, to verify whether the location decoding reflects neuronal activity driven by external features or by a particular type of action, the predicted location was compared between the two opposite directions of travel within the F-zone, inbound and outbound with reference to the goal area (Lobsterbot). If the encoding were specifically tied to a particular action or environmental stimulus, there should be a discrepancy when the ANN decoder trained on the outbound trajectory is tested on the inbound path, and vice versa. However, the results showed no significant difference between the two trajectories, suggesting that the decoded distance was not simply reflecting neural responses to location-specific activities or environmental cues during navigation.

      “To determine whether the accuracy of the regressor varied depending on the direction of movement, we compared the decoding accuracy of the regressor for outbound (from the N- to E-zone) vs. inbound (from the E- to N- zone) navigation within the F-zone. There was no significant difference in decoding accuracy between outbound vs. inbound trips (paired t-test; t(39) = 1.52, p =.136), indicating that the stability of spatial encoding was maintained regardless of the moving direction or perceived context (Figure 2E).”
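
      The paired comparison above (one outbound and one inbound error per session, so 40 sessions give 39 degrees of freedom) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `paired_t` and the example values are hypothetical and are not data from the study.

```python
import math
import statistics


def paired_t(a, b):
    """Return (t, df) for a paired t-test on two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all paired differences are identical")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical per-session decoding errors (cm), one pair per session.
outbound = [14.2, 15.1, 13.8, 16.0, 14.9]
inbound = [13.9, 15.4, 13.1, 15.2, 14.8]
t_example, df_example = paired_t(outbound, inbound)
```

      The p-value would then be read from a t distribution with `df` degrees of freedom, for which a statistics package is normally used.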

      Additionally, we applied the same regression analysis to a subset of data recorded while the door to the robot compartment was closed during the Lobsterbot sessions. This makes it possible to test the decoding accuracy when the most salient spatial feature, the Lobsterbot, is blocked from sight. The subset represents an average of 38.92% of the entire session. Interestingly, the decoding accuracy with the subset of data was higher than that with the entire dataset, indicating that the neural activity was not driven by a single salient landmark. This finding supports our conclusion that the location information can be decoded from a population of neurons rather than from individual neurons that are associated with environmental or proprioceptive cues. We have added the following description of the results to the manuscript.

      “Previous analyses indicated that the distance regressor performed robustly regardless of movement direction, but there is a possibility that the decoder detects visual cues or behaviors specific to the E-zone. For example, neural activity related to Lobsterbot confrontation or licking behavior might be used by the regressor to decode distance. To rule out this possibility, we analyzed a subset of data collected when the compartment door was closed, preventing visual access to the Lobsterbot and sucrose port and limiting active foraging behavior. The regressor trained on this subset still decoded distance with a MAE of 12.14 (± 3.046) cm (paired t-test; t(39) = 12.17, p <.001). Notably, the regressor's performance was significantly higher with this subset than with the full dataset (paired t-test; t(39) = 9.895, p <.001).”

      As for the comment on “using regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both”, it is difficult to isolate a particular behavioral event, let alone timestamp it, because the rat’s location was monitored within a constantly moving, naturalistic stream of behaviors. However, as mentioned above, a new section entitled “Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision” argues against a single-neuron-based account by performing a feature importance analysis. The results showed that even when the top 20% of the most informative neurons were excluded, the remaining neural population could still decode both distance and events. This analysis supports the idea of a population-wide mode shift rather than distinct subgroups of neurons specialized in processing different sensory or motor events. This idea is also expressed in the schematic diagrams featured in Figure 8 of the revision.
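
      The exclusion step described above can be illustrated with a short sketch. It assumes that a per-neuron importance score is already available (how the scores were computed is not specified here) and only shows how the units remaining after dropping the top 20% would be selected; the function name and the scores are hypothetical.

```python
def drop_top_fraction(importances, fraction=0.2):
    """Return indices of units kept after removing the top `fraction` by importance."""
    n_drop = int(len(importances) * fraction)
    # Rank unit indices from most to least important.
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i for i in range(len(importances)) if i not in dropped]


# Hypothetical importance scores for ten units.
scores = [0.01, 0.5, 0.02, 0.0, 0.3, 0.05, 0.0, 0.01, 0.02, 0.1]
kept_units = drop_top_fraction(scores)  # units 1 and 4 are removed
```

      The decoder would then be retrained and evaluated using only the columns listed in `kept_units`.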

      To substantiate the claim that PFC neurons really switch between different coding "modes," the authors could include a version of this analysis where they have regressed out, or otherwise controlled for, these confounds. Otherwise, the claim that the authors have identified "distinctively different states of ensemble activity," as opposed to simple coding of salient task features, seems premature.

      A key argument in our study is that the mPFC neurons encode different abstract internal representations (distance and avoidance decision) at the population level. This has been emphasized in the revision with additional analyses and discussions. Most importantly, we performed single-neuron analyses for both spatial encoding (place fields for individual neurons) and avoidance decision (PETHs for head entry and head withdrawal) and contrasted the results with the population analysis. Although some individual neurons displayed fractured “place cell-like” activity, and some others showed modulated firing at the head-entry and head-withdrawal events, the ensemble decoding extracted distance information for the current location of the animal with much higher accuracy. Furthermore, the PCA identified abstract feature dimensions, especially regarding the activity in the E-zone, that cannot be attributed to a small number of sensory- or motor-related neurons.

      To mitigate the possibility that the PCA is driven primarily by a small subset of units responsive to salient behavioral events, we also applied PCA to the dataset excluding the activity in the 2-second time window surrounding the head entry and withdrawal. While this approach does not eliminate all cue- or behavior-related activity within the E-zone, it does remove the neural activity associated with emotionally significant events, such as entry into the E-zone, the first drop of sucrose, head withdrawal, and the attack. Even without these events, the PC identified in the E-zone was still separated from those in the F-zone and N-zone. This result again argues in support of distinct states of ensemble activity formed in accordance with different categories of behaviors performed in different zones. Finally, the Naïve Bayesian classifier trained with ensemble activity in the E-zone was able to predict the success or failure of avoidance that occurs a few seconds later, indicating that the same population of neurons is encoding the avoidance decision rather than the location of the animal.

      Reviewer 1 (Recommendations):

      The authors include an analysis (Figure 4) of population responses using PCA on session-wide data, which they use to support the claim that PFC neurons encode distinctive neural states, particularly between the encounter zone and nesting/foraging zones. However, because the encounter zone contains unique stimulus and task events (sucrose, threat, etc.), and the samples for PCA are drawn from the entire dataset (including during these events), it seems likely that the Euclidean distance measures analyzed in Figure 4b are driven mostly by the neural correlates of these events rather than some more general change in "state" of PFC dynamics. This does not invalidate this analysis but renders it potentially redundant with the single neuron results shown in Figure 5 - and I think the interpretation of this as supporting a state transition in the coding scheme is somewhat misleading. The authors may consider performing a PCA/population vector analysis on the subset of timepoints that do not contain unique behavior events, rather than on session-wide data, or otherwise equalizing samples that correspond to behavioral events in different zones. Observing a difference in PC-projected population vectors drawn from samples that are not contaminated by unique encounter-related events would substantiate the idea that there is a general shift in neural activity that is more related to the change in context or goal state, and less directly to the distinguishing events themselves.

      Thank you for the comments. Indeed, this is a recurring theme: the reviewers expressed concerns and doubts about heterogeneous encoding of different functional modes. Beyond the systematic presentation of the results in the manuscript, from PETH to ANN to Bayesian classifier, we argue that the activity of the mPFC neurons is better represented by the population than by a loose collection of stimulus- or event-related neurons.

      The PCA results that we included as evidence of distinct functional separation might reflect activity driven by a small number of event-coding neurons in different zones. As mentioned in the public review, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. The critical times were defined as ± 1 second from these events and were excluded from the neural data. Despite these exclusions, the results continued to show population-level differences between zones, reinforcing the notion that neurons encode abstract behavioral states (decision to avoid or stay) without the sensory- or motor-related activity. Although this analysis does not completely eliminate all possible confounding factors emerging in different external and internal contexts, it provides extra support for the population-level switch occurring in different zones.
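
      The exclusion of critical event times can be sketched as a simple mask over sample timestamps. This is an illustrative sketch only: the function name, the example times, and the decision to treat samples exactly 1 second from an event as excluded are assumptions, not details taken from the manuscript.

```python
def keep_mask(sample_times, event_times, half_window=1.0):
    """True for samples farther than `half_window` seconds from every event."""
    return [
        all(abs(t - e) > half_window for e in event_times)
        for t in sample_times
    ]


# Hypothetical timestamps (s): samples every 0.5 s, one event at 2.0 s.
sample_times = [i * 0.5 for i in range(12)]
event_times = [2.0]
mask = keep_mask(sample_times, event_times)
kept_times = [t for t, keep in zip(sample_times, mask) if keep]
```

      Only the rows of the neural data matrix where the mask is true would then be passed to PCA.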

      In Figure 7, the authors include a schematic that suggests that the number of neurons representing spatial information increases in the foraging zone, and that they overlap substantially with neurons representing behaviors in the encounter zone, such as withdrawal. They show in Figure 3 that location decoding is better in the foraging zone, but I could not find any explicit analysis of single-neuron correlates of spatial information as suggested in the schematic. Is there a formal analysis that lends support to this idea? It would be simple, and informative, to include a quantification of the fraction of spatial- and behavior-modulated neurons in each zone to see if changes in location coding are really driven by "larger" population representations. Also, the authors could quantify the overlap between spatial- and behavior-modulated neurons in the encounter zone to explicitly test whether neurons "switch" their coding scheme.

      Figure 7 (now Figure 8) has been completely revised. The schematic diagram has been modified to show spatial and avoidance decision encoding by the overlapping population of mPFC neurons (Figure 8a). Most notably, there are very few neurons that encode location but not the avoidance decision or vice versa. This is indicated by the differently colored units in the F-zone vs. the E-zone. The model also includes units that are “not” engaged in any type of encoding or are engaged in only one type of encoding, although they are not the majority.

      We have also added a schematic for hypothetical switching mechanisms (Figure 8b) to describe the conceptual scheme for the initiation of encoding-mode switching (sensory-driven vs. arbitrator-driven process).

      “Two main hypotheses could explain this switch. A bottom-up hypothesis suggests sensory inputs or upstream signals dictate encoding priorities, while a top-down hypothesis proposes that an internal or external “arbitrator” selects the encoding mode and coordinates the relevant information (Figure 8B). Although the current study is only a first step toward finding the regulatory mechanism behind this switch, our control experiment, where rats reverted to a simple shuttling task, provide evidence that might favor the top-down hypothesis. The absence of the Lobsterbot degraded spatial encoding rather than enhancing it, indicating that simply reducing the task demand is not sufficient to activate one particular type of encoding mode over another.  The arbitrator hypothesis asserts that the mPFC neurons are called on to encode heterogenous information when the task demand is high and requires behavioral coordination beyond automatic, stimulus-driven execution. Future studies incorporating multiple simultaneous tasks and carefully controlling contextual variables could help determine whether these functional shifts are governed by top-down processes involving specific neural arbitrators or by bottom-up signals.”

      Related to this difference in location coding throughout the environment, the authors suggest in Figure 3a-b that location coding is better in the foraging zone compared to the nest or encounter zones, evidenced by better decoder performance (smaller error) in the foraging zone (Figure 3b). The authors use the same proportion of data from the three zones for setting up training/test sets for cross-validation, but it seems likely that overall, there are substantially more samples from the foraging zone compared to the other two zones, as the animal traverses this section frequently, and whenever it moves from the nest into the encounter zone (based on the video). What does the actual heatmap of animal location look like? And, if the data are down-sampled such that each section contributes the same proportion of samples to decoder training, does the error landscape still show better performance in the foraging zone? It is important to disambiguate the effects of uneven sampling from true biological differences in neural activity.

      Thank you for the comment. We agree with the concern regarding uneven data size from different sections of the arena. Indeed, as the heatmap below indicates, the rats spent most of their time in two critical locations, one being a transition area between the N- and F-zones and the other near the sucrose port. This imbalance needs to be corrected, and the manuscript includes a method for correcting this biased sampling. The Results section “Non-navigational behavior reduces the accuracy of decoded location” reports the following.

      Author response image 1.

      Heatmap of the animal’s position during one example session. (Left) Unprocessed occupancy plot. Each dot represents 0.2 seconds. (Right) Smoothed occupancy plot using a Gaussian filter (sigma: 10 pixels, filter size: 1001 pixels). The white line indicates a 10 cm length.

      “To correct for the unequal distribution of location visits (more visits to the F- than to other zones), the regressor was trained using a subset of the original data, which was equalized for the data size per distance range (see Materials and Methods). Despite the correction, there was a significant main effect of the zone (F(1.16, 45.43) = 119.2, p <.001) and the post hoc results showed that the MAEs in the N-zone (19.52 ± 4.46 cm; t(39) = 10.45; p <.001) and the E-zone (26.13 ± 7.57 cm; t(39) = 11.40; p <.001) had a significantly higher errors when compared to the F-zone (14.10 ± 1.64 cm).”

      Also in the method section, we have stated that:

      “In the dataset adjusted for uneven location visits, we divided distance values into five equally sized bins. Then, a sub-dataset was created that contains an equal number of data points for each of these bins.”
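
      The equalization procedure quoted above can be sketched as follows. The sketch assumes that the five bins are of equal width over the observed distance range and that each bin is randomly down-sampled to the size of the smallest bin; the function name, the seed, and the example distances are hypothetical.

```python
import random


def balanced_indices(distances, n_bins=5, seed=0):
    """Return sorted sample indices with an equal count in each distance bin."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        raise ValueError("distances must span a non-zero range")
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for i, d in enumerate(distances):
        # Clamp so that the maximum distance falls into the last bin.
        b = min(int((d - lo) / width), n_bins - 1)
        bins[b].append(i)
    n_per_bin = min(len(b) for b in bins)  # size of the smallest bin
    rng = random.Random(seed)
    chosen = []
    for b in bins:
        chosen.extend(rng.sample(b, n_per_bin))
    return sorted(chosen)


# Hypothetical distances (cm): uniform coverage plus 50 extra samples at 45 cm.
distances = [float(d) for d in range(100)] + [45.0] * 50
subset = balanced_indices(distances)
```

      If any bin is empty, the smallest bin size is zero and the function returns an empty list, so the number of bins should be chosen with the actual coverage in mind.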

      Why do the authors choose to use a multi-layer neural network (Figure 2b-c) to decode the animal's distance to the encounter zone?(…) The authors may consider also showing an analysis using simple regression, or maybe something like an SVM, in addition to the ANN approach.

      We began with a simple linear regression model and progressed to more advanced methods, including SVM and multi-layer neural networks. As shown below, simpler methods could decode distance to some extent, but neural networks and random forest regressors outperformed others (Neural Network: 16.61 cm ± 3.673; Linear Regression: 19.85 cm ± 2.528; Quadratic Regression: 18.68 cm ± 4.674; SVM: 18.88 cm ± 2.676; Random Forest: 13.59 cm ± 3.174).
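
      For reference, the error metric behind such a comparison can be sketched as below, assuming that the values are mean absolute errors per session summarized as mean ± SD across sessions, as elsewhere in this response. The function names and numbers are hypothetical.

```python
import statistics


def mean_absolute_error(true_cm, predicted_cm):
    """Mean absolute difference between true and decoded distances (cm)."""
    if len(true_cm) != len(predicted_cm) or not true_cm:
        raise ValueError("inputs must be non-empty and of equal length")
    return statistics.fmean([abs(t - p) for t, p in zip(true_cm, predicted_cm)])


def summarize(session_maes):
    """Return (mean, sample SD) of per-session errors."""
    return statistics.fmean(session_maes), statistics.stdev(session_maes)


# Hypothetical true and decoded distances (cm) for one session.
example_mae = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

      Each regressor would be scored with the same function on the same held-out samples so that the errors are directly comparable.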

      We chose the neural network model for two main reasons: (1) previous studies demonstrated its superior performance compared to Bayesian regressors commonly used for decoding neural ensembles, and (2) its generalizability and robustness against noisy data. Although the random forest regressor achieved the lowest decoding error, we avoided using it due to its tendency to overfit and its limited generalization to unseen data.

      Overall, we expect similar results with other regressors, but with different statistical power for decoding accuracy. We further speculate that the neural network’s use of multiple nodes contributes to robustness against noise from single-unit recordings and enables the network to capture distributed processing within neural ensembles.

      In Figure 6c, the authors show a prediction of withdrawal behavior based on neural activity seconds before the behavior occurs. This is potentially very interesting, as it suggests that something about the state of neural dynamics in PFC is potentially related to the propensity to withdraw, or to the preparation of this behavior. However, another possibility is that the animal behaves differently, in more subtle ways, while it is anticipating threat and preparing withdrawal behavior - since PFC neurons are correlated with behavior, this could explain decoder performance before the withdrawal behavior occurs. To rule out this possibility, it would be useful to analyze how well, and how early, withdrawal success can be decoded only on the basis of behavioral features from the video, and then to compare this with the time course of the neural decoder. Another approach might be to decode the behavior on the basis of video data as well as neural data, and using a model comparison, measure whether inclusion of neural features significantly increases decoder performance.

      We appreciate this important point, as mPFC activity might indeed reflect motor preparation preceding withdrawal behavior. Another reviewer raised a similar concern regarding potential micro-behavioral influences on mPFC activity prior to withdrawal responses. However, our behavioral analysis suggests that highly trained rats engage in sucrose licking that shows little variability regardless of the subsequent behavioral decision. In support of this, 95% of inter-lick intervals were shorter than 0.25 seconds, which is not enough time to perform any additional behavior during encounters.
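
      The inter-lick interval statistic can be computed from lick timestamps as in the sketch below. The function name and the example timestamps are hypothetical; only the 0.25-second threshold comes from the text.

```python
def fraction_short_intervals(lick_times, threshold=0.25):
    """Fraction of consecutive inter-lick intervals shorter than `threshold` s."""
    intervals = [b - a for a, b in zip(lick_times, lick_times[1:])]
    if not intervals:
        raise ValueError("need at least two lick timestamps")
    return sum(iv < threshold for iv in intervals) / len(intervals)


# Hypothetical lick timestamps (s) with one longer pause.
licks = [0.0, 0.15, 0.30, 0.45, 1.0, 1.15]
fraction = fraction_short_intervals(licks)
```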

      Author response image 2.

      To further clarify this, we included an additional video showing both avoidance and escape withdrawals at close range. This video was recorded during the development of the behavioral paradigm, though we did not routinely collect this view, as animals consistently exhibited stable licking behavior in the E-zone. As demonstrated in the video, the rat remains highly focused on the lick port with minimal body movement during encounters. Therefore, we believe that the neural ensemble dynamics observed in the mPFC are unlikely to be driven by micro-behavioral changes.

      Reviewer 2 (Public Review):

      Thank you for the positive comment on our behavioral paradigm and the constructive suggestions for additional analysis. We came to think that the role of the mPFC could be better portrayed as representing and switching between different encoding targets under different contexts, which, in part, was more clearly manifested by the naturalistic behavioral paradigm. In the revision, we tried to convey this message more explicitly and to provide a new perspective on this important aspect of mPFC function.

      It is not clear what proportion of each of the ensembles recorded is necessary for decoding distance from the threat, and whether it is these same neurons that directly 'switch' to responding to head entry or withdrawal in the encounter phase within the total population. The PCA gets closest to answering this question by demonstrating that activity during the encounter is different from activity in the nesting or foraging zones, but in principle this could be achieved by neurons or ensembles that did not encode spatial parameters. The population analyses are focused on neurons sensitive to behaviours relating to the threat encounter, but even before dividing into subtypes etc., this is at most half of the recorded population.

      In our study, the key idea we aim to convey is that mPFC neurons adapt their encoding schemes based on the context or functional needs of the ongoing task. Other reviewers also suggested strengthening the evidence that the same neurons directly switch between encoding two different tasks. The counteracting hypothesis to "switching functions within the same neurons" posits that there are dedicated subsets of neurons that modulate behavior—either by driving decisions/behaviors themselves or being driven by computations from other brain regions.

      To test this idea, we included an additional analysis chapter in the results section titled Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision. In this section, we directly tested this hypothesis by examining each neuron's contribution to the distance regressor and the event classifier. The results showed that the histogram of feature importance—the contribution to each task—is highly skewed towards zero for both decoders, and removing neurons with high feature importance does not impair the decoder’s performance. These findings suggest that 1) there is no direct division among neurons involved in the two tasks, and 2) information about spatial/defensive behavior is distributed across neurons.

      Furthermore, we tested whether there is a negative correlation between the feature importance of spatial encoding and avoidance encoding. Even if there were no “key neurons” that transmit a significant amount of information about either spatial or defensive behavior, it is still possible that neurons with higher information in the navigation context might carry less information in the active-foraging context, or vice versa. However, we did not observe such a trend, suggesting that mPFC neurons do not exhibit a preference for encoding one type of information over the other.
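
      The trade-off test described above amounts to correlating two per-neuron importance vectors. A minimal sketch, assuming a Pearson correlation (the specific statistic is not named here) and using hypothetical importance values:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-neuron importances for the two decoders.
spatial_importance = [0.10, 0.02, 0.30, 0.05, 0.01]
avoidance_importance = [0.08, 0.03, 0.01, 0.20, 0.02]
r = pearson_r(spatial_importance, avoidance_importance)
```

      A clearly negative `r` would indicate that neurons informative for one decoder tend to be uninformative for the other; a value near zero would be consistent with the absence of such a trade-off.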

      Lastly, another reviewer raised the concern that the PCA results, which we used as evidence of functional separation of different ensemble functions, might be driven by a small number of event-coding neurons. To address this, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. In the Peri-Event Time Histogram (PETH) analysis, we observed that some neurons exhibit highly-modulated activity upon arrival at the E-zone (head entry; HE) and immediately following voluntary departure or attack (head withdrawal; HW). We defined 'critical event times' as ± one second from these events and excluded neural data from these periods to determine if PCA could still differentiate neural activities across zones. Despite these exclusions, the results continued to show populational differences between zones, reinforcing the notion that neurons adapt their activity according to the context. We acknowledge that this analysis still cannot eliminate all of the confounding factors due to the context change, but we confirmed that excluding two significant events (delivery onset of sucrose and withdrawal movement) does not alter our result.

      To summarize, these additional results further support the conclusion that spatial and avoidance information is distributed across the neural population rather than being handled by distinct subsets. The analyses revealed no negative correlation between spatial and avoidance encoding, and excluding event-driven neural activity did not alter the observed functional separation, confirming that mPFC neurons dynamically adjust their activity to meet contextual demands.

      A second concern is also illustrated by Fig. 7: in the data presented, separate reward and threat encoding neurons were not shown - in the current study design, it is not possible to dissociate reward and threat responses as the data without the threat present were only used to study spatial encoding integrity.

      Thank you for this valuable feedback. Other reviewers have also noted that Figure 7 (now Figure 8) is misleading and contains assertions not supported by our experiments. In response, we have revised the model to more accurately reflect our findings. We have eliminated the distinction between reward coding and threat coding neurons, simplifying it to focus on spatial encoding and avoidance encoding neurons. The updated figure will more appropriately align with our findings and claims. A. Distinct functional states (spatial vs. avoidance decision) encoded by the same population neurons are separable by the region (F- vs. E zone). B. Hypothetical control models by which mPFC neurons assume different functional states.

      Thirdly, the findings of this work are not mechanistic or functional but are purely correlational. For example, it is claimed that analyzing activity around the withdrawal period allows for ascertaining their functional contributions to decisions. But without a direct manipulation of this activity, it is difficult to make such a claim. The authors later discuss whether the elevated response of Type 2 neurons might simply represent fear or anxiety motivation or threat level, or whether they directly contribute to the decision-making process. As is implicit in the discussion, the current study cannot differentiate between these possibilities. However, the language used throughout does not reflect this. 

      We acknowledge that our experiments are correlational and that this is a weakness. Although we tried to choose our words carefully to avoid deterministic claims, we agree that some of the language might mislead readers into thinking that we found a direct functional contribution. Thus, we changed the expressions as below.

      “We then further analyzed the (functional contribution ->)correlation between neural activity and success and failure of avoidance behavior. If the mPFC neurons (encode ->)participate in the avoidance decisions, avoidance withdrawal (AW; withdrawal before the attack) and escape withdrawal (EW; withdrawal after the attack) may be distinguishable from decoded population activity even prior to motor execution.”

      We also added the passage below to the Discussion section to clarify the limitations of the study.

      “Despite this interesting conjecture, any analysis based on recording data is only correlational, mandating further studies with direct manipulation of the subpopulation to confirm its functional specificity.”

      Fourthly, the authors mention the representation of different functions in 'distinct spatiotemporal regions' but the bulk of the analyses, particularly in terms of response to the threat, do not compare recordings from PL and IL although - as the authors mention in the introduction - there is prior evidence of functional separation between these regions.

      Thank you for bringing this part to our attention. As we mentioned in the introduction, we acknowledge the functional differences between the PL and IL regions. Although differences in spatial encoding between these two areas were not deeply explored, we anticipated finding differences in event encoding, given the distinct roles of the PL and IL in fear and threat processing. However, our initial analysis revealed no significant differences in event encoding between the regions, and as a result, we did not emphasize these differences in the manuscript. To address this point, we have reanalyzed the data separately and included the following findings in the manuscript.

      “However, we did not observe a difference in decoding accuracy between the PL and IL ensembles, and there were no significant interactions between regressor type (shuffled vs. original) and regions (mixed-effects model; regions: p=.996; interaction: p=.782). These results indicate that the population activity in both the PL and IL contains spatial information (Figure 2D, Video 3).

      […]

      Furthermore, we analyzed whether there is a difference in prediction accuracy between sessions with different recorded regions, the PL and the IL. A two-way repeated-measures ANOVA revealed no significant difference between recorded regions, nor any interaction (regions: F(1, 38) = 0.1828, p = 0.671; interaction: F(1, 38) = 0.1614, p = 0.690).

      […]

      We also examined whether there is a significant difference between the PL and IL in the proportion of Type 1 and Type 2 neurons. In the PL, among 379 recorded units, 143 units (37.73%) were labeled as Type 1, and 75 units (19.79%) were labeled as Type 2. In contrast, in the IL, 156 units (61.66%) and 19 units (7.51%) of 253 recorded units were labeled as Type 1 and Type 2, respectively. A Chi-square analysis revealed that the PL contains a significantly higher proportion of Type 2 neurons (χ²(1, 632) = 18.07, p < .001), while the IL contains a significantly higher proportion of Type 1 neurons compared to the other region (χ²(1, 632) = 34.85, p < .001).”
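      The chi-square comparison can be recomputed directly from the unit counts quoted above. The following is a minimal sketch, not the authors' analysis code. It assumes a Pearson test on a 2 x 2 table (region by "this type" versus "all other units") without continuity correction, and the function name chi_square_2x2 is illustrative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper-tail p-value is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Unit counts as quoted in the response.
PL_TOTAL, IL_TOTAL = 379, 253
counts = {"Type 1": (143, 156), "Type 2": (75, 19)}  # (PL units, IL units)

results = {}
for label, (pl, il) in counts.items():
    # Rows: PL, IL. Columns: units of this type, all remaining units.
    results[label] = chi_square_2x2(pl, PL_TOTAL - pl, il, IL_TOTAL - il)
    stat, p_value = results[label]
    print(f"{label}: chi2(1, N={PL_TOTAL + IL_TOTAL}) = {stat:.2f}, p = {p_value:.2g}")
```

      Running each unit type as its own 2 x 2 table matches the one-degree-of-freedom tests reported in the passage; a single 2 x 3 table (Type 1, Type 2, unclassified) would be a different test with two degrees of freedom.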

      To summarize our additional results, we did not observe performance differences in distance decoding or event decoding. The only difference we observed was the proportional variation of Type 1 and Type 2 neurons when we separated the analysis by brain region. These results are somewhat counterintuitive, considering the distinct roles of the two regions—particularly the PL in fear expression and the IL in extinction learning. However, since the studies mentioned in the introduction primarily used lesion and infusion methods, this discrepancy may be due to the different approach taken in this study. Considering this, we have added the following section to the discussion.

      “Interestingly, we found no difference between the PL and IL in the decoding accuracy of distance or avoidance decision. This is somewhat surprising considering the distinct roles of these regions in the long line of fear conditioning and extinction studies, where the PL has been linked to fear expression and the IL to fear extinction learning (Burgos-Robles et al., 2009; Dejean et al., 2016; Kim et al., 2013; Quirk et al., 2006; Sierra-Mercado et al., 2011; Vidal-Gonzalez et al., 2006). On the other hand, more Type 2 neurons were found in the PL and more Type 1 neurons were found in the IL. To recap, typical Type 1 neurons increased their activity briefly after the head entry and then remained inhibited, while Type 2 neurons showed a burst of activity during head entry and sustained increased activity. One study employing a context-dependent fear discrimination task (Kim et al., 2013) also identified two distinct types of PL units: short-latency CS-responsive units, which increased firing during the initial 150 ms of tone presentation, and persistently firing units, which maintained firing for up to 30 seconds. Given the temporal dynamics of Type 2 neurons, it is possible that our unsupervised clustering method may have merged the two types of neurons found in Kim et al.’s study.

      While we did not observe decreased IL activity during dynamic foraging, prior studies have shown that IL excitability decreases after fear conditioning (Santini et al., 2008), and increased IL activity is necessary for fear extinction learning. In our paradigm, extinction learning was unlikely, as the threat persisted throughout the experiment. Future studies with direct manipulation of these subpopulations, particularly examining head withdrawal timing after such interventions, could provide insight into how these subpopulations guide behavior.”

      Additionally, we made some changes in the introduction, mainly replacing the PL/IL with mPFC to be consistent with the main body of results and conclusion and also specifying the correlational nature of the recording study.

      “Machine learning-based population decoding methods, alongside single-cell analyses, were employed to investigate the correlations between neuronal activity and a range of behavioral indices across different sections within the foraging arena.”

      Reviewer 2 (Recommendations):

      The authors consistently use parametric statistical tests throughout the manuscript. Can they please provide evidence that they have checked whether the data are normally distributed? Otherwise, non-parametric alternatives are more appropriate.

      Thank you for mentioning this important issue in the analysis. We re-ran the test of normality for all our data using the Shapiro-Wilk test at a significance level of .05 and found that the data sets summarized in Author response table 1 below require non-parametric tests. For the analyses that did not pass the normality test, we used a non-parametric alternative instead. We also updated the methods section. For instance, the repeated measures ANOVAs for supplementary figure S1 and the PCA results were changed to the Friedman test with Dunn’s multiple comparison test.

      Author response table 1.

      Line 107: it is not clear here or in the methods whether a single drop of sucrose solution is delivered per lick or at some rate during the encounter, both during the habituation or in the final task. This is important information in order to understand how animals might make decisions about whether to stay or leave and how to interpret neural responses during this time period. Or is it a large drop, such that it takes multiple licks to consume? Please clarify.

      The apparatus we used incorporated an IR-beam sensor-controlled solenoid valve. As the beam sensor was located right in front of the pipe, the rat’s tongue activated the sensor. As a result, each lick opened the valve for a brief period, releasing a small amount of liquid, and the rat had to continuously lick to gain access to the sucrose. We carefully regulated the flow of the liquid and installed a small sink connected to a vacuum pump, so any remaining sucrose not consumed by the rat was instantly removed from the port. We clarified how sucrose was delivered in the methods section and also in the results section.

      Method:

      “The sucrose port had an IR sensor which was activated by a single lick. The rat usually stayed in front of the lick port and licked continuously at a rate of up to 6.3 times per second to obtain sucrose. Any sucrose droplets that fell into the bottom sink were immediately removed by negative pressure so that the rat’s behavior was focused on the licking.”

      Result:

      “The lick port was activated by an IR-beam sensor, triggering the solenoid valve when the beam was interrupted. The rat gradually learned to obtain rewards by continuously licking the port.”

      However, I'm not sure I understand the authors' logic in the interpretation: does the S-phase not also consist of goal-directed behaviour? To me, the core difference is that one is mediated by threat and the other by reward. In addition, it would be helpful to visualize the behaviour in the S-phase, particularly the number of approaches. This difference in the amount of 'experience' so to speak might drive some of the decrease in spatial decoding accuracy, even if travel distance is similar (it is also not clear how travel distance is calculated - is this total distance?) Ideally, this would also be included as a predictor in the GLM.

      We agree that the behaviors observed during the shuttling phase can also be considered goal-directed, as the rat moves purposefully toward explicit goals (the sucrose port and the N-zone during the return trip). However, we argue that there is a significant difference in the level of complexity of these goals.

      During the L-phase, the rat not only has to successfully navigate to the E-zone for sucrose but also pay attention to the robots, either to avoid an attack from the robot's forehead or escape the fast-striking motion of the claw. When the rat runs toward the E-zone, it typically takes a side-approaching path, similar to Kim and Choi (2018), and exhibits defensive behaviors such as a stretched posture, which were not observed in the S-phase. This behavioral characteristic differs from the S-phase, where the rat adopted a highly stereotyped navigation pattern fairly quickly (within 3 sessions), evidenced by more than 50 shuttling trajectories per session. In this phase, the rat exhibited more stimulus-response behavior, simply repeating the same actions over time without deliberate optimization.

      In our additional experiment with two different levels of goal complexity (reward-only vs. reward/threat conflict), we used a between-subject design in which both groups experienced both the S-phase and L-phase before surgery and underwent only one type of session afterward. This approach ruled out the possibility of differences in contextual experience. Additionally, since we initially designed the S-phase as extended training, behaviors in the apparatus tended to stabilize after rats completed both the S-phase and L-phase before surgery. As a result, we compared the post-surgery Lobsterbot phase to the post-surgery shuttling phase to investigate how different levels of goal complexity shape spatial encoding strength.

      To clarify our claim, we edited the paragraph below.

      “This absence of spatial correlates may result from a lack of complex goal-oriented navigation behavior, which requires deliberate planning to acquire more rewards and avoid potential threats.

      […]

      After the surgery, unlike the Lob-Exp group, the Ctrl-Exp group returned to the shuttling phase, during which the Lobsterbot was removed. With this protocol, both groups experienced sessions with the Lobsterbot, but the Ctrl-Exp group's task became less complex, as it was reduced to mere reward collection.

      Given these observations, along with the mPFC’s lack of consistency in spatial encoding, it is plausible that the mPFC operates in multiple functional modes, and the spatial encoding mode is preempted when the complexity of the task requires deliberate spatial navigation.”

      Additionally, we added behavioral data from the initial S-phase to Supplementary Figure 1.

      It is a good point that the amount of experience might drive the decrease in spatial decoding accuracy. To test this hypothesis, we added a new variable, the number of Lobsterbot sessions after surgery, to the previous GLM analysis. The updated model predicted the outcome variable with significant accuracy (F(4,44) = 10.31, p < .001), with an R-squared value of 0.4838. The regression coefficients were as follows: presence of the Lobsterbot (2.76, standard error [SE] = 1.11, t = 2.42, p = .020), number of recorded cells (-0.43, SE = .08, t = -5.22, p < .001), recording location (0.90, SE = 1.11, p = .424), and number of L sessions (0.002, SE = 0.11, p = .981). These results indicate that the number of exposures to the Lobsterbot sessions, as a measure of experience, did not affect spatial decoding accuracy.
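      As a consistency check on the reported model fit, the overall F statistic can be recovered from R-squared and the degrees of freedom. This is a hedged sketch that assumes the GLM is an ordinary least-squares linear model with an intercept and four predictors; the function name is illustrative and not taken from the authors' code.

```python
def f_from_r_squared(r_squared, n_predictors, df_residual):
    """Overall F statistic of a linear model, computed from its R-squared."""
    explained_per_df = r_squared / n_predictors
    unexplained_per_df = (1.0 - r_squared) / df_residual
    return explained_per_df / unexplained_per_df


# Values quoted in the response: R-squared = 0.4838 with F(4, 44).
f_stat = f_from_r_squared(0.4838, n_predictors=4, df_residual=44)

# Under the intercept assumption, the implied number of observations (sessions)
# is df_residual + n_predictors + 1.
n_observations = 44 + 4 + 1

print(f"F(4, 44) = {f_stat:.2f}, implied n = {n_observations}")
```

      The same relation can be used in reverse to sanity-check any reported pair of F and R-squared values when the raw data are not available.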

      As a minor edit, we changed the term to “total travel distance”.

      Relating to the previous point, it should be emphasized in both sections on removing the Lobsterbot and on non-navigational behaviours that the spatial decoding is all in reference to distance from the threat (or reward location). The language in these sections differs from the previous section where 'distance from the goal' is mentioned. If the authors wish to discuss spatial decoding per se, it would be helpful to perform the same analysis but relative to the animals' own location which might have equal accuracy across locations in the arena. Otherwise, it is worth altering the language in e.g. line 258 onwards to state the fact that distance to the goal is only decodable when animals are actively engaged in the task.

      Thank you for this comment. We changed the term to “distance from the conflict zone” or “distance of the rat to the center of the E-zone” to clarify our experimental setup.

      In Fig. 5, why is the number of neurons shown in the PETHs less than the numbers shown in the pie charts?

      The number of neurons differs between the PETHs and the pie charts in Figure 5 because PETHs are drawn only for 'event-responsive' units. For visualization, we selectively included the neurons that met the criteria described in the Methods section (Behavior-responsive unit analysis). We have updated the caption for Figure 5 as follows to minimize confusion.

      “Multiple subpopulations in the mPFC react differently to head entry and head withdrawal.

      (A) Top: The PETH of head entry-responsive units is color-coded based on the Z-score of activity.

      (C) The PETH of head withdrawal-responsive units is color-coded based on the Z-score of activity.”

      I appreciate the amount of relatively unprocessed data plotted in Figure 5, but it would be great to visualize something similar for AW vs. EW responses within the HW2 population. In other words, what is there that's discernably different within these responses that results in the findings of Fig. 6?

      To visualize the difference in neural activity between AW and EW, we included an additional supplementary figure (Supplementary Figure 5). We divided the neurons into Type 1 and Type 2 and plotted PETH during Avoidance Withdrawal (AW) and Escape Withdrawal (EW). Consistent with the results shown in Figure 6d, we could visually observe increased activity in Type 2 neurons before the execution of AW compared to EW. However, we couldn’t find a similar pattern in Type 1 neurons.

      On a related note, it would add explanatory power if the authors were able to more tightly link the prediction accuracy of the ensemble (particularly the Type 2 neurons) to the timing of the behaviour. Earlier in the manuscript it would be helpful to show latency to withdraw in AW trials; are animals leaving many seconds before the attack happens, or are they just about anticipating the timing of the attack? And therefore when using ensemble activity to predict the success of the AW, is the degree to which this can be done in advance (as the authors say, up to 6 seconds before withdrawal) also related to how long the animal has been engaged with the threat?

      We agree that the timing of head withdrawal, particularly in AW trials, is a critical factor in describing the rat's strategy toward the task. To test whether the rat uses a precise timing strategy (for instance, leaving several seconds before the attack or exploiting the discrete 3- and 6-second attack durations), we plotted all head withdrawal timepoints during the 6-second trials. The distribution was fairly even, without distinguishable peaks (e.g., at the very initial period or at the 3- or 6-second mark). This indicates a lack of precise temporal strategy by the rat. We included additional data in the supplementary figure (Supplementary Figure 6) and added the following to the results section.

      “We monitored all head withdrawal timepoints to assess whether rats developed a temporal strategy to differentiate between the 3-second and 6-second attacks. We found no evidence of such a strategy, as the timings of premature head withdrawals during the 6-second attack trials were evenly distributed (see Supplementary Figure S1).”

      As depicted in the new supplementary figure, head withdrawal times during avoidance behavior vary from sub-seconds to the 3- or 6-second attack timepoints. After receiving the reviewer’s comment, we became curious whether there is a decoding accuracy difference depending on how long the animal engaged with the threat. We selected all 6-second attack and avoidance withdrawal trials and checked if correctly classified trials (AW trials classified as AW) had different head withdrawal times—perhaps shorter durations—compared to misclassified trials (AW trials classified as EW). As shown in Author response image 3 below, there was no significant difference between these two types, indicating that the latency of head withdrawal does not affect prediction accuracy.

      Author response image 3.

      Finally, there remain some open questions. One is how much encoding strength - of either space or the decision to leave during the encounter - relates to individual differences in animal performance or behaviour, particularly because this seems so variable at baseline. A second is how stable this encoding is. The authors mention that the distance encoding must be stable to an extent for their regressor to work; I am curious whether this stability is also found during the encounter coding, and also whether it is stable across experience. For example, in a session when an individual has a high proportion of anticipatory withdrawals, is the proportion of Type 2 neurons higher?

      Thank you for these questions. To recap the number of animals, we used five rats in the Lobsterbot experiments and three rats in the control experiment, in which the Lobsterbot was removed after training. Indeed, there were individual differences in performance (i.e., avoidance success rate), number of recorded units (related to the recording quality), and baseline behaviors. To clarify these differences, see Author response image 4 below.

      Author response image 4.

      We used a GLM to measure how much of the decoder’s accuracy was explained by individual differences. The results showed that individual differences explained 38.96% of the distance regressor’s performance and 12.14% of the event classifier’s performance. Since recording quality was highly dependent on the animal, the high subject variability detected in the distance regression might be attributed to the number of recorded cells. Rat00, which had the lowest average mean absolute error, had the highest number of recorded cells, with an average of 18. Compared to the distance regression, there was less subject variability in event classification. Indeed, the GLM results showed that the variability explained by the number of cells was only 0.62% in event classification.

      The reason we mentioned that "distance encoding must be stable for our regressor to work" is entirely based on the population-level analysis. Because we used neural data and behaviors from entire trials within a session, the regressor or classifier would have low accuracy if encoding dynamics changed within the session. In other words, if the way neurons encode avoidance/escape predictive patterns changed within a training set, the classifier would fail to generate an optimized separation function that works well across all datasets.

      To further investigate whether changes in experience affect event classification results over time, we plotted an additional graph below. Although there are individual and daily fluctuations in decoding accuracy, there was no observable trend throughout the experiments.

      Author response image 5.

      Regarding the correlation between the ratio of avoidance withdrawal and the proportion of Type 2 neurons, we were also curious and analyzed the data. Across 40 sessions, the correlation was -0.0716. For Type 1 neurons, it was slightly higher at 0.1459. We believe this indicates no significant relationship between the two variables.
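      The response reports the two correlation coefficients without a test statistic. One way to gauge them, sketched below, is the Fisher z approximation for a Pearson correlation. This is an illustrative assumption, not the authors' method: it treats the 40 sessions as independent observations, and the function name is hypothetical.

```python
import math


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's z-transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


N_SESSIONS = 40  # number of sessions stated in the response

p_type2 = fisher_z_p_value(-0.0716, N_SESSIONS)
p_type1 = fisher_z_p_value(0.1459, N_SESSIONS)

print(f"Type 2: r = -0.0716, approximate p = {p_type2:.2f}")
print(f"Type 1: r =  0.1459, approximate p = {p_type1:.2f}")
```

      Because sessions from the same rat are unlikely to be fully independent, these p-values are best read as a rough guide rather than an exact test.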

      Minor points:

      I struggled with the overuse of acronyms in the paper. Some might be helpful but F-zone/N-zone, for example, or HE/HW, AW/EW are a bit of a struggle. After reading the paper a few times I learned them but a naive reader might need to often refer back to when they were first defined (as I frequently had to).

      To increase readability, we removed acronyms that are not often used and changed HE/HW to head-entry/head-withdrawal.

      I have a few questions about Figure 1F: in the text (line 150) it says that 'surgery was performed after three L sessions when the rats displayed a range of 30% to 60% AW'. This doesn't seem consistent with what is plotted, which shows greater variability in the proportion of AW behaviours both before and after surgery. It also appears that several rats only experienced two days of the L1 phase; please make clear if so. And finally, what is the line at 50% indicating? Neither the text nor the legend discuss any sort of thresholding at 50%. Instead, it would be best to make the distinction between pre- and post-surgery behaviour visually clearer.

      Thank you for pointing out this issue. We acknowledge there was an error in the text description. As noted in the Methods section, we proceeded with surgery after three Lobsterbot sessions. We have removed the incorrect part from the Results section and revised the Methods section for clarity.

      “After three days of Lobsterbot sessions, the rats underwent microdrive implant surgery, and recording data were collected from subsequent sessions, either Lobsterbot or shuttling sessions, depending on the experiment. For all post-surgery sessions, those with fewer than 20 approaches in 30 minutes were excluded from further analysis.”

      Among the five rats, Rat2 and Rat3 did not approach the robot during the entire Lob2 session, which is why these two rats do not have Lob2 data points. We updated the caption to note this.

      Initially, we added a 50% reference line, but we agree it is unnecessary as we do not discuss this reference. We have updated the figure to include the surgery point, as shown in Supplementary Figure 1.

      Fig. 2C: each dot is an ensemble of simultaneously recorded neurons, i.e. a subset of the total 800-odd units if I understand correctly. How many ensembles does each rat contribute? Similarly, is this evenly distributed across PL and IL?

      Yes, each dot represents a single session, with a total of 40 sessions. Five rats contributed 11, 9, 8, 7, and 5 sessions, respectively. Although each rat initially had more than 10 sessions, we discarded some sessions with a low unit count (fewer than 10 units; as detailed in Materials and Methods - Data Collection). We collected 25 sessions from the PL and 15 sessions from the IL. Our goal was to collect more than 200 units per region.

      Please show individual data points for Fig. 2D.

      We updated the figure with individual data points.

      Is there a reason why the section on removing the Lobsterbot (lines 200 - 215) does not have associated MAE plots? Particularly the critical comparison between Lob-Exp and Ctl-Exp.

      We intentionally removed some graphs to create a more compact figure, but we appreciate your suggestion and have included the graph in Figure 2.

      Some references to supplementary materials are not working, e.g. line 333.

      The submitted version of our manuscript contained reference errors. In the current version, we used plain text, and the references are fixed.

      The legend for Supp. Fig. 2B is incorrect.

      We greatly appreciate this point. We changed the caption to match the figure.

      Reviewer 3 (Public Review):

      Thank you for recognizing our efforts in designing an ethologically relevant foraging task to uncover the multiple roles of the mPFC. While we acknowledge certain limitations in our methodology—particularly that we only observed correlations between neural activity and behavior without direct manipulation—we have conducted additional analyses to further strengthen our findings.

      Weakness:

      The primary concern with this study is the absence of direct evidence regarding the role of the mPFC in the foraging behavior of the rats. The ability to predict heterogeneous variables from the population activity of a specific brain area does not necessarily imply that this brain area is computing or using this information. In light of recent reports revealing the distributed nature of neural coding, conducting direct causal experiments would be essential to draw conclusions about the role of the mPFC in spatial encoding and/or threat evaluation. Alternatively, a comparison with the activity from a different brain region could provide valuable insights (or at the very least, a comparison between PL and IL within the mPFC).

      Thank you for the comment. Indeed, the fundamental limitation of the recording study is that it is only correlational, and any causal relationship between neural activity and behavioral indices is only speculative. We made this clearer in the revision and refrained from expressing any speculative ideas suggesting causality. While we did not provide direct evidence that the mPFC is computing or utilizing spatial/foraging information, we based our assertion on previous studies that have directly demonstrated the mPFC's role in complex decision-making tasks (Martin-Fernandez et al., 2023; Orsini et al., 2018; Zeeb et al., 2015) and in certain types of spatial tasks (De Bruin et al., 1994; Sapiurka et al., 2016). We would like to emphasize that, to the best of our knowledge, no previous study has investigated mPFC function while an animal is solving multiple heterogeneous problems in a semi-naturalistic environment. Therefore, although our recording study only provides speculative causal inference, it certainly provides a foundation for investigating mPFC function. Future studies employing more sophisticated, cell-type specific manipulations could test the hypotheses from the current study.

      One of the key questions of this manuscript is how multiple pieces of information are represented in the recorded population of neurons. Most of the studies mentioned above use highly structured experimental designs, which allow researchers to study only one function of the mPFC. In the current study, the semi-naturalistic environment allows rats to freely switch between multiple behavioral sets, and our decoding analysis quantitatively assesses the extent to which spatial/foraging information is embedded during these sets. Our goal is to demonstrate that two different task hyperspaces are co-expressed in the same region and that the degree of this expression varies according to the rat’s current behavior (See Figure 8(b) in the revised manuscript).

      Alternatively, we added multiple analyses. First, we included a single unit-level analysis looking at the place cell-like property to contrast with the ensemble decoding. Most neurons did not show well-defined place fields although there were some indications for place cell-like property. For example, some neurons displayed fragmented place fields or unusually large place fields only at particular spots in the arena (mostly around the gates). The accuracy from this place information at the single-neuron level is much lower than that acquired from population decoding. Likewise, although there were neurons with modulated firing around the time of particular behavior (head entry and withdrawal), overall prediction accuracy of avoidance decision was much higher when the ensemble-based classifier was applied.

      Moreover, given that high-dimensional movement has been shown to be reflected in the neural activity across the entire dorsal cortex, more thorough comparisons between the neural encoding of task variables and movement would help rule out the possibility that the heterogeneous encoding observed in the mPFC is merely a reflection of the rats' movements in different behavioral modes.

      Thanks for the comment. We acknowledge that the neural activity may reflect various movement components across different zones in the arena. We performed several analyses to test this idea. First, our run-and-stop event analysis may provide insight into whether the mPFC neurons encode location despite the significant motor events. The rats typically move across the F-zone fairly routinely and swiftly (as if they are “running”) to reach the E-zone, at which point they reduce their moving speed almost to a halt (“stopping”). The PETHs around these critical motor events, however, did not show any significant modulation of neural activity, indicating that most neurons we recorded from the mPFC did not respond to movement.

      We added this analysis to demonstrate that these sudden stops did not evoke the characteristic activation of Type 1 and Type 2 neurons observed during head entry into the E-zone. When we isolated these sudden stops outside the E-zone, we did not observe this neural signature (Supplementary Figure 2).

      Second, our PCA results showed that population activity in the E-zone during dynamic foraging behavior was distinct from the activity observed in the N- and F-zones during navigation. However, there is a possibility that the two behaviorally significant events—entry into the E-zone and voluntary or sudden exit—might be driving the differences observed in the PCA results. To account for this, we designated ±1 second from head entry and head withdrawal as "critical event times," excluded the corresponding neural data, and reanalyzed the data. This method removed neural activity associated with sudden movements in specific zones. Despite this exclusion, the PCA still revealed distinct population activity in the E-zone, different from the other zones (Supplementary Figure 4). This result reduces the likelihood that the observed heterogeneous neural activity is merely a reflection of zone-specific movements.
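      The exclusion step described above can be illustrated with a small masking routine. This is a minimal sketch under stated assumptions: the bin width, the event times, and the choice to treat the ±1 second boundary as excluded are all hypothetical, and the function name is not from the authors' code.

```python
def exclusion_mask(bin_times, event_times, half_window=1.0):
    """Return True for bins to keep, i.e. bins farther than half_window from every event."""
    return [all(abs(t - e) > half_window for e in event_times) for t in bin_times]


# Hypothetical session fragment: 0.5 s bins spanning 0 to 10 s.
bin_times = [i * 0.5 for i in range(21)]
# Hypothetical head-entry and head-withdrawal times in seconds.
event_times = [3.0, 7.2]

keep = exclusion_mask(bin_times, event_times)
kept_times = [t for t, flag in zip(bin_times, keep) if flag]

# Only the kept bins would be passed on to the PCA.
print(kept_times)
```

      Applying the same mask to the rows of the binned firing-rate matrix removes activity around head entry and head withdrawal before the population analysis is repeated.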

      Lastly, the main claim of the paper is that the mPFC population switches between different functional modes depending on the context. However, no dynamic analysis or switching model has been employed to directly support this hypothesis.

      Thank you for this comment. Since we did not conduct a manipulation experiment, there is a clear limitation in uncovering how switching occurs between the two task contexts. To make the most of our population recording data, we added an additional results section that examines how individual neurons contribute to both the distance regressor and the event classifier. Our findings support the idea that distance and dynamic foraging information are distributed across neurons, with no distinct subpopulations dedicated to each context. This suggests that mPFC neurons adjust their coding schemes based on the current task context, aligning with Duncan’s (2001) adaptive coding model, which posits that mPFC neurons adapt their coding to meet the task's current demands.

      Reviewer 3 (Recommendations):

      The evidence for spatial encoding is relatively weak. In the F-zone (50 x 48 cm), the average error was approximately 17 cm, constituting about a third of the box's width and likely not significantly smaller than the size of a rat's body. The errors in the shuffled data are also not substantially greater than those in the original data. An essential test indicates that spatial decoding accuracy decreases when the Lobsterbot is removed. However, assessing the validity of the results is difficult in the current state. There is no figure illustrating the results, and no statistics are provided regarding the test for matching the number of neurons.

      We acknowledge that the average error (~17 cm) measured in our study is relatively large, even though it is significantly smaller than that of the shuffled control model (22.6 cm). Previous studies reported smaller prediction errors but under different experimental conditions: 16 cm in Kaefer et al. (2020) and less than 10 cm in Ma et al. (2023) and Mashhoori et al. (2018). Most notably, the average number of units used in our study (15.8 units per session) is considerably smaller than in the previous works, which used 63, 49, and 40 units, respectively. As our GLM results demonstrated, the number of recorded cells significantly influenced decoding accuracy (β = -0.43 cm/neuron). With a similar number of recorded cells, we would expect to achieve comparable decoding accuracy. In addition, unlike other studies that have employed a dedicated maze such as the virtual track or the 8-shaped maze, we exposed rats to a semi-naturalistic environment where they exhibited a variety of behaviors beyond simple navigation. As argued throughout the manuscript, we believe that the spatial information represented in the mPFC is susceptible to disruption when the animal engages in other activities. A similar phenomenon was reported by Mashhoori et al. (2018), where the decoder, which typically showed a median error of less than 10 cm, exhibited a much higher error (nearly 100 cm) near the feeder location.

      As for the reviewer’s request for comparing spatial decoding without the Lobsterbot, we added a new figure to illustrate the spatial decoding results, including statistical details. We also applied a Generalized Linear Model to regress out the effect of the number of recorded neurons and statistically assess the impact of Lobsterbot removal. This adjustment directly addresses the reviewer's request for a clearer presentation of the results and helps contextualize the decoding performance in relation to the number of recorded neurons.

      As indicated in the public review, drawing conclusions about the role of the mPFC in navigation and avoidance behavior during the foraging task is challenging due to the exclusively correlational nature of the results. The accuracy in AW/EW discrimination increases a few seconds before the response, implying that changes in mPFC activity precede the avoidance/escape response. However, one must question whether this truly reflects the case. Could this phenomenon be attributed to rats modifying their "micro-behavior" (as evidenced by changes in movement observed in the video) before executing the escape response, and subsequently influencing mPFC activity?

      We appreciate the reviewer's thoughtful observation regarding the correlational nature of our results and the potential influence of pre-escape micro-behaviors on mPFC activity. We acknowledge that the increased accuracy in AW/EW discrimination preceding the response could also be correlated with micro-behaviors. However, there is very little room for extraneous behavior other than licking the sucrose delivery port within the E-zone, as the rats are highly trained to perform this stereotypical behavior. To support this, we measured the time delays between licking events (inter-lick intervals). The results show a sharp distribution, with 95% of the intervals falling within a quarter second, indicating that the rats were stable in the E-zone, consistently licking without altering their posture.
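
      As an illustration only, the inter-lick interval summary described above can be sketched as follows. The lick timestamps, the function names, and the 0.25 s threshold parameter are hypothetical stand-ins chosen for the example; they are not the recorded data or the analysis code used in the study.

```python
def inter_lick_intervals(lick_times):
    """Return the delays (s) between consecutive licks."""
    ordered = sorted(lick_times)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def fraction_within(intervals, threshold_s=0.25):
    """Fraction of intervals no longer than threshold_s seconds."""
    return sum(1 for dt in intervals if dt <= threshold_s) / len(intervals)


# Hypothetical timestamps (s): one lick bout, a pause, then a second bout.
example_licks = [0.00, 0.15, 0.31, 0.45, 1.20, 1.34]
example_intervals = inter_lick_intervals(example_licks)
example_fraction = fraction_within(example_intervals)
print(example_fraction)
```

      A sharply peaked interval distribution, with most intervals below the threshold, is what one would expect if the animal keeps licking without leaving the port.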

      To complement the data presented in Author response image 2, a video clip showing a rat engaged in licking behavior was included. We carefully designed the robot compartment and adjusted the distance between the Lobsterbot and the sucrose port to ensure that rats could exhibit only limited behaviors inside the E-zone. The video confirms that no significant micro-behaviors were observed during the rat’s activity in the E-zone.

      If mPFC activity indeed switches mode, the results do not clearly indicate whether individual cells are specifically dedicated to spatial representation and avoidance or if they adapt their function based on the current goal. Figure 7, presented as a schematic illustration, suggests the latter option. However, the proportion of cells in the HE and HW categories that also encode spatial location has not been demonstrated. It has also not been shown how the switch is manifested at the level of the population.

      Thank you for this comment. As the reviewer pointed out, we suggest that mPFC neurons do not diverge based on their functions, but rather adapt their roles according to the current goal. To support this assertion, we added a results section that calculates the feature importance of decoders. This analysis allows us to quantitatively measure each neuron’s contribution to both the distance regressor and the event decoder. Our results indicate that distance and defensive behavior are not encoded by a small subset of neurons; instead, the information is distributed across the population. Shuffling the neural data of a single neuron resulted in a median increase in decoding error of 0.73 cm for the distance regressor and 0.01% for the event decoder, demonstrating that the decoders do not rely on a specific subset of neurons that exclusively encode spatial and/or defensive behavior.
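
      The shuffling procedure can be sketched generically as a permutation feature importance. This is a minimal sketch under stated assumptions: the toy linear decoder, the synthetic firing rates, and all function names are hypothetical and only illustrate the idea of shuffling one neuron at a time; they do not reproduce the decoders or data of the study.

```python
import random
from statistics import mean, median


def mean_abs_error(predict, X, y):
    """Mean absolute decoding error of `predict` over all samples."""
    return mean(abs(predict(row) - target) for row, target in zip(X, y))


def permutation_importance(predict, X, y, seed=0):
    """Increase in error when each neuron's column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between neuron j and the target
        X_shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mean_abs_error(predict, X_shuffled, y) - baseline)
    return importances


# Hypothetical data: neuron 0 carries the distance signal, neuron 1 does not.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]


def toy_decoder(row):
    """Toy stand-in for a trained decoder; it ignores neuron 1."""
    return 2.0 * row[0]


importances = permutation_importance(toy_decoder, X, y)
print(importances, median(importances))
```

      In this toy case the importance is concentrated in one neuron. A distributed code, as reported above, would instead show small importance values spread over many neurons.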

      Although we found supporting evidence that mPFC neurons encode two different types of information depending on the current context, we acknowledge that we could not go further in answering how this switch is manifested. One simple explanation is that the function is driven by current contextual information and goals—in other words, a bottom-up mechanism. However, in our control experiment, simplifying the navigation task worsened the encoding of spatial information in the mPFC. Therefore, we speculate that an external or internal arbitrator circuit determines what information to encode. A precise temporal analysis of the timepoint when the switch occurs in more controlled experiments might answer these questions. We have added this discussion to the discussion section.

      PL and IL are two distinct regions; however, there is no comparison between the two areas regarding their functional properties or the representations of the cells. Are the proportions of cell categories (HE vs HW or HE1 vs HE2, spatial encoding vs no spatial encoding) different in IL and PL? Are areas differentially active during the different behaviors?

      Thank you for bringing up this issue. As mentioned in our response to the public review, we included a comparison between the PL and IL regions. We did not observe any differences in spatial encoding (feature importance scores); the only distinction was in the proportions of Type 1 and Type 2 neurons, as the reviewer suggested. We have incorporated our interpretation of these results into the discussion section.

      The results and interpretations of the cluster analysis appear to be highly dependent on the parameters used to define a cluster. For example, the HE2 category includes cells with activity that precedes events and gradually decreases afterward, as well as cells with activity that only follows the events.

      We strongly agree that dependency on hyperparameters is a crucial point when using unsupervised clustering methods. To eliminate any subjective criteria in defining clusters, we carefully selected our clustering approach, which requires only two hyperparameters: the number of initial clusters (set to 8) and the minimum number of cells required to be considered a valid cluster (cutoff limit, 50). The rationale behind these choices was: 1) a higher number of initial clusters would fail to generalize neural activity, 2) clusters with fewer than 50 neurons would be difficult to analyze, and 3) to prevent the separation of clusters that show noisy responses to the event.

      Author response table 2 shows the differences in the number of cell clusters when we varied these two parameters. As demonstrated, changing these two variables does result in different numbers of clusters. However, when we plotted each cluster type’s activity around head entry (HE) and head withdrawal (HW), an increased number of clusters resulted in the addition of small subsets with low variation in activity around the event, without affecting the general activity patterns of the major clusters.

      The example mentioned by the reviewer (possible separation of HE2) appears when using a hyperparameter set that results in 4 clusters, not 3. In this result, 83 units, which were labeled as HE2 in the 3-cluster hyperparameter set, form a new group, HE3 (Group 3). This group of units shows increased activity after head entry and exhibited characteristics similar to HE2, with most of the units classified as HW2, maintaining high activity until head withdrawal. Among the 83 HE3 units, 36 were further classified as HW2, 44 as non-significant, and 3 as HW1. Therefore, we believe this does not affect our analysis, as we observed the separation of two major groups, Type 1 (HE1-HW1) and Type 2 (HE2-HW2), and focused our analysis on these groups afterward.

      Despite this validation, there remains a strong possibility that our method might not fully capture small yet significant subpopulations of mPFC units. As a result, we have included a sentence in the methods section addressing the rationale and stability of our approach.

      “(Materials and Methods) To compensate for the limited number of neurons recorded per session, the hyperparameter set was chosen to generalize their activity and categorize them into major types, allowing us to focus on neurons that appeared across multiple recording sessions. Although changes in the hyperparameter sets resulted in different numbers of clusters, the major activity types remained consistent (Supplementary Figure S8). However, there is a chance that this method may not differentiate smaller subsets of neurons, particularly those with fewer than 50 recorded neurons.”

      Author response table 2.

      Minor points:

      Line 333: Error! Reference source not found. This was probably the place for citing Figure S2?

      Lines 339, 343: Error! Reference source not found.

      Thank you for mentioning these comments. In the new version, all reference functions from Word have been replaced with plain text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the three reviewers for their positive comments and useful suggestions. We have implemented most of the reviewers’ recommendations and hope the manuscript is clearer now.

      The main modifications are:

      - A revision of the introduction to better explain what Transitional Probabilities are and clarify the rationale of the experimental design

      - A revision of the discussion

      - To tune down and better explain the interpretation of the different responses between duplets after a stream with phonetic or voice regularities (possibly an N400).

      - To better clarify the framing of statistical learning as a universal learning mechanism that might share computational principles across features (or domains).

      Below, we provide detailed answers to each reviewer's point.

      Response to Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language.

      This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies, for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR given the 1/f decrease in amplitude of background EEG activity. This would involve comparisons between two distinct groups, counterbalancing chunk size and linguistic versus non-linguistic dimension. Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      We added one sentence in the discussion stating that more research is needed to understand whether infants can track both regularities simultaneously (p.13, l.270 “Future work could explore whether they can simultaneously track multiple regularities.”).

      Note: The reviewer’s summary contains a typo: syllabic rate (4 Hz) –not 2 Hz, and word rate (2 Hz) –not 4 Hz.

      Response to Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the author's argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a duplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes.

      We revised the abstract (p.2, l.33) and the discussion of this result (p.15, l.299), toning them down. We hope the rationale of the interpretation is clearer now, as is the fact that it is just one possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for this other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.

      We report these analyses in the SI and refer to them in the methods section (p.25, l.468 “We performed post-hoc tests to ensure that the results were not driven by a perception of two voices: female and male (see SI).”).

      We first quantified the transitional probabilities matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.66). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.
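
      The gender-level TPs quoted above can be checked with a short exact calculation. The sketch below is illustrative only. The duplet-to-gender assignments are hypothetical (the actual lists are not given in this response) and were chosen so that they reproduce the reported values of 0.5 for List A and 0.66/0.33 for List B. It also assumes that the three duplets occur equally often and are concatenated at random without immediate repetition of the same duplet.

```python
from collections import defaultdict
from fractions import Fraction


def gender_tps(duplets, gender):
    """Exact gender-level TPs for a random concatenation of duplets.

    Assumes equally frequent duplets and no immediate duplet repetition.
    """
    n = len(duplets)
    counts = defaultdict(Fraction)  # (gender_from, gender_to) -> weight
    totals = defaultdict(Fraction)  # gender_from -> weight
    for i, (first, second) in enumerate(duplets):
        # Within a duplet, the first voice is always followed by the second.
        counts[(gender[first], gender[second])] += 1
        totals[gender[first]] += 1
        # Between duplets, the second voice is followed by the first voice
        # of any other duplet with equal probability.
        for j, (other_first, _) in enumerate(duplets):
            if j != i:
                counts[(gender[second], gender[other_first])] += Fraction(1, n - 1)
        totals[gender[second]] += 1
    return {pair: weight / totals[pair[0]] for pair, weight in counts.items()}


# Hypothetical voice labels: F1-F3 are female voices, M1-M3 are male voices.
gender = {"F1": "F", "F2": "F", "F3": "F", "M1": "M", "M2": "M", "M3": "M"}
list_a_like = [("F1", "F2"), ("M1", "M2"), ("F3", "M3")]  # hypothetical
list_b_like = [("F1", "M1"), ("M2", "F2"), ("F3", "M3")]  # hypothetical

tps_a = gender_tps(list_a_like, gender)
tps_b = gender_tps(list_b_like, gender)
print(tps_a)
print(tps_b)
```

      With these assignments, every Word in the List B-like case alternates gender, whereas four of the six possible Part-words repeat it, in line with the 2/3 figure raised by the reviewer.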

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.

      Author response image 2:

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W=Words; P=Part-Words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity with a source located at the electrical zero in a single-dipole model, i.e. approximately in the superior temporal region (Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e. the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      We added a phrase in the discussion to explain why we can expect phase-locked activity in posterior electrodes (p.14, l.277: “Auditory ERPs, after reference-averaged, typically consist of a central positivity and posterior negativity”).

      Author response image 4:

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al, Neuroimage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Response to Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, being voices a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.

      We have rephrased the introduction to make this point clearer. See p.5, l.88-92: “To test this, we have taken advantage of the fact that syllables convey two important pieces of information for humans: what is being said and who is speaking, i.e. linguistic content and speaker’s identity. While statistical learning…”.

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation).

      We have revised the discussion to clarify this theoretical framework.

      In p.13, l.264: “This mechanism might be rooted in associative learning processes relying on the co-existence of event representations driven by slow activation decays (Benjamin et al., 2024).”

      In p., l. 364: “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech. This supports the idea that statistical learning is a general learning mechanism, probably operating on common computational principles across neural networks (Benjamin et al., 2024)…”.

      (3) Lines 341-345: Statistical learning is an evolutionary ancient learning mechanism but I do not think that the present results are showing it. This is a study on human neonates and adults, there are no other animal species involved therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it.

      We have removed the sentence “Statistical learning is an evolutionary ancient learning mechanism.”, and replaced it by (p.18, l.364) “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech.” We now emphasise in the discussion that infants compute regularities on both features and propose that SL might be a universal learning mechanism sharing computational principles (Benjamin et al., 2024) (see point 2).

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was always followed by the “red” voice for example but uttered different syllables.

      We have revised the description of the stimuli and the legend of Figure 1 to clarify these important points.

      See p.6, l. 113: “The structure consisted of the random concatenation of three duplets (i.e., two-syllable units) defined only by one of the two dimensions. For example, in Experiment 1, one duplet could be petu with each syllable uttered by a random voice each time they appear in the stream (e.g. pe is produced by voice1 and tu by voice6 in one instance and in another instance pe is produced by voice3 and tu by voice2). In contrast, in Experiment 2, one duplet could be the combination [voice1-voice6], each uttering randomly any of the syllables.”

      p.20, l. 390 (Figure 1 legend): “For example, the two syllables of the word “petu” were produced by different voices, which randomly changed at each presentation of the word (e.g. “yellow” voice and “green” voice for the first instance, “blue” and “purple” voice for the second instance, etc.). In Experiment 2, the statistical structure was based on voices (TPs alternated between 1 and 0.5), while the syllables changed randomly (uniform TPs of 0.2). For example, the “green” voice was always followed by the “red” voice, but they were randomly saying different syllables “boda” in the first instance, “tupe” in the second instance, etc.”
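
      To make the structure of the Experiment 2 stream concrete, the following sketch simulates a stream with the stated properties and estimates TPs separately over voices and over syllables. It is a simplified illustration, not the stimulus-generation code. The voice labels, the syllables "ki" and "lo", the stream length, and the seed are hypothetical. The no-immediate-repetition constraints on duplets and syllables are assumptions inferred from the stated TPs of 0.5 and 0.2.

```python
import random
from collections import Counter


def make_stream(n_duplets, seed=0):
    """Simulate (syllable, voice) tokens: structured voices, random syllables."""
    rng = random.Random(seed)
    voice_duplets = [("v1", "v2"), ("v3", "v4"), ("v5", "v6")]  # hypothetical
    syllables = ["pe", "tu", "bo", "da", "ki", "lo"]  # last two hypothetical
    stream, prev_duplet, prev_syllable = [], None, None
    for _ in range(n_duplets):
        # Assumed constraint: the same duplet never repeats immediately.
        duplet = rng.choice([d for d in voice_duplets if d != prev_duplet])
        for voice in duplet:
            # Assumed constraint: the same syllable never repeats immediately.
            syllable = rng.choice([s for s in syllables if s != prev_syllable])
            stream.append((syllable, voice))
            prev_syllable = syllable
        prev_duplet = duplet
    return stream


def transitional_probabilities(sequence):
    """Estimate P(next | current) from adjacent pairs in the sequence."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    first_counts = Counter(sequence[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


stream = make_stream(10000)
voice_tp = transitional_probabilities([voice for _, voice in stream])
syllable_tp = transitional_probabilities([syllable for syllable, _ in stream])
print(voice_tp[("v1", "v2")], voice_tp[("v2", "v3")])
```

      In this simulation the within-duplet voice TP is exactly 1, the between-duplet voice TPs are close to 0.5, and the syllable TPs are close to 0.2, matching the pattern described in the legend.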

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We have modified this sentence in the manuscript to make it clearer.

      See p.7, l. 120: “If infants at birth compute regularities based on a neural representation of the syllable as a whole, i.e. comprising both phonetic and voice content, this would require computing a 36 × 36 TPs matrix relating each token.”

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym TP should be spelled out, and a brief description of the fact that dips in TPs signal boundaries while high TPs signal a cohesive unit could be useful for non-specialist readers.

      We have added it at the beginning of the introduction (lines 52-60)

      (2) p.5, l.76: "Here, we aimed to further characterise the characteristics of this mechanism...". I suggest this is rephrased as "to further characterise this mechanism".

      We have changed it as suggested by the reviewer (now p.5, l.81)

      (3) p.9, l.172: "[...] this contribution is unlikely since the electrodes differ from the electrodes, showing enhanced word-rate activity at 2 Hz."

      It is unclear which electrodes differ from which electrodes. I figure that the authors mean that the electrodes showing stronger activity at 2 Hz differ from those showing it at 4 Hz, but the sentence could use rephrasing.

      This part has been rephrased (p.9, l.177-181)

      (4) p.10, l.182: "[...] the entrainment during the first minute of the structure stream [… ]".

      Structured stream.

      It has been corrected (p.10, l.190)

      (5) p.12, l.234: "we compared STATISTICAL LEARNING"

      Why the use of capitals?

      This was an error and it was corrected (p.12, l.242).

      (6) p.15, l.298: "[...] suggesting that such negativity might be related to semantic."

      The sentence feels incomplete. To semantics? To the processing of semantic information?

      The phrase has been corrected (p.15, l.314). Additionally, the discussion of the posterior negativity observed for duplets after familiarisation with a stream with regularities over phonemes has been rephrased (p.15, l.)

      (7) Same page, l.301: "3-mo-olds" 3-month-olds.

      It has been corrected (now in p.16, l.333)

      (8) Same page, l.307: "(see also (Bergelson and Aslin, 2017)" (see also Bergelson and Aslin, 2017).

      It has been corrected (now in p.17, l.340)

      (9) Same page, l.310: "[...] would be considered as possible candidate" As possible candidates.

      This has been rephrased and corrected (now in p.17, l.343)

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2: The authors mention a "thick orange line", which I think should be a "thick black line".

      We are sorry for this. It has been corrected.

      (2) Ln 166: Should be Figure 2C rather than 3C.

      It has been corrected (now in p.9, l.173)

      (3) Figure 4 is not referenced in the manuscript.

      We now refer to it on p. 12, l.236.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This paper by Poverlein et al reports the substantial membrane deformation around the oxidative phosphorylation super complex, proposing that this deformation is a key part of super complex formation. I found the paper interesting and well-written but identified a number of technical issues that I suggest should be addressed:

      We thank Reviewer 1 for finding our work interesting. We have addressed the technical issues below.

      (1) Neither the acyl chain chemical makeup nor the protonation state of CDL are specified. The acyl chain is likely 18:2/18:2/18:2/18:2, but the choice of the protonation state is not straightforward.

      We thank the Reviewer for highlighting this missing information. We have now added this information in the Materials and Methods section:

      "…were performed in a POPC:POPE:cardiolipin (2:2:1) membrane containing 5 mol% QH<sub>2</sub> / Q (1:1 ratio). Cardiolipin was modeled as tetraoleoyl cardiolipin (18:1/18:1/18:1/18:1) with a headgroup modeled in a singly protonated state (with Q<sub>tot</sub>=-1)."

      (2) The analysis of the bilayer deformation lacks membrane mechanical expertise. Here I am not ridiculing the authors - the presentation is very conservative: they find a deformed bilayer, do not say what the energy is, but rather try a range of energies in their Monte Carlo model - a good strategy for a group that focuses on protein simulations. The bending modulus and area compressibility modulus are part of the standard model for quantifying the energy of a deformed membrane. I suppose in theory these might be computed by looking at the per-lipid distribution in thickness fluctuations, but this route is extremely perilous on a per-molecule basis. Instead, the fluctuation in the projected area of a lipid patch is used to imply the modulus [see Venable et al "Mechanical properties of lipid bilayers from molecular dynamics simulation" 2015 and citations within]. Variations in the local thickness of the membrane imply local variations of the leaflet normal vector (the vector perpendicular to the leaflet surface), which is curvature. With curvature and thickness, the deformation energy is analyzed.

      See:

      Two papers: "Gramicidin A Channel Formation Induces Local Lipid Redistribution" by Olaf Andersen and colleagues. Here the formation of a short peptide dimer is experimentally linked to hydrophobic mismatch. The presence of a short lipid reduces the influence of the mismatch. See below regarding their model cardiolipin, which they claim is shorter than the surrounding lipid matrix.

      Also, see:

      Faraldo-Gomez lab "Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states", 2021. Mondal et al "Membrane Driven Spatial Organization of GPCRs" 2013 and many citations within these papers.

      While I strongly recommend putting the membrane deformation into standard model terms, I believe the authors should retain the basic conservative approach that the membrane is strongly deformed around the proteins and that making the SC reduces the deformation, then exploring the consequences with their discrete model.

      We thank the Reviewer for the suggestions and for pointing out the additional references, which are now cited in the revised manuscript. The analysis is indeed significantly more complex for large multi-million atom supercomplexes in comparison to small peptides (gramicidin A) or model systems of lipid membranes. However, in the revised manuscript, we have conducted further analysis on the membrane curvature effects based on the suggestions. We were able to estimate the energetic contribution of the changes in local membrane thickness and curvature, which are now summarized in Table 1, and described in the main text and SI. We find that both the curvature and local thickness contribute to the increased stability of SC.

      We have now extensively modified the result to differentiate between different components of membrane strain properly:

      "We observe a local decrease in the membrane thickness at the protein-lipid interface (Fig. 2G, Fig S2A,D,E), likely arising from the thinner hydrophobic belt region of the OXPHOS proteins (ca. 30 Å, Fig. S1A) relative to the lipid membrane (40.5 Å, Fig. S1). We further observe ∼30% accumulation of cardiolipin at the thinner hydrophobic belt regions (Fig. 2H, Fig. S2B,F,G), with an inhomogeneous distribution around the OXPHOS complexes. While specific interactions between CDL and protein residues may contribute to this enrichment (Fig. 2N), CDL prefers thermodynamically thinner membranes (∼38 Å, Fig. S1B, Fig. S5F). These changes are further reflected in the reduced end-to-end distance of lipid chains in the local membrane belt (see Methods, Fig. S6, cf. also Refs. (41-44)). In addition to the perturbations in the local membrane thickness, the OXPHOS proteins also induce a subtle inward curvature towards the protein-lipid interface (Fig. S5G), which could modulate the accessibility of the Q/QH<sub>2</sub> substrate into the active sites of CI and CIII<sub>2</sub> (see below, section Discussion). This curvature is accompanied by a distortion of the local membrane plane itself (Fig. 2A-F, Fig. S4A-C, Fig. S7), with perpendicular leaflet displacements reaching up to ~2 nm relative to the average leaflet plane.

      To quantify the membrane strain effects, we analyzed the cgMD trajectories by projecting the membrane surface onto a 2-dimensional grid and calculating the local membrane height and thickness at each grid point. From these values, we quantified the local membrane curvature (Fig. S5H), which measures the energetic cost of deforming the membrane from a flat geometry (ΔG<sub>curv</sub>). We also computed the energetics associated with changes in the membrane thickness, assessed from the deviations from an ideal local membrane in the absence of embedded proteins (ΔG<sub>thick</sub>, see Supporting Information, for technical details). Our analysis suggests that both contributions are substantially reduced upon formation of the SC, with the curvature decreasing by 19.8 ± 1.3 kcal mol-1 and the thickness penalty by 2.8 ± 2.0 kcal mol-1 (Table 1). These results indicate a significant thermodynamic advantage for SC formation, as it minimizes lipid deformation and stabilizes the membrane environment surrounding Complex I and III.”

      […]

      “Taken together, the analysis suggests that the OXPHOS complexes affect the mechanical properties of the membranes by inducing a small inwards curvature towards the protein-lipid interface (Fig. S5), resulting in a membrane deformation effect, while the SC formation releases some deformation energy relative to the isolated OXPHOS complexes. The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, is also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      Our Supporting Information section now provides additional information about the membrane curvature.

      (41) R. M. Venable, F. L. H. Brown, R. W. Pastor, Mechanical properties of lipid bilayers from molecular dynamics simulation. Chemistry and Physics of Lipids 192, 60-74 (2015).

      (42) R. Chadda et al., Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states. eLife 10, e63288 (2021).

      (43) S. Mondal et al., Membrane Driven Spatial Organization of GPCRs. Scientific Reports 3, 2909 (2013).

      (44) J. A. Lundbæk, S. A. Collingwood, H. I. Ingólfsson, R. Kapoor, O. S. Andersen, Lipid bilayer regulation of membrane protein function: gramicidin channels as molecular force probes. Journal of The Royal Society Interface 7, 373-395 (2009).

      We also expanded our SI Method section to account for the new calculations:

      “Analysis of lipid chain end-to-end length

      To probe the protein-induced deformation effect of the membrane, the membrane curvature (H), and the end-to-end distance between the lipid chains, were computed based on aMD and cgMD simulations. The lipid chain length was computed from simulations A1-A6 and C1 based on the first and last carbon atoms of each lipid chain. For example, the end-to-end length of a cardiolipin chain was determined as the distance between atom “CA1” and atom “CA18”.

      “Membrane Curvature and Deformation Energy

      The local mean curvature of the membrane midplane was computed by approximating the membrane surface as a height function Z(x,y), defined as the average location of the N-side and P-side leaflets at each grid point. Based on this, the mean curvature H(x,y) was calculated as,

      where the derivatives are defined as .

      The thickness deformation energy was computed from the local thickness d(x,y) relative to a reference thickness distribution F(d), derived from membrane-only simulations, and converted to a free energy profile via Boltzmann inversion. At each grid point, the F(d) was summed over the grid,

      The bending deformation energy was computed from the mean curvature field H(x,y), assuming a constant bilayer bending modulus κ (taken as 20 kJ mol-1 = 4.78 kcal mol-1):

      where Δ_A_ is the area of the grid cell.

      The thickness and curvature fields were obtained by projecting the coarse-grained MD trajectories (one frame per ns) onto a 2D-grid with a resolution of 0.5 nm. Grid points with low occupancy were downweighted to mitigate noise. More specifically, points with counts below 50% of the median grid count were scaled linearly by their relative count value. To focus the analysis on the region around the protein–membrane interface, only grid points within a radius of 20 nm from the center of the complex were included in the energy calculations. Energies were normalized to an effective membrane area of 1000 nm<sup>2</sup> to facilitate the comparison between systems. Bootstrapping with resampling over frames was performed to estimate the standard deviations of G<sub>thick</sub> and G<sub>curv</sub>.

      We find that G<sub>curv</sub> converges slowly due to its sensitivity to local derivatives and the small grid size required to resolve the curvature contribution near the protein. Consequently, tens of microseconds of simulations were necessary to obtain well-converged estimates of the curvature energy.”
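
      The display equations referred to in the quoted Methods text are not reproduced in this copy. For orientation, the standard textbook forms that match the verbal description are sketched below. They are an assumption, not necessarily the authors' exact expressions: the curvature is written for a height function Z(x,y) in the Monge representation, F(d) is obtained by Boltzmann inversion of a thickness distribution P(d), and the bending term follows the Helfrich form with zero spontaneous curvature. Prefactor conventions in particular may differ from the original.

```latex
% Assumed standard forms; not taken from the original manuscript.
% Mean curvature of the midplane height function Z(x,y):
H(x,y) = \frac{(1+Z_y^{2})\,Z_{xx} - 2\,Z_x Z_y Z_{xy} + (1+Z_x^{2})\,Z_{yy}}
              {2\,\bigl(1+Z_x^{2}+Z_y^{2}\bigr)^{3/2}},
\qquad
Z_x = \frac{\partial Z}{\partial x},\;
Z_{xx} = \frac{\partial^{2} Z}{\partial x^{2}},\;
Z_{xy} = \frac{\partial^{2} Z}{\partial x\,\partial y},\ \text{etc.}

% Thickness free-energy profile and its sum over grid points:
F(d) = -k_{\mathrm{B}}T \,\ln P(d),
\qquad
\Delta G_{\mathrm{thick}} = \sum_{(x,y)} F\bigl(d(x,y)\bigr)

% Bending energy with constant bending modulus \kappa and grid-cell area \Delta A:
\Delta G_{\mathrm{curv}} = \sum_{(x,y)} \frac{\kappa}{2}\,\bigl(2H(x,y)\bigr)^{2}\,\Delta A
```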

      (1) If CDL matches the hydrophobic thickness of the protein it would disrupt SC formation, not favor it. The authors' hypothesis is that the SC stabilizes the deformed membrane around the separated elements. Lipids that are compatible with the monomer deformed region stabilize the monomer, similarly to a surfactant. That is, if CDL prefers the interface because the interface is thin and their CDL is thin, CDL should prevent SC formation. A simpler hypothesis is that CDL's unique electrostatics are part of the glue.

      We rephrased the corresponding paragraph in the Discussion section to reflect the role of electrostatics for the behavior of cardiolipin.

      "…supporting the involvement of CDL as a "SC glue". In this regard, electrostatic effects arising from the negatively charged cardiolipin headgroup could play an important role in the interaction of the OXPHOS complexes."

      Generally our simulations suggest that CDL prefers thinner membranes, which could rationalize these findings.

      "We find that CDL prefers thinner membranes relative to the neutral phospholipids (PE/PC, Fig. S5F),[…]”

      (2) Error bars for lipid and Q* enrichments should be computed averaging over multi-lipid regions of the protein interface, e.g., dividing the protein-lipid interface into six to ten domains, in particular functionally relevant regions. Anionic lipids may have long, >500 ns residence times, which makes lipid enrichment large and characterization of error bars challenging in short simulations. Smaller regions will be noisy. The plots depicted in, for example, Figure S2 are noisy.

      It is indeed challenging to capture lipid movements on the timescales accessible for atomistic MD, and hence the data in Figure S2 contains some noise. In this regard, for the cgMD data presented in the revised Fig. S2H,I, the concentration data was averaged for six domains of the protein-lipid interface.

      (3) The membrane deformation is repeatedly referred to as "entropic" without justification. The bilayer has significant entropic and enthalpic terms just like any biomolecule, why are the authors singling out entropy? The standard "Helfrich" energetic Hamiltonian is a free energy model in that it implicitly integrates over many lipid degrees of freedom.

      We apologize for the unclear message – our intention was not to claim that the effects are purely entropic, but could arise from a combination of both entropic and enthalpic effects. We hope that this has now been better clarified in the revised manuscript. We also agree that it is difficult to separate between entropic and enthalpic effects. However, we wish to point out that, e.g., the temperature-dependence of the SC formation suggests that the entropic contribution is also affecting the process.

      Regarding the Helfrich Hamiltonian, we note that the standard model assumes a homogeneous fluid-like sheet. We therefore find it difficult to apply this model to the local effects observed here.

      Revisions / clarifications in the main manuscript:

      "SC formation is affected by both enthalpic and entropic effects."

      "We have shown here that the respiratory chain complexes perturb the IMM by affecting the local membrane dynamics. The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      (4) Figure S7 shows the surface area per lipid and leaflet height. This appears to show a result that is central to the interpretation of SC formation but which makes very little sense. One simply does not increase both the height and area of a lipid. This is a change in the lipid volume! The bulk compressibility of most anything is much higher than its Young's modulus [similar to area compressibility]. Instead, something else has happened. My guess is that there is *bilayer* curvature around these proteins and that it has been misinterpreted as area/thickness changes with opposite signs of the two leaflets. If a leaflet gets thin, its area expands. If the manuscript had more details regarding how they computed thickness I could help more. Perhaps they measured the height of a specific atom of the lipid above the average mid-plane normal? The mid-plane of a highly curved membrane would deflect from zero locally and could be misinterpreted as a thickness change.

      We thank the Reviewer for this insightful comment. We chose to define the membrane thickness based on the height of the lipid P-atoms above the average midplane normal. The Reviewer is correct that this measurement gives a changing thickness for a highly curved membrane. However, in this scenario, the thickness would always be overestimated [d<sub>true</sub> = d<sub>measured</sub> × cos (angle between global mid-plane normal and local mid-plane normal)]. Therefore, since we observe a smaller thickness at the protein-lipid interface, the effect is not likely to result from an artifact. For further clarification, see Fig. S4I showing the averaged local position of the P-atoms in the cgMD simulations, which further supports that there is a local deformation of the lipid.
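
      The geometric argument, that a thickness measured along the global normal can only overestimate the true thickness of a tilted membrane patch, can be checked numerically. This is a minimal sketch, assuming a locally flat patch tilted by an angle θ relative to the global mid-plane normal, so that the measured thickness equals the true thickness divided by cos θ. The function name is hypothetical.

```python
import math

def tilt_corrected_thickness(d_measured, tilt_deg):
    """Return the thickness along the local normal of a flat, tilted patch.

    d_measured: thickness measured along the global mid-plane normal.
    tilt_deg:   angle between global and local mid-plane normals, in degrees.
    """
    return d_measured * math.cos(math.radians(tilt_deg))

# Example: a 40 Å measurement on a patch tilted by 20 degrees.
print(round(tilt_corrected_thickness(40.0, 20.0), 2))
```

      Because cos θ ≤ 1, the corrected value never exceeds the measured one, so local tilt alone cannot produce an apparent thinning.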

      The changes in the local membrane thickness are also supported by our analysis of the membrane thickness (Fig.S2A) and by the lipid chain length distributions (Fig.S6).

      (5) The authors write expertly about how conformational changes are interpreted in terms of function but the language is repeatedly suggestive. Can they put their findings into a more quantitative form with statistical analysis? "The EDA thus suggests that the dynamics of CI and CIII2 are allosterically coupled."

      We extended our analysis on the allosteric effects, which is now described in the revised main text, the SI and the Methods section:

      "In this regard, our graph theoretical analysis (Fig. S11C,D) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (50, 51), and affecting also the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated based on cryo-EM analysis (40)."

      “Extended Methods

      Allosteric Network Analysis. Interactions between amino acid residues were modeled as an interaction graph, where each residue was represented by a vertex. Two nodes were connected by an edge if the Cα atoms of the corresponding amino acid residues were closer than 7.5 Å for more than 50% of the frames of simulations S1-S6 (time step of frames: 1 ns). (7) This analysis was carried out for the aMD simulations of the supercomplex, analyzing differences between the Q bound and apo states (simulations A1+A2+A3 vs. A4+A5+A6).”
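
      The edge criterion described above (distance below 7.5 Å in more than 50% of the frames) can be illustrated with a small, dependency-free sketch. The input layout, a list of frames that each hold one (x, y, z) coordinate per residue, and the function name are assumptions made for illustration and do not reflect the authors' actual analysis code.

```python
import math

CUTOFF = 7.5        # Å, distance threshold quoted in the text
MIN_FRACTION = 0.5  # an edge requires contact in more than this fraction of frames

def contact_graph(frames, cutoff=CUTOFF, min_fraction=MIN_FRACTION):
    """Build the residue interaction graph as a set of (i, j) edges with i < j.

    frames: list of frames; each frame is a list of (x, y, z) coordinates,
            one per residue (hypothetical input layout).
    """
    n_res = len(frames[0])
    edges = set()
    for i in range(n_res):
        for j in range(i + 1, n_res):
            # Count the frames in which residues i and j are within the cutoff.
            close = sum(math.dist(f[i], f[j]) < cutoff for f in frames)
            if close / len(frames) > min_fraction:
                edges.add((i, j))
    return edges
```

      Comparing the edge sets obtained for two groups of simulations (for example, ligand-bound and apo) is then a set difference, `edges_bound ^ edges_apo`.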

      (6) The authors write "We find that an increase in the lipid tail length decreases the relative stability of the SC (Figure S5C)" This is a critical point but I could not interpret Figure S5C consistently with this sentence. Can the authors explain this?

      We apologize for this oversight. This sentence should refer to Fig. S5F, which has now been corrected. We have additionally updated the figure to provide an improved estimation of the thickness contribution based on the lipid tail length.

      "We find that an increase in the lipid tail length decreases the relative stability of the SC (Fig. S5F)"

      (7) The authors use a 6x6 and 15x15 lattice to analyze SC formation. The SC assembly has 6 units of E_strain favoring its assembly, which they take up to 4 kT. At 3 kT, the SC should be favored by 18 kT, or a Boltzmann factor of 10^8. With only 225 sites, specific and non-specific complex formation should be robust. Can the authors please check their numbers or provide a qualitative guide to the data that would make clear what I'm missing?
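
      The reviewer's estimate can be reproduced directly. This is a minimal sketch that takes the reviewer's own numbers (6 units of strain energy at 3 kT each) at face value; it says nothing about the revised lattice model.

```python
import math

n_strain_units = 6         # strain units favoring assembly (reviewer's count)
e_strain_kt = 3.0          # strain energy per unit, in kT (reviewer's example)

delta_e_kt = n_strain_units * e_strain_kt   # total stabilization in kT
boltzmann_factor = math.exp(delta_e_kt)     # relative statistical weight

print(delta_e_kt, f"{boltzmann_factor:.2e}")
```

      The resulting factor is of order 10^8, which is why the reviewer expects complex formation to be robust on a lattice of only 225 sites.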

      In the revised manuscript, we have now clarified the definition of the lattice model and the respective energies:

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results) ... but confusing in terms of the non-standard presentation of membrane mechanics and the difficulty of this reviewer to interpret some of the underlying figures: especially, the thickness of the leaflets around the protein and the relative thickness of cardiolipin. Resolving the quantitative interpretation of the bilayer deformation would greatly enhance the significance of their Monte Carlo model of SC formation.

      We thank the Reviewer for the helpful suggestion. We hope that the revisions help to clarify the non-standard presentation and connect to concepts used in the lipid membrane community.

      Reviewer #2 (Public review):

      Summary:

      The authors have used large-scale atomistic and coarse-grained molecular dynamics simulations on the respiratory chain complex and investigated the effect of the complex on the inner mitochondrial membrane. They have also used a simple phenomenological model to establish that the super complex (SC) assembly and stabilisation are driven by the interplay between the "entropic" forces due to strain energy and the enthalpic forces (specific and non-specific) between lipid and protein domains. The authors also show that the SC in the membrane leads to thinning and there is preferential localisation of certain lipids (Cardiolipin) in the annular region of the complex. The data reports that the SC assembly has an effect on the conformational dynamics of individual proteins making up the assembled complex and they undergo "allosteric crosstalk" to maintain the stable functional complex. From their conformational analyses of the proteins (individual and while in the complex) and membrane "structural" properties (such as thinning/lateral organization etc) as well as from the output of their phenomenological lattice model, the authors have provided possible implications and molecular origin about the function of the complex in terms of aspects such as charge currents in the inner mitochondrial membrane, proton transport activity and ATP synthesis.

      Strengths:

      The work is bold in terms of undertaking modelling and simulation of such a large complex that requires simulations of about a million atoms for long time scales. This requires technical acumen and resources. Also, the effort to make connections to experimental readouts has to be appreciated (though it is difficult to connect functional pathways with limited (additive forcefield) simulations).

      We thank the Reviewer for recognizing the challenge in simulating multimillion atom membrane proteins. We also thank the Reviewer for recognizing the connections we have made to different experiments. Our work indeed relies on atomistic and coarse-grained molecular simulations, which are widely recognized to provide accurate models of membrane proteins.

      Weakness:

      There are several weaknesses in the paper (please see the list below). Claims such as "entropic effect", "membrane strain energy" and "allosteric cross talks" are not properly supported by evidence and seem far-fetched at times. There are other weaknesses as well. Please see the list below.

      We thank the Reviewer for pointing out that key concepts needed further clarification. Please see answers to specific questions below:

      (i) Membrane "strain energy" has been loosely used and no effort is made to explain what the authors mean by the term and how they would quantify it. If the membrane is simulated in stress-free conditions, where are strains building up from?

      We thank the Reviewer for this important question. In the revised manuscript, we have toned down the assignment of the effects into pure entropic or enthalpic effects. We have also provided further clarification of the effects observed in the membrane.

      Example of revisions / clarifications in the main text:

      "SC formation is affected by both enthalpic and entropic effects."

      "We have shown here that the respiratory chain complexes perturb the IMM by affecting the local membrane dynamics. The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      We have also revised the result section, where we now have explicitly defined and clarified the different contributions to membrane strain, observed in our simulations:

      In the following, we define membrane strain as the local perturbations of the lipid bilayer induced by protein-membrane interactions. These include changes in (i) membrane thickness, (ii) the local membrane composition, (iii) lipid chain configurations, and (iv) local curvature of the membrane plane relative to an undisturbed, protein-free bilayer. Together, these phenomena reflect the thermodynamic effects associated with accommodating large protein complexes within the membrane.

      We now also provide a more quantitative estimation of the membrane strain based on the contribution of changes in local thickness and curvature, summarized in Table 1.

      (ii) In result #1 (Protein membrane interaction modulates the lipid dynamics ....), I strongly feel that the readouts from simulations are overinterpreted. Membrane lateral organization in terms of lipids having preferential localisation is not new (see doi: 10.1021/acscentsci.8b00143) nor membrane thinning and implications to function (https://doi.org/10.1091/mbc.E20-12-0794). The distortions that are visible could be due to a mismatch in the number of lipids that need to be there between the upper and lower leaflets after the protein complex is incorporated. Also, the physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets - none of which has been considered. Connecting chain length to strain energy is also not well supported - are the authors trying to correlate membrane order (Lo vs Ld) with strain energy?

      We thank the Reviewer for the suggestions. The role of the membrane in driving supercomplex formation has not, to our knowledge, been suggested before. There are certainly many important studies, which have been better highlighted in the revised manuscript. In this context, we also now cite the papers by Srivastava and coworkers and Tieleman and coworkers.

      “The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, are also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      (45) V. Corradi et al., Lipid–Protein Interactions Are Unique Fingerprints for Membrane Proteins. ACS Central Science 4 (June 13, 2018).

      (46) K. Baratam, K. Jha, A. Srivastava, Flexible pivoting of dynamin pleckstrin homology domain catalyzes fission: insights into molecular degrees of freedom. Molecular Biology of the Cell 32 (2021 Jul 1).

      Physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets

      We agree with this point. As shown in Figs. 2H,N, S6, S13, we suggest that cardiolipin functions as a buffer molecule. However, very little is experimentally known about the asymmetric distribution of lipids in the IMM. Therefore, modelling the effect of asymmetry across the leaflets is outside the scope of this study. Moreover, as now better clarified in the revised manuscript, we agree that it is difficult to unambiguously divide the effect into enthalpic and entropic contributions.

      To address the main concern of the Reviewer, we have updated the main text and Supporting Information to clearly state the different aspects of how the protein-membrane interactions induce perturbations of the lipid bilayer. We define these effects as membrane strain. We now use the changes in local thickness and local curvature to quantify the effect of membrane strain on the stability of the respiratory SC.

      (iii) Entropic effect: What is the evidence towards the entropic effect? If strain energy is entropic, the authors first need to establish that. They discuss enthalpy-entropy compensation but there is no clear data or evidence to support that argument. The lipids will rearrange themselves or have a preference to be close to certain regions of the protein and that generally arises because of enthalpic reasons (see the body of work done by Carol Robinson with Mass Spec where certain lipids prefer proteins in the GAS phase, certainly there is no entropy at play there). I find the claims of entropic effects very unconvincing.

      We agree that it is difficult to distinguish the entropic vs. enthalpic contributions. In the revised manuscript, we better clarify that both effects are likely to be involved.

      The native MS work by Robinson and coworkers and others support that many lipids are strongly bound to membrane proteins, as also supported by the local binding of certain lipid molecules, such as CDL to the SC (Figs. S2, S6, S13).

      We suggest that the accumulation of cardiolipin at the protein-lipid interface involves a combination of entropic and enthalpic effects, arising from the reduction of the lipid mobility (entropy) as indicated by lowered diffusion (Fig. S9), and formation of noncovalent bonds between the lipid and the OXPHOS protein (Fig. S14).

      We added further clarification to the Discussion section.

      “Taken together, our combined findings suggest that the SC formation is affected by thermodynamic effects that reduce the molecular strain in the lipid membrane, whilst the perturbed micro-environment also affects the lipid and Q dynamics, as well as the dynamics of the OXPHOS proteins (see below).”

      (iv) The changes in conformations dynamics are subtle as reported by the authors and the allosteric arguments are made based on normal mode analyses. In the complex, there are large overlapping regions between the CI, CIII2, and SCI/III2. I am not sure how the allosteric crosstalk claim is established in this work - some more analyses and data would be useful. Normal mode analyses (EDA) suggest that the motions are coupled and correlated - I am not convinced that it suggests that there is allosteric cross-talk.

      Our analysis suggests that the SC changes the dynamics of the system. Although it is difficult to assign how these effects result in activity modulation of the system, we note that these changes relate to sites that are central for the charge transfer reactions. We thank the Reviewer for suggesting that we extend the analysis; the extended analysis further suggests that regions of the proteins could be allosterically coupled.

      (v) The lattice model should be described better and the rationale for choosing the equation needs to be established. Specific interactions look unfavourable in the equation as compared to non-specific interactions.

      We have now provided further clarification of the lattice model in the Methods section. Addition to the main text:

      “Lattice model of SC formation. A lattice model of the CI and CIII<sub>2</sub> was constructed (Fig. 4A,B) by modeling the OXPHOS proteins in unique grid positions on a 2D N×N lattice. Depending on the relative orientation, the protein-protein interaction was described by specific interactions (giving rise to the energetic contribution E<sub>specific</sub> < 0) and non-specific interactions (E<sub>non-specific</sub> > 0). The membrane-protein interaction determined the strain energy of the membrane (E<sub>strain</sub>), based on the number of neighboring "lipid" occupied grids that are in contact with proteins (Fig. 4A). The interaction between the lipids was indirectly accounted for by the background energy of the model. The proteins could occupy four unique orientations on a grid ([North, East, South, West]). The states and their respective energies that the system can visit are summarized in Table S6.”

      “The conformational landscape was sampled by Monte Carlo (MC) using 10<sup>7</sup> MC iterations with 100 replicas. Temperature effects were modeled by varying β, and the effect of different protein-to-lipid ratios by increasing the grid area. The simulation details can be found in Table S7.”
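
      The Monte Carlo scheme described above can be sketched in a few lines. The following is a minimal toy version, not the authors' implementation: the energy values, the periodic boundaries, the rule for what counts as a "correct" orientation (each protein pointing at the other), and the names `contact`, `energy`, and `sc_fraction` are all assumptions introduced for illustration. The actual states and energies are those of Tables S6 and S7.

```python
import math
import random

# All parameter values below are illustrative assumptions,
# not the values listed in Tables S6 and S7.
E_SPECIFIC = -2.0     # adjacent, "correct" relative orientation (< 0)
E_NONSPECIFIC = 1.0   # adjacent, "wrong" relative orientation (> 0)
E_STRAIN = 0.5        # penalty per lipid site in contact with a protein
NEIGHBOURS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # North, East, South, West


def contact(state, n):
    """Return 'specific', 'nonspecific', or None for the two proteins."""
    (p1, o1), (p2, o2) = state
    for k, (dx, dy) in enumerate(NEIGHBOURS):
        if ((p1[0] + dx) % n, (p1[1] + dy) % n) == p2:
            # Assumed rule: the contact is specific only if each protein
            # points towards the other one.
            facing = o1 == k and o2 == (k + 2) % 4
            return "specific" if facing else "nonspecific"
    return None


def energy(state, n):
    """Strain from lipid sites touching a protein plus the contact term."""
    occupied = {state[0][0], state[1][0]}
    lipid_sites = set()
    for (x, y) in occupied:
        for dx, dy in NEIGHBOURS:
            site = ((x + dx) % n, (y + dy) % n)
            if site not in occupied:
                lipid_sites.add(site)
    e = E_STRAIN * len(lipid_sites)
    kind = contact(state, n)
    if kind == "specific":
        e += E_SPECIFIC
    elif kind == "nonspecific":
        e += E_NONSPECIFIC
    return e


def sc_fraction(n=8, beta=1.0, steps=20000, seed=0):
    """Metropolis sampling; returns the fraction of steps in the SC state."""
    rng = random.Random(seed)
    state = [((0, 0), 0), ((n // 2, n // 2), 0)]
    e = energy(state, n)
    in_sc = 0
    for _ in range(steps):
        i = rng.randrange(2)
        (x, y), o = state[i]
        if rng.random() < 0.5:                   # translate by one site
            dx, dy = rng.choice(NEIGHBOURS)
            trial_protein = (((x + dx) % n, (y + dy) % n), o)
        else:                                    # rotate to a random orientation
            trial_protein = ((x, y), rng.randrange(4))
        if trial_protein[0] != state[1 - i][0]:  # reject overlapping moves
            trial = list(state)
            trial[i] = trial_protein
            e_new = energy(trial, n)
            if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
                state, e = trial, e_new
        in_sc += contact(state, n) == "specific"
    return in_sc / steps
```

      In this toy model, bringing the two proteins into contact removes two strained lipid sites, so the strain term favours association even before the specific interaction is counted. Scanning `beta` or the lattice size `n` in `sc_fraction` mimics, in spirit, the temperature and protein-to-lipid ratio scans described in the text.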

      Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained, and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. Overall, the study is rather thorough and highly creative, and the impact on the field is expected to be significant.

      Weaknesses:

      In general, I don't think the work contains any obvious weaknesses, although I was left with some questions.

      We thank the Reviewer for acknowledging that our work is thorough and creative, and that it is likely to have a significant impact on the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Diffusion is quantified in speed units (Figure S8). The authors should explain why they have used an apparently incorrect model for quantifying diffusion. The variance of the distribution of a diffusing molecule is linear with time, not its standard deviation (as I suppose I would use for computing effective molecular speed). Perhaps they are quantifying residence times, in which molecules near a wall (protein) will appear to have half the movements of a bulk molecule. This is confusing.

      We thank the Reviewer for the comment. The data shown in the previous version of Figure S8 corresponded to the effective molecular velocity, which is now clarified in the revised figure (now Fig. S9). This measure was used to reflect the average residence time of the groups in the vicinity of the sites.

      However, as suggested by the Reviewer, we now also analyzed the position-dependent diffusion of the quinone in the new Figure S9.
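
      The Reviewer's point that the variance of the displacement, not its standard deviation, grows linearly with time corresponds to the Einstein relation MSD(t) = 4Dt for lateral (2D) diffusion. A minimal sketch of that estimate is given below; the function names are hypothetical and this is not the authors' analysis script. A position-dependent estimate could, for example, apply the same functions to trajectory segments grouped by distance from the protein, which is an assumption here rather than the documented procedure.

```python
def mean_squared_displacement(xy, lag):
    """Time-averaged MSD of one 2D trajectory at a given lag (frames, lag >= 1)."""
    pairs = zip(xy[:-lag], xy[lag:])
    sq = [(x1 - x0) ** 2 + (y1 - y0) ** 2 for (x0, y0), (x1, y1) in pairs]
    return sum(sq) / len(sq)


def lateral_diffusion_coefficient(lag_times, msd_values):
    """Least-squares slope through the origin of MSD(t) = 4 D t (2D)."""
    slope = (sum(t * m for t, m in zip(lag_times, msd_values))
             / sum(t * t for t in lag_times))
    return slope / 4.0
```

      The returned coefficient has units of length squared per time (for example nm²/µs), in contrast to the speed units of an effective molecular velocity.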

      (2) With a highly charged bilayer, a large water layer is necessary to verify that the concentration of salt is plateauing at 150 mM at the box edge. 45 Å appears to be the default in CHARMM-GUI, but this default guidance is not based on the charge of the bilayer. I suggest the authors plot the average concentration of both anions and cations in mM units along the z coordinate of the simulation cell.

      We thank the Reviewer for the suggestion. We have now provided an analysis of the average ion concentrations along the z coordinate, supporting that the salt concentration plateaus at 150 mM at the box edge.
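
      A concentration profile of this kind can be obtained by counting ions in slabs along z and converting the counts to mM. The sketch below assumes coordinates and box dimensions in Å and a constant lateral box area; the function name and its arguments are hypothetical and do not reproduce the authors' analysis.

```python
import math

AVOGADRO = 6.02214076e23   # 1/mol
LITRE_PER_A3 = 1e-27       # 1 cubic angstrom expressed in litres


def concentration_profile(z_per_frame, box_xy_area, z_min, z_max, n_bins):
    """Average ion concentration (mM) in n_bins slabs between z_min and z_max.

    z_per_frame: list of frames, each a list of ion z coordinates (angstrom).
    box_xy_area: lateral box area (square angstrom), assumed constant.
    """
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for frame in z_per_frame:
        for z in frame:
            b = math.floor((z - z_min) / width)
            if 0 <= b < n_bins:          # ignore ions outside the range
                counts[b] += 1
    n_frames = len(z_per_frame)
    slab_volume_l = box_xy_area * width * LITRE_PER_A3
    # mol/L = (ions per frame) / (N_A * V); the factor 1e3 converts M to mM.
    return [1e3 * c / n_frames / (AVOGADRO * slab_volume_l) for c in counts]
```

      Running the function separately on the cation and anion coordinates gives the two curves the Reviewer asks for; the profile should level off at the bulk value near the box edge.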

      Typos:

      SI: "POPC/POPE or CLD" should be CDL

      We apologize for the mistake. We have corrected the typos:

      "of the membrane thickness in a POPC/POPE/CDL/QH2 membrane and a CDL membrane."

      "a pure CDL membrane"

      Reviewer #2 (Recommendations for the authors):

      (1) Suggestion regarding membrane strain energy claims:

      Changes in area per lipid and membrane thinning are surely not akin to membrane strain energy changes. At best, the authors should calculate the area compressibility (both in bilayers with and without proteins) and then make comments. In general, if they are talking about the in-plane properties (bilayer being liquid in 2D), I do not see how they can discuss membrane strain energy with the NPT (1 atm) barostat reservoir that they are simulating against. At least they can try to plot the membrane lateral pressures in various conditions and then start making such comments. If it was a closed vesicle, I would expect some tension in the membrane due to the closed surface but in the conditions in which the simulations are run, I do not see how strain is so important. If the authors want to be more rigorous, they can calculate "atomic virial" values by doing a tessellation and showing the data to make their point. Strain energy would mean that there is a modulus in-plane. Bending modulus would surely change with membrane thinning and area compressibility changes (simple plate theory) but linear strain is surely something to be defined well before making claims out of it.

      Our work shows that the OXPHOS proteins alter the local membrane thickness and curvature, and we now quantify the associated deformation penalty (Table 1). As stated above, we now provide a better definition and description of 'membrane strain' and the observed effect, which is likely to contain both enthalpic and entropic contributions.

      As suggested by the Reviewer, we have computed the lateral pressure profiles around the OXPHOS proteins, further supporting that there are energetic effects related to the "solvation" of the membrane proteins in the IMM. To this end, Figs. S2D,E, S4I, and S5G,H show the membrane distortion effect, while Fig. S5A supports that the 'internal energy' of the lipids changes as a result of the SC formation, further justifying that these effects can be assigned as 'strain effects'. The analysis has also been extended by computing the end-to-end distances, shown in Fig. S6.

      Unfortunately, it is technically unfeasible to accurately estimate the area compressibility, bending modulus, or the atomic virial for the present multi-million-atom membrane protein simulations.

      Summary of Revisions/Additions:

      Fig. S2 [...] (D, E) Difference in the membrane thickness around the SC relative to CI (left) or relative to CIII<sub>2</sub> (right) from (D) aMD and (E) cgMD.

      Fig. S4. [...] (I) Visualization of the membrane distortion effect.

      Fig. S5. Analysis of membrane-induced distortion effects. (A) Strain effect relative to a lipid membrane from atomistic MD simulations of the SCI/III2, CI, and CIII<sub>2</sub>, suggesting reduction of the membrane strain (blue patches) in the SC surroundings. The figure shows the non-bonded energies relative to the average non-bonded energies from membrane simulations (simulation M4, Table S1). (B) The lipid strain contribution for different lipids calculated from non-bonded interaction energies of the lipids relative to the average lipid interaction in an IMM membrane model (simulation M4). The figure shows the relative strain contribution for nearby lipids (r < 2 Å, in color from panel (C)) and lipids > 5 Å from the OXPHOS proteins. (C) Selection of lipids (< 2 Å) interacting with the OXPHOS proteins. (D) Potential of mean force (PMF) of membrane thickness derived from thickness distributions from cgMD simulations of a membrane, the SCI/III2, CI, and CIII<sub>2</sub>. (E) Membrane thickness as a function of CDL concentration from cgMD simulations. (F) ΔG<sub>thick</sub> of the SC as a function of membrane thickness based on cgMD simulations. (G) Membrane curvature around the SCI/III2 (left), CI (middle), and CIII<sub>2</sub> (right) from atomistic simulations. (H) Squared membrane curvature obtained from cgMD simulations, within a 20 nm radius around the center of the system. These maps correspond to the curvature field used in the calculation of the bending deformation energy term (G<sub>curv</sub>).

      Fig. S6. Analysis of lipid end-to-end distance from aMD simulations of (A) SC, (B) CI, (C) CIII<sub>2</sub>.

      (2) Membrane distortions:

      Membrane distortions can arise due to a mismatch in the area between the upper leaflet and the lower leaflet, especially when a protein is embedded. Authors can carefully choose the numbers to keep the membrane stable.

      We have further clarified in the revised manuscript that the membranes are stable in all simulation setups. When building the simulation setups, we took care that no leaflet had a higher lipid density that could result in artificial displacements. The local changes in lipid dynamics and structure around the OXPHOS complexes are independently supported by both our atomistic simulations and our coarse-grained simulations, the latter of which contain significantly larger membranes. Moreover, as discussed in our work, the local membrane distortion is also experimentally supported by cryoEM analysis as well as recent in situ cryoTEM data, showing that the OXPHOS proteins indeed affect the local membrane properties.

      Clarifications/Additions to the main text:

      “We find that the individual OXPHOS complexes, CI and CIII<sub>2</sub>, induce pronounced membrane strain effects, supported both by our aMD (Fig. S2A) and cgMD simulations with a large surrounding membrane (Fig. 2G).”

      “The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, are also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      "During construction of the simulation setups, it was carefully considered that no leaflet introduced higher lipid densities that could result in artificial displacement effects."

      (3) Strain energy as an entropic effect:

      Please establish that the strain energy (if at all present) can be called an entropic effect.

      We have now better clarified that the SC formation results from combined enthalpic and entropic effects. We apologize that the previous version of the text was unclear in this respect.

      To further probe the involvement of entropic effects, we derived entropic and enthalpic contributions from our lattice model. The model supports that increased strain contributions also alter the entropic contributions, further supporting the coupling between the effects.

      We have also clarified our definition of the effects:

      "The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      (4) Allosteric cross-talk:

      A thorough network analysis (looking at aspects like graph Laplacian, edge weights, eigenvector centrality, changes in characteristic path length, etc.) can be undertaken to establish allostery (see https://doi.org/10.1093/glycob/cwad094, Ruth Nussinov/Ivet Bahar papers).

      We have expanded the network analysis as suggested by the Reviewer, and additionally computed the covariance matrix, which further supports that the SC could involve correlated protein dynamics. We observe a prominent change especially with respect to the ligand state of Complex I, indicative of some degree of allostery, while we find that the apo state of Complex I leads to a slight uncoupling of the motion between CI and CIII<sub>2</sub>.

      Additions in the main text:

      “In this regard, our graph theoretical analysis (Fig. S11) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (48, 49), and affecting also the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated based on the cryoEM analysis.”
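
      For readers unfamiliar with covariance-based analyses of this kind, the sketch below computes a normalized dynamic cross-correlation matrix, C<sub>ij</sub> = ⟨Δr<sub>i</sub>·Δr<sub>j</sub>⟩ / √(⟨Δr<sub>i</sub>²⟩⟨Δr<sub>j</sub>²⟩), from an aligned trajectory. It is a generic illustration under the assumption that frames are already superimposed; the function name `dccm` is hypothetical, and the authors' actual covariance and graph-theoretical pipeline may differ.

```python
import math


def dccm(traj):
    """Normalized cross-correlation matrix of atomic displacements.

    traj[t][i] is the (x, y, z) position of atom i in aligned frame t.
    Every atom must move in at least one frame (non-zero variance).
    """
    n_frames, n_atoms = len(traj), len(traj[0])
    # Mean position of every atom over the trajectory.
    mean = [[sum(f[i][k] for f in traj) / n_frames for k in range(3)]
            for i in range(n_atoms)]
    # Displacement of every atom from its mean position in every frame.
    disp = [[[f[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
            for f in traj]

    def cov(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3))
                   for d in disp) / n_frames

    return [[cov(i, j) / math.sqrt(cov(i, i) * cov(j, j))
             for j in range(n_atoms)] for i in range(n_atoms)]
```

      Values near +1 indicate that two sites move in the same direction, values near -1 that they move in opposite directions, and values near 0 that their motions are uncorrelated. Such a matrix can also serve as edge weights for a network analysis of the type the Reviewer suggests.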

      (5) Lattice model:

      The equation needs to be rationalised. For example, specific interaction (g_i g_j) favours separation (lower energy when i and j are not next to each other), and nonspecific interaction favours proximity. Why is that? Also, the notation for degeneracy in partition function and the notation for lattice point. It is mentioned that the "interaction between the lipids was indirectly accounted for by the "background energy" of the model". If the packing/thinning etc are so important to the molecular simulations, will not the background energy change with changing lipid organising during complex formation?

      We have further expanded the technical discussion of the energy terms in our lattice model.

      For example, specific interaction (g_i g_j) favours separation (lower energy when i and j are not next to each other), and non-specific interaction favours proximity. Why is that?

      "The g<sub>i</sub>g<sub>j</sub> -term assigns a specific energy contribution when the OXPHOS complexes are in adjacent lattice points only in a correct orientation (modeling a specific non-covalent interaction between the complexes such as the Arg29<sup>FB4</sup>-Asp260<sup>C1</sup>/Glu259<sup>C1</sup> interaction between CI and CIII<sub>2</sub>). The d<sub>i</sub>d<sub>j</sub> -term assigns a non-specific interaction for the OXPHOS complexes when they are in adjacent lattice points, but in a "wrong" orientation relative to each other to form a specific interaction. The term introduces a strain into all lattice points surrounding an OXPHOS complex, mimicking the local membrane perturbation effects observed in our molecular simulations.

      This leads to the partition function,

      where w<sub>i</sub> is the degeneracy of the state, modeling that the SC and OXPHOS proteins can reside at any lattice position of the membrane, and where β=1/k<sub>B</sub>T (k<sub>B</sub>, Boltzmann's constant; T, temperature). The probability of a given state i was calculated as,

      with the free energy (G) defined as,
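
      The displayed equations are not reproduced in this response. Assuming the standard canonical-ensemble forms implied by the definitions above (degeneracy w<sub>i</sub>, state energy E<sub>i</sub>, and β = 1/k<sub>B</sub>T), they would read as follows. This is a sketch under that assumption, not a transcription of the manuscript's equations; in particular, the symbol E<sub>i</sub> and the choice of a free energy defined from the full partition function are assumptions.

```latex
Z = \sum_i w_i \, e^{-\beta E_i}, \qquad
p_i = \frac{w_i \, e^{-\beta E_i}}{Z}, \qquad
G = -\beta^{-1} \ln Z = -k_B T \ln Z
```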

      This discussion has been included in the Methods section to ensure that our work remains readable for the biological community studying supercomplexes from biochemical, metabolic, and physiological perspectives.

      (6) This is a minor issue but the paper is poorly organised and can be fixed readily. The figures are not referenced in order. For example, Figure 2G is discussed before discussing Figures 2A-2F (never discussed). Figure S2 is referenced before Figure S1.

      Answer: We thank the Reviewer for pointing this out. The order of the figures was revised.

      Reviewer #3 (Recommendations for the authors):

      A few minor questions/suggestions, not necessarily in the order of importance:

      (1) The discussion of the timescale of simulations is a bit misleading. For example, the discussion cites a timescale of 0.3 ms of CG simulations. The value is actually the sum of multiple CG simulations on the order of 50-75 microseconds. These are already very impressive lengths of CG simulations, there is no need to use the aggregated time to claim even longer time scales.

      We thank the Reviewer for the suggestion on this important clarification. We have now modified the text and tables accordingly:

      "(0.3 ms in cumulative simulation time, 50-75 μs/cgMD simulation)"

      (2) The observation of cardiolipin (CDL) accumulation is interesting. How close are the head groups, relative to the electrostatic screening length at the interface? Should one worry about the potential change of protonation state coupled with the CDL redistribution?

      Answer: We thank the Reviewer for this excellent comment, which has also been on our mind. The CDLs indeed form contacts with various functional groups at the protein interface (as shown in Fig. S13), as well as with bulk ions (sodium), which could tune the pK<sub>a</sub> of the CDLs and result in a protonation change. We have clarified these effects in the revised manuscript:

      "While CDL was modeled here in the singly anionic charged state (but cf. Fig. S5E), we note that the local electrostatic environment could tune its pK<sub>a</sub>, resulting in protonation changes of the lipid, consistent with its function as a proton collecting antenna (62)."

      (3) The authors refer to the membrane strain effect as entropic. Since membrane bending implicates a free energy change that includes both enthalpic and entropic components, I wonder how the authors reached the conclusion that the effect is largely entropic in nature.

      We agree with the Reviewer that the effects are likely to comprise both enthalpic and entropic contributions, which are difficult to separate in practice. To reflect this, we have now better clarified why we consider that both contributions are involved. We apologize that our previous version of the manuscript was unclear in this respect. Clarifications in the main text:

      “The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects.”

      (4) The authors refer to the computed dielectric constant as epsilon_perpendicular. Did the authors really distinguish the parallel and perpendicular component of the dielectric tensor, as was done by, for example, R. Netz and co-workers for planar surfaces?

      We have extracted the perpendicular dielectric constant from the total dielectric profiles. We clarify that this differs from the formal definition by Netz and coworkers.

      “The calculations were performed by averaging the total M over fixed z values from the membrane plane. Note that this treatment differs from extraction of radial and axial contributions of the dielectric tensor, as developed by Netz and co-workers (cf. Ref. (3) and refs therein) that requires a more elaborate treatment, which is outside the scope of the present work.”

      (3) P. Loche, C. Ayaz, A. Schlaich, Y. Uematsu, R.R. Netz. Giant Axial Dielectric Response in Water-Filled Nanotubes and Effective Electrostatic Ion-Ion Interactions from a Tensorial Dielectric Model. J Phys Chem B 123, 10850-10857 (2019).

      (5) Regarding the effect of SC formation on protein structure and dynamics, especially allosteric effects, most of the discussions are rather qualitative in nature. More quantitative analysis would be valuable. For example, the authors did compute covariance matrix although it appears that they chose not to discuss the results in depth. Is the convergence of concern and therefore no thorough discussion is given?

      We have now expanded the analysis by computing the covariance matrix, further supporting that the SC could involve correlated protein dynamics. We observe a prominent change, especially with respect to the ligand state of Complex I, indicative of some degree of allostery, while we find that the apo state of Complex I leads to a slight uncoupling of the motion between CI and CIII<sub>2</sub>.

      Additions in the main text:

      “In this regard, our graph theoretical analysis (Fig. S11) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (48, 49), and affecting also the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated based on the cryoEM analysis (40).”

      (6) The discussion of quinone diffusion is interesting, although I'm a bit intrigued by the unit of the diffusion constant cited in the discussion. Perhaps a simple typo?

      The plot showed the molecular velocity, which roughly corresponds to the residence times. However, as suggested by the Reviewer, we now also analyzed the position-dependent diffusion of the quinone in the new Figure S9.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for recognizing the sophistication and clinical relevance of our mouse model for acute retinal artery occlusion. We are grateful for your supportive feedback.

      Public reviews:

      (1) Response to Reviewer #1: 

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block the blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survival, ERG function, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      It would be beneficial to the manuscript and the readers if the authors could improve the English of this manuscript by correcting obvious grammar errors, eliminating many of the acronyms that are not commonly used by the field, and providing a reason why this complicated but clever surgery procedure was designed and a summary table with the time course of all the morphological, functional, cellular, and transcriptome changes associated with this model.

      Thank you for your thorough review of the manuscript. We sincerely apologize for any grammatical errors resulting from our English language proficiency and have taken the necessary steps to polish the article. Additionally, we have heeded your advice and reduced the use of field-specific acronyms to enhance readability for both the manuscript and its readers.

      Regarding the rationale behind the design of the UPOAO model, we have provided a description in the Introduction section. Our group focuses on research into the pathogenesis and clinical treatment of RAO. The absence of an accurate mouse model simulating the retinal ischemic process has hampered progress in developing neuroprotective agents for RAO. To better simulate the retinal ischemic process and possible ischemia-reperfusion injury following RAO, we developed a novel vascular-associated mouse model called the unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) model. We drew inspiration from the middle cerebral artery occlusion (MCAO) model, which is widely used in cerebral ischemic injury research and guided the development of the UPOAO model.

      We appreciate your valuable suggestion regarding the inclusion of a summary table outlining the time course of morphological, functional, cellular, and transcriptome changes associated with this model. To address this, we intend to include a supplementary table at the end of the article (Table S2, Summary Table), which will offer a comprehensive overview of the experimental results, thereby aiding clarity and interpretation.

      Once again, we thank you for your insightful comments and suggestions, which have greatly contributed to the improvement of our manuscript.

      (2) Response to Reviewer #2: 

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes in major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach to studying retinal artery occlusion. The study is very comprehensive.

      We greatly appreciate your positive assessment of our work and are encouraged by your recognition of its significance.

      Weaknesses:

      Some statements are incorrect and confusing. It would be helpful to review and clarify these to ensure accuracy and improve readability.

      We sincerely appreciate your meticulous review of the manuscript. Taking into account your valuable feedback, we will thoroughly address the inaccuracies identified in the revised version. Additionally, we will commit to polishing the article to ensure improved readability. We apologize for any confusion caused by these inaccuracies and genuinely thank you for bringing them to our attention.

      Recommendations For The Authors:

      Reviewer #1:

      (1) Response to comment:

      The conclusions of this paper are mostly well supported by clear images and convincing data analysis, but some aspects of image presentation and additional data analysis may be needed to strengthen the manuscript.

      We sincerely appreciate your positive assessment of our work and your recognition of the clear images and convincing data analysis supporting our conclusions. Your constructive feedback on enhancing the clarity of our manuscript's image presentation and additional data analysis is highly valued. In response to your suggestions, we have taken steps to improve readability by removing or correcting uncommon acronyms from certain images. We have also conducted further data analysis to provide more comprehensive insights. Thank you for your guidance in improving the quality of our manuscript.

      (2) Response to recommendation (1):

      In Results 3.1 or in Method 2.2: please explain why this combination of silicone wire embolization and carotid artery ligation was chosen to replace previous models such as UCCAO. What are the advantages? And why was the silicone wire embolus inserted through the ECA instead of directly into the CCA? The cleverly designed surgical procedure is very impressive but the reasoning behind it is not obvious and needs more explanation.

      Thank you for your valuable feedback.

      In the introduction, we briefly describe the rationale for developing the UPOAO model to simulate acute ischemia-reperfusion of retinal artery occlusion (RAO). Previously used retinal ischemia models have certain shortcomings. For example, in the HIOP model, which is often used for simulating glaucoma, the ischemic factor of interrupted retinal blood flow may be amplified due to the dual effects of IOP-induced mechanical stress [1, 2] and vascular ischemia due to normal saline perfusion in the anterior chamber. In the UCCAO model, recanalization is performed after ligation of the carotid blood vessels, and the retina communicates with the blood vessels in the brain, resulting in retinal hypoperfusion. Retinal ischemia in UCCAO is a chronic process (for example, the retina became thinner at week 10 and week 15 [3]), whereas RAO is an acute, total retinal ischemic disease. Therefore, it is critically important to develop a simple mouse model that can simulate acute retinal ischemia and reperfusion injury in RAO patients.

      Various models have been developed for ischemic stroke research, with the endoluminal suture model being the most commonly employed method for middle cerebral artery occlusion (MCAO). In this model, filaments are introduced through either the external or internal carotid artery and advanced into the middle cerebral artery, causing temporary blood flow blockage for a specific duration. This method has been extensively employed in studies involving transient occlusion [4]. Among the MCAO models, the Koizumi method (occlusion from the common carotid artery (CCA) to the middle cerebral artery (MCA)) and the Longa method (occlusion from the external carotid artery (ECA) to the MCA) are frequently used. Of the two, the Longa method is more widely used, and it has a much lower post-surgery mortality rate (26%) than the Koizumi method (44%) [5]. The MCAO model induces substantial infarct areas and significantly contributes to advancements in stroke research, including investigations into blood-brain barrier disruption and inflammatory responses to ischemia.

      RAO is considered a form of ocular stroke. Inspired by the MCAO model, we have employed a silicone wire embolus to induce acute interruption of blood flow to the retina. This approach enables the investigation of pathophysiological processes associated with RAO, providing valuable insights into the understanding of this condition. We have clarified these points in the revised manuscript (line 129).

      The reasoning behind inserting the silicone wire embolus through the ECA instead of directly into the CCA is twofold:

      (1) Convenience and avoidance of heavy bleeding and mortality. Inserting the silicone wire embolus requires creating an opening in the artery, which then needs to be ligated at both ends after the embolus is removed to prevent excessive bleeding. Because the ECA can be folded to form a straight line with the ICA, entry and removal of the silicone wire embolus are more convenient through the ECA. Blood flow through the CCA can be restored after the embolus is removed from the ECA, ensuring that the blood supply to the brain through the CCA is not affected.

      (2) Preservation of the reperfusion process. If the silicone wire embolus were inserted directly into the CCA, both ends of the CCA opening would need to be ligated after the embolus is removed. This would prevent reperfusion after retinal ischemia. To allow reperfusion, we chose to open the ECA instead.

      We have clarified these points in the revised manuscript to better explain the rationale behind our methodology (line 139). Thank you for prompting this important clarification, which we believe will enhance the understanding of our readers.

      (3) Response to recommendation (2):

      Did the UPOPA actually block OA, including both the retinal (CRA) and choroidal (SPCA and LPCA) blood supply? If so, why does it seem only the inner retina was affected but not the outer retina?

      Thank you for your question. We agree with you that the UPOAO model blocks the OA, which supplies both retinal and choroidal vessels. Our experimental results primarily indicate damage to the inner retinal layer within 7 days of reperfusion. For example, OCT and HE staining showed significant thinning of the inner retina after 60 minutes of ischemia followed by 7 days of reperfusion (Figure 4). At the same time, the b-wave amplitudes were decreased, which usually indicates damage to the inner layer of the retina. However, the outer retina did not appear to be affected by 60 minutes of ischemia based on the results of OCT, HE, and immunofluorescence.

      The inner layer of the retina is known to show the highest sensitivity to hypoxic challenges [6], whereas the outer retinal layer is more resistant to hypoxic stress [7]. A possible explanation for our results is that outer-layer cells such as photoreceptors are more tolerant of ischemia than the inner layer of the retina. Previous studies of retinal ischemia-reperfusion models support this assumption. In the UCCAO model, the b-wave was more affected than the a-wave. Decreases in the amplitudes of OPs, scotopic b-wave, and photopic b-wave were consistently observed at week 4 after UCCAO, while the amplitude of the scotopic a-wave did not change dramatically [8]. Prolonged ischemia, such as permanent ischemia, led to photoreceptor cell degradation, as seen in Stevens et al.'s report of photoreceptor loss 3 months after permanent ligation of both common carotid arteries in bilateral common carotid artery occlusion (BCCAO) [9]. In the HIOP model, the GCL and INL reacted sensitively to ischemic processes. A significant thinning of the GCL was observed as early as 6 hours after 60 minutes of ischemia [10]. Horizontal cells and photoreceptors remained mostly unaffected, while most RGCs and several amacrine cell subtypes disappeared [11, 12].

      Our study revealed the changes that occurred within 60 minutes of ischemia and the first 7 days of reperfusion in the UPOAO model. One possibility is that the ischemia duration in our model was not long enough to affect the outer retinal cells. Furthermore, the observation period for reperfusion may not have been long enough to reveal structural damage and visual dysfunction in the outer retinal layer. As we have explained in the manuscript, further exploration is needed to understand the changes induced by longer ischemia durations and reperfusion periods. Characterizing the damage to retinal structure and function after longer ischemia times will be a focus of our future research.

      (4) Response to recommendation (3):

      Better to only use well-accepted acronyms and remove those that are rarely seen in other publications, such as IMRL, MRL, HIOP, TRT, etc.

      Thank you for your valuable feedback. In our manuscript, we used the Spectralis HRA+OCT device (Heidelberg) to capture the retinal images. However, the resulting images did not clearly distinguish each retinal layer. To address this limitation, we referred to a clinical OCT stratification approach used in RVO and divided the retina into inner, middle, and outer layers [16]. We acknowledge that this hierarchical description is not commonly used and have therefore followed your recommendation to remove these rare acronyms and instead use the standard layer abbreviations joined by plus signs. The methods and results have been revised accordingly (line 213, line 368, Figure 4 and Figure S2).

      In addition, for the HIOP model, it is also known as the IR or RIRI model [17-19], and the pathophysiological process of retinal ischemia-reperfusion injury (IRI) is usually used to represent this type of anterior chamber perfusion model. To avoid confusion between the pathophysiological process of ischemia-reperfusion studied in this paper and the common model of high intraocular pressure, we have consistently referred to it as the HIOP model, an abbreviation that is cited in many references [20-22].

      Thanks again for the suggestion. We apologize for any confusion caused by the use of abbreviations and have made the necessary corrections in the manuscript. We have also strengthened the details of OCT layering in the images to enhance readability for our audience.

      (5) Response to recommendation (4):

      Figure 3F, G: What do the OP changes mean? What retina cell dysfunction leads to OP changes? Is there RGC-relevant visual function readout to correlate with RGC death?

      Oscillatory potentials (OPs) are important components of the electroretinogram (ERG). While the precise origin of OPs remains unclear, they are generally believed to be generated from the inner retinal layer, specifically involving bipolar cells, amacrine cells and ganglion cells [23]. OPs are sensitive indicators of retinal ischemic effects and can detect dysfunction before alterations in the b-waves occur [24-26] (We have added these statements at line 358). In this research, the reduction of OPs indicated dysfunction in the inner retinal layer and retinal ischemia.

      The function of RGCs can be assessed non-invasively using various ERG techniques that emphasize the activity of inner retinal neurons, including OPs of multifocal ERG (mfERG), photopic negative response (PhNR) in mfERG, pattern electroretinogram (PERG), and negative Scotopic Threshold Response (nSTR) [27]. Among these indicators, the PERG appears to be more specifically related to the presence of functional RGCs. However, the complexity of electrophysiological sources and species-specific differences in RGC characteristics should also be considered. In addition, visual evoked potentials (VEP) can assess visual signaling along the whole visual pathway from RGC axons to the visual cortex of the brain [28, 29]. Unfortunately, because the specific equipment required for evaluating RGC function was unavailable, we could not conduct a comprehensive assessment in this study. This limitation underscores the importance of future studies that incorporate RGC evaluation, using indicators such as PERG and PhNR, to provide a more comprehensive understanding of visual pathway function.

      Thank you for your careful review and insightful questions.

      (6) Response to recommendation (5):

      Figure 4B: RNFL/GCL/IPL normally called GCC (ganglion cell complex).

      We appreciate your helpful recommendation regarding the abbreviation GCC (ganglion cell complex) for the combination of RNFL, GCL, and IPL. We have updated this terminology in the revised manuscript (line 213 and Figure 4).

      (7) Response to recommendation (6):

      Figure 4 A-F: Normally a circular OCT image surrounding the optic nerve head is preferred to measure retina thickness. If in these figures, all the OCT images are from the same location, it may be acceptable, but need to provide imaging details on how these OCT planes are selected and what has been done to make sure the same locations were selected for comparison.

      We agree with your comment that OCT images for measuring retinal thickness are usually captured as a circular scan surrounding the optic nerve head. In this study, our goal was to assess the thickness of both the peripheral retina and the retina near the optic nerve head. To achieve this, we took the optic nerve head as the apex of the selected field of view (left upper region of panel A in Figure 4). For each mouse, we obtained OCT images of the superior nasal (SN), superior temporal (ST), inferior nasal (IN), and inferior temporal (IT) fields of the optic nerve. We then averaged the thicknesses from these four fields. In each field, we measured and statistically evaluated the retinal thickness at distances of 1.5, 3, and 4.5 papilla diameters (PD) from the optic nerve head.

      This approach allowed us to ensure that the same locations were selected for comparison and provided a comprehensive assessment of retinal thickness across different regions. We have detailed this methodology in the revised manuscript to clarify the imaging process and the consistency of the selected locations.
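      The averaging step described above can be illustrated with a minimal sketch. It assumes one reading per quadrant (SN, ST, IN, IT) at each of the three distances and averages the four quadrants at each distance; the function name, data layout, and thickness values are hypothetical and are not taken from the manuscript or its analysis software.

```python
from statistics import mean

# Quadrants and distances named in the response; everything else is illustrative.
QUADRANTS = ("SN", "ST", "IN", "IT")
DISTANCES_PD = (1.5, 3.0, 4.5)  # papilla diameters from the optic nerve head


def mean_thickness_by_distance(readings):
    """Average the four quadrant readings at each distance.

    `readings` maps quadrant -> {distance_pd: thickness} for a single eye.
    """
    missing = [q for q in QUADRANTS if q not in readings]
    if missing:
        raise ValueError(f"missing quadrants: {missing}")
    return {
        d: mean(readings[q][d] for q in QUADRANTS)
        for d in DISTANCES_PD
    }


# Hypothetical thickness values (arbitrary units), for illustration only.
example_eye = {
    "SN": {1.5: 100.0, 3.0: 90.0, 4.5: 80.0},
    "ST": {1.5: 104.0, 3.0: 94.0, 4.5: 84.0},
    "IN": {1.5: 96.0, 3.0: 86.0, 4.5: 76.0},
    "IT": {1.5: 100.0, 3.0: 90.0, 4.5: 80.0},
}
averaged = mean_thickness_by_distance(example_eye)
```

      Keeping the result keyed by distance makes it possible to compare the same location across groups, which is the consistency the reviewer asked about.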

      Thank you for your insightful feedback.

      Reviewer #2:

      Addressing the following concerns is necessary to improve the manuscript.

      (1) Response to recommendation (1):

      The manuscript contains many grammatical errors and should be carefully reviewed for corrections. For example: In the title, "Silicone Wire Embolization-induced Acute Retinal Artery Ischemia and Reperfusion Model in Mouse: Gene Expression Provide Insight into Pathological Processes". It should be "Provides" instead of "Provide". In the Abstract, "The resident microglia within the retina and peripheral leukocytes which access to the retina were pronounced increased on reperfusion periods." It should be "pronouncedly" or "markedly" instead of " pronounced".

      Thank you for your careful reading and pointing out the grammatical errors in the manuscript. We apologize for these mistakes and have since revised and polished the article with the assistance of native English speakers. Ensuring accurate and clear language usage in scientific writing is crucial, and we appreciate your help in improving the quality of our manuscript. Thank you for bringing these errors to our attention.

      (2) Response to recommendation (2):

      Video 2: the video content from "30s-47s" and "50s-67s" is repeatedly shown.

      Thank you for your careful review of the video. In the process of preparing the external carotid artery for silicone wire embolus insertion, we first ligated the distal end with a square knot and then tied a loose knot at the proximal end. In the video content from "30s-47s" and "50s-67s", we are tying a square knot. We apologize for any confusion caused by these repeated video clips.

      (3) Response to recommendation (3):

      Figure 1: The ConA staining (H-I) and FFA (J-K) were performed before the removal of silicone wire embolus. It would be beneficial to clarify this in the figure legend too. Additionally, the label 'Post. Sup. Alveolar art.: Posterior superior alveolar artery' is not present in Figure 1L.

      Thank you for your thorough review of the manuscript and the valuable suggestions regarding Figure 1. We have updated the figure legend of Figure 1 to clarify that ConA staining (H-I) and FFA (J-K) were performed before the removal of the silicone wire embolus (line 868 and line 873). Additionally, we have included the label 'Post. Sup. Alveolar art' in Figure 1L as you pointed out. We appreciate your careful attention to detail, and we have ensured that these omissions have been rectified in the revised version of the manuscript.

      (4) Response to recommendation (4):

      Figure 2: only representative images of RGCs at the peripheral retina were shown. It is not clear if only RGCs in the peripheral retina were quantified. Is there RGC loss in the central and middle retina in the UPOAO model as well? How many fields of RGCs were quantified for each retina?

      Thank you for your meticulous review of the manuscript. The quantification method of RGCs is described in detail as follows:

      Four radial incisions were made in the retina, which was then flattened on a glass slide to create a "four-leaf clover" shape. The retina was photographed using a fluorescence microscope (BX63, Olympus, Japan). We captured images from three different regions of each retinal quadrant: 0.1 mm-0.5 mm (central region, field numbers: 1, 4, 7, 10), 0.9 mm-1.3 mm (middle region, field numbers: 2, 5, 8, 11), and 1.7 mm-2.1 mm (peripheral region, field numbers: 3, 6, 9, 12) from the optic nerve head, respectively, as shown in Author response image 1.

      Of these, the changes in the peripheral fields were the most noticeable, so we used the Leica SP8 confocal microscope (20X) to capture peripheral-field RGCs as representative images (Figure 2A, C, E, G). RGC counts in twelve fields of each retina were quantified, and the average density of RGCs across the twelve fields per retina is shown in Figure 2B, D, F, K. RGC counts in the central (field numbers: 1, 4, 7, 10), middle (field numbers: 2, 5, 8, 11), and peripheral (field numbers: 3, 6, 9, 12) visual fields are shown in Author response tables 1-4. We have included this detailed methodology in the revised manuscript to clarify the quantification process and to address the presence of RGC loss in both the central and middle retina in the UPOAO model. Thank you for pointing out the need for this clarification.

      Author response image 1.

      Schematic diagram of field selection. Scale bar=1.4 mm. Each retinal petal has three distinct visual fields (the area circled by the green line) that radiate from the optic nerve head to the periphery, in that order, the central, middle, and peripheral visual fields.
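      The field numbering above follows a regular pattern (fields 1, 4, 7, 10 are central; 2, 5, 8, 11 are middle; 3, 6, 9, 12 are peripheral), so the grouping and per-retina averaging can be sketched as follows. This is only an illustration: the function names and counts are hypothetical, and converting counts to density would additionally require the field area, which is not stated here.

```python
from statistics import mean

REGIONS = ("central", "middle", "peripheral")


def region_of(field_number):
    """Map a field number (1-12) to its region using the numbering in the response."""
    if not 1 <= field_number <= 12:
        raise ValueError(f"field number out of range: {field_number}")
    return REGIONS[(field_number - 1) % 3]


def summarize_retina(counts):
    """Return the mean RGC count per region and across all twelve fields.

    `counts` maps field number -> RGC count for one retina.
    """
    if sorted(counts) != list(range(1, 13)):
        raise ValueError("expected counts for fields 1 through 12")
    by_region = {region: [] for region in REGIONS}
    for field_number, count in counts.items():
        by_region[region_of(field_number)].append(count)
    summary = {region: mean(values) for region, values in by_region.items()}
    summary["all_fields"] = mean(list(counts.values()))
    return summary


# Hypothetical counts for one retina, for illustration only.
example_counts = {
    field: {"central": 30, "middle": 20, "peripheral": 10}[region_of(field)]
    for field in range(1, 13)
}
summary = summarize_retina(example_counts)
```

      Reporting the per-region means alongside the twelve-field mean is one way to show whether loss is confined to the periphery or also present in the central and middle retina.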

      Author response table 1.

      RGCs counts in each field of each retina (30-minute ischemia and 3-day reperfusion)

      Author response table 2.

      RGCs counts in each field of each retina (30-minute ischemia and 7-day reperfusion)

      Author response table 3.

      RGCs counts in each field of each retina (60-minute ischemia and 3-day reperfusion)

      Author response table 4.

      RGCs counts in each field of each retina (60-minute ischemia and 7-day reperfusion)

      (5) Response to recommendation (5):

      Figure 3: The representative wave lines in panels A (60min_3d, 60min_7d) and F do not reflect the statistical analysis presented in panels D, E, and G, especially for the amplitudes of b waves and OPs.

      Thank you for your careful review of the manuscript. We have added labels for a-waves and b-waves, and improved the presentation of OPs to make the details of the amplitude more visible (Figure 3). In the previous version, due to incorrect settings, we did not adjust the ordinate spacing when fitting the curves of representative wave lines in the four groups, resulting in the curves being compressed vertically to the same height. We have now adjusted the curves to be fitted under the same scale bar (shown in the bottom right corner of Figure 3A). In addition, we removed the baseline wave from the OPs and adjusted the abscissa scale to highlight the N waves and P waves for easy reading (Figure 3F).

      (6) Response to recommendation (6):

      There are two different Supplementary Figure 1 and no Supplementary Figure 3, resulting in misaligned references to Supplementary Figures 1, 2, and 3 in the text.

      Thank you for your careful review of the manuscript. We have reviewed the manuscript again and identified errors in uploading the supplementary figures, which resulted in duplicate Supplementary Figure 1 and the absence of Supplementary Figure 3. We have corrected these issues and realigned the references to Supplementary Figures 1, 2, and 3 in the text to ensure consistency. We appreciate your attention to detail and your reminder to address this issue.

      (7) Response to recommendation (7):

      There is confusion about the definition of ORL (outer retina layer). In Lines 208-209, ORL was defined as the combined thickness of the rest to the retinal pigment epithelium (RPE). It seems the ONL is included in ORL. But in lines 358-359, 907-908, "the ORL encompassed the region from the inner segment/outer segment (IS/OS) to the RPE". Please make the definition consistent. In addition, it is hard to distinguish the regions marked by the green lines in Fig. 4A (sham image) after Line 902.

      Thank you for your careful review of the manuscript. We have addressed the confusion regarding the definition of the outer retinal layer (ORL). The Heidelberg OCT device does not distinguish the layers of the mouse retina well, so we divided it into three broader layers:

      (1) Ganglion Cell Complex (GCC) layer, which encompasses RNFL+GCL+IPL.

      (2) Middle Retinal Layer, which includes INL+OPL.

      (3) Outer Retinal Layer (ORL), which includes ONL+IS/OS+RPE.

      We apologize for the inconsistency and have revised the errors in the manuscript and figure legends accordingly. Additionally, we have removed rare domain-specific acronyms and replaced them with more commonly understood abbreviations, as suggested, to avoid confusion.

      Furthermore, we have enlarged parts of the OCT images to better display the layers, hoping to meet the readers' requirements and improve clarity. Thank you for your valuable feedback.

      (8) Response to recommendation (8):

      Figure 4 (Panels H-J, L-M) incorporated with the text (Line 902) differs from the high-resolution version of Figure 4 included later in the manuscript. In Figure 4 (Panels H-J, L-M) merged with the text (Line 902), the quantification of the IPL and INL thickness is incorrect, and the scale bar is inaccurate. However in the high-resolution version of Figure 4 provided later, the thickness of the RNFL+GCL is incorrect.

      Thank you for your careful review of the manuscript. The quantification of the IPL and INL thickness in Figure 4 (Panels H-J, L-M) incorporated with the text has been revised to ensure accurate measurements and scale bars (Figure 4 and line 924). The high-resolution version of Figure 4 provided later has been updated to correct the thickness measurements of the RNFL+GCL. We have ensured that the ordinate in the high-resolution version of Figure 4 now correctly represents length units, consistent with the equal proportional conversion used in the integrated text figures.

      Thank you for your valuable feedback and for pointing out these errors. We have made the necessary corrections to align the figures accurately with the manuscript.

      (9) Response to recommendation (9):

      Line 384-386: the statement "Notably, a-waves in ERG and the thickness of the outer retinal layers in both OCT and HE remained unchanged." is not accurate, since a-waves in ERG is not changed in 3 days but changed in 7 days, and the thickness of the outer retinal layers in HE is either not measured or not shown in Figure 4.

      Thank you for your careful review of the manuscript. We apologize for this error and have revised it.

      We aimed to convey that the amplitude of the a-waves, which represent the function of the photoreceptors, does not show significant variation, which is consistent with the thickness of the outer retinal layer observed in OCT and HE images. Our results indicated that at 7 days post-injury, the amplitude of the a-waves in ERG was statistically different only at stimulus light intensities of 0.3, 3.0, and 10.0 cd·s/m². In contrast, the b-wave amplitude was reduced by half compared to sham eyes at almost all stimulus light intensities. At the same time, the immunofluorescence staining results of photoreceptor cells showed no significant change at 7 days. Therefore, we consider that the change in a-wave amplitudes was minor compared to the significant decrease in b-wave amplitude. We have clarified this in the revised manuscript.

      We also analyzed the thickness of the outer retinal layers in HE and found it to be consistent with the OCT results, showing no significant changes (shown in Author response image 2 below).

      Thank you for your valuable feedback, which has helped improve the accuracy and clarity of our manuscript.

      Author response image 2.

      Thickness of OPL, ONL, IS/OS+RPE in HE staining. n=3; ns: no significance (p>0.05).

      (10) Response to recommendation (10):

      Figure 5 and Figure S3: Quantification data from different sections of the same retina should be averaged to represent one single sample (one data point) for statistical analysis. * in images of Fig. 5E, F, I, J is not defined in the figure legend. It would be easier for readers to follow if the GCL, IPL, INL, and OPL were labeled in retinal sections.

      Thank you for your careful review of the manuscript and your recommendation. We have repeated the statistical analysis and updated the results in Figure 5 and Figure S3. In the UPOAO experimental eyes, no significant change in the number of HCs (Calbindin) was observed during the 3-day reperfusion period, while a notable reduction was observed after 7 days (Figure 5). Additionally, we have added the definition of the asterisks (*) in the figure legend to clarify their significance. We have also labeled the retinal layers, including the GCL, IPL, INL, OPL, and ONL, in the images to make it easier for readers to follow and understand the data.
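      The reviewer's point, that sections from the same retina should be averaged into one data point before statistical testing, can be sketched as below. This is a generic illustration under assumed inputs, not the analysis code used for Figure 5 or Figure S3; the retina identifiers and counts are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def one_point_per_retina(section_counts):
    """Collapse section-level counts into a single mean value per retina.

    `section_counts` is an iterable of (retina_id, count) pairs, one per section.
    """
    grouped = defaultdict(list)
    for retina_id, count in section_counts:
        grouped[retina_id].append(count)
    return {retina_id: mean(values) for retina_id, values in grouped.items()}


# Hypothetical section-level cell counts, for illustration only.
sections = [
    ("sham_1", 10), ("sham_1", 14),
    ("upoao_1", 6), ("upoao_1", 8), ("upoao_1", 7),
]
per_retina = one_point_per_retina(sections)
```

      Statistical tests would then be run on the per-retina values, so that the sample size reflects the number of retinas rather than the number of sections.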

      Thank you for helping us improve the clarity and accuracy of our manuscript.

      (11) Response to recommendation (11):

      Lines 407-409, the statement "which aligns with the a-waves observed in ERG (Figure 3D, E) and the changes seen in the outer retinal layers in OCT (Fig S2C, D)" is confusing. No changes were observed by OCT in Fig S2D.

      Thank you for your review; we apologize for the confusion. The overall trend of the a-wave amplitude in ERG at 7 days did not change significantly, which is consistent with the immunofluorescence staining results of the photoreceptor cells. Based on these observations, we consider that the change in the amplitude of the a-wave was not significant. However, as you pointed out in recommendation 9, because a-waves in ERG were changed at 7 days at stimulus light intensities of 0.3, 3.0, and 10.0 cd·s/m², our description of the a-waves at 7 days was not accurate. We have clarified this point in the revised manuscript to ensure it accurately reflects the data presented.

      (12) Response to recommendation (12):

      In Figure S4, panel C shows lymphocyte-mediated immunity, and panel D shows leukocyte-mediated immunity. Please adjust the figure legend accordingly to reflect the figures.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure S4.

      (13) Response to recommendation (13):

      Lines 440-442 state "These results suggested early ischemic processions such as cell migration and potential collateral vessel formation." It is not clear why and how "potential collateral vessel formation" is suggested by Figure 6 and Figure S4. Please clarify this in the text.

      Thank you for your careful review of the manuscript. We have deleted this sentence due to insufficient evidence and replaced it with: "These results suggested that in the early stage of retinal ischemic injury, leukocytes from the microvasculature may infiltrate retinal tissue. More experimental validation will be performed to confirm this hypothesis." (line 448). We will be more cautious in drawing conclusions in the future. Thank you for your reminder.

      (14) Response to recommendation (14):

      For the figure legend of Figure 6 "In each heatmap, upper box showed the top 10 up-regulated genes, and the below one showed the top 10 down-regulated genes." Is this correct? It appears that the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure 6. In the heatmaps, the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes (line 977).

      (15) Response to recommendation (15):

      For the figure legend of Figure 7, the statement 'Data points are from retinal sections of four animals' is incorrect, as these data were obtained from whole retinas instead of retinal sections. Please revise the legend to reflect this accurately. The scale bar was absent in the images of Figure 7. Asterisk in Figure 7H and 7I was not defined.

      Thank you for your careful review of the manuscript. We have corrected these errors and added the scale bar (Figure 7D). The white asterisks in Figure 7H and 7I indicate the activated microglial cells, and we have added this definition to the legend of Figure 7 (line 981).

      (16) Response to recommendation (16):

      It would be better to switch the order of Figure S7 and Figure S8 to align with their descriptions in the text.

      Thank you for your recommendation and we have switched the order of Figure S7 and Figure S8.

      (17) Response to recommendation (17):

      The gene names in Figure S8 should be written consistently with those listed in Table S1.

      Thank you for your recommendation and we have corrected the gene names.

      (18) Response to recommendation (18):

      In Figure 9, it is not clear why amacrine cells were not included in the UPOAO model, as amacrine cells were also injured as shown in Figure 5I-L.

      Thank you for your careful review of the manuscript and we have added amacrine cells in Figure 9.

      References

      (1) Yang, H., et al., The connective tissue phenotype of glaucomatous cupping in the monkey eye - Clinical and research implications. Prog Retin Eye Res, 2017. 59: p. 1-52.

      (2) Pavlatos, E., et al., Regional Deformation of the Optic Nerve Head and Peripapillary Sclera During IOP Elevation. Invest Ophthalmol Vis Sci, 2018. 59(8): p. 3779-3788.

      (3) Lee, D., et al., A mouse model of retinal hypoperfusion injury induced by unilateral common carotid artery occlusion. Experimental Eye Research, 2020. 201: p. 108275.

      (4) Barthels, D. and H. Das, Current advances in ischemic stroke research and therapies. Biochim Biophys Acta Mol Basis Dis, 2020. 1866(4): p. 165260.

      (5) Smith, H.K., et al., Critical differences between two classical surgical approaches for middle cerebral artery occlusion-induced stroke in mice. J Neurosci Methods, 2015. 249: p. 99-105.

      (6) Janáky, M., et al., Hypobaric hypoxia reduces the amplitude of oscillatory potentials in the human ERG. Doc Ophthalmol, 2007. 114(1): p. 45-51.

      (7) Tinjust, D., H. Kergoat, and J.V. Lovasik, Neuroretinal function during mild systemic hypoxia. Aviat Space Environ Med, 2002. 73(12): p. 1189-94.

      (8) Lee, D., et al., Retinal Degeneration in a Murine Model of Retinal Ischemia by Unilateral Common Carotid Artery Occlusion. Biomed Res Int, 2021. 2021: p. 7727648.

      (9) Yamamoto, H., et al., Complex neurodegeneration in retina following moderate ischemia induced by bilateral common carotid artery occlusion in Wistar rats. Exp Eye Res, 2006. 82(5): p. 767-79.

      (10) Palmhof, M., et al., From Ganglion Cell to Photoreceptor Layer: Timeline of Deterioration in a Rat Ischemia/Reperfusion Model. Front Cell Neurosci, 2019. 13: p. 174.

      (11) Adachi, M., et al., High intraocular pressure-induced ischemia and reperfusion injury in the optic nerve and retina in rats. Graefes Arch Clin Exp Ophthalmol, 1996. 234(7): p. 445-51.

      (12) Jehle, T., et al., Quantification of ischemic damage in the rat retina: a comparative study using evoked potentials, electroretinography, and histology. Invest Ophthalmol Vis Sci, 2008. 49(3): p. 1056-64.

      (13) Hayreh, S.S., H.E. Kolder, and T.A. Weingeist, Central retinal artery occlusion and retinal tolerance time. Ophthalmology, 1980. 87(1): p. 75-8.

      (14) Luo, X., et al., Hypoglycemia induces general neuronal death, whereas hypoxia and glutamate transport blockade lead to selective retinal ganglion cell death in vitro. Invest Ophthalmol Vis Sci, 2001. 42(11): p. 2695-705.

      (15) Schmid, H., et al., Loss of inner retinal neurons after retinal ischemia in rats. Invest Ophthalmol Vis Sci, 2014. 55(4): p. 2777-87.

      (16) Furashova, O. and E. Matthè, Hyperreflectivity of Inner Retinal Layers as a Quantitative Parameter of Ischemic Damage in Acute Retinal Vein Occlusion (RVO): An Optical Coherence Tomography Study. Clin Ophthalmol, 2020. 14: p. 2453-2462.

      (17) Pang, Y., et al., CD38 Deficiency Protects Mouse Retinal Ganglion Cells Through Activating the NAD+/Sirt1 Pathway in Ischemia-Reperfusion and Optic Nerve Crush Models. Invest Ophthalmol Vis Sci, 2024. 65(5): p. 36.

      (18) Feng, Y., et al., GSK840 Alleviates Retinal Neuronal Injury by Inhibiting RIPK3/MLKL-Mediated RGC Necroptosis After Ischemia/Reperfusion. Invest Ophthalmol Vis Sci, 2023. 64(14): p. 42.

      (19) Zeng, S., et al., CREG Protects Retinal Ganglion Cells loss and Retinal Function Impairment Against ischemia-reperfusion Injury in mice via Akt Signaling Pathway. Mol Neurobiol, 2023. 60(10): p. 6018-6028.

      (20) Rosenbaum, D.M., et al., The role of the p53 protein in the selective vulnerability of the inner retina to transient ischemia. Invest Ophthalmol Vis Sci, 1998. 39(11): p. 2132-9.

      (21) Zhang, Y., et al., Melatonin Alleviates Pyroptosis of Retinal Neurons Following Acute Intraocular Hypertension. CNS Neurol Disord Drug Targets, 2021. 20(3): p. 285-297.

      (22) Zhu, J., et al., Protective effects of Erigeron breviscapus Hand.- Mazz. (EBHM) extract in retinal neurodegeneration models. Mol Vis, 2018. 24: p. 315-325.

      (23) Wachtmeister, L., Oscillatory potentials in the retina: what do they reveal. Prog Retin Eye Res, 1998. 17(4): p. 485-521.

      (24) Cao, W., et al., Dextromethorphan attenuates the effects of ischemia on rabbit electroretinographic oscillatory potentials. Documenta Ophthalmologica, 1993. 84(3): p. 247-256.

      (25) Xu, J., et al., Pregabalin Mediates Retinal Ganglion Cell Survival From Retinal Ischemia/Reperfusion Injury Via the Akt/GSK3β/β-Catenin Signaling Pathway. Invest Ophthalmol Vis Sci, 2022. 63(12): p. 7.

      (26) Takács, B., et al., Electroretinographical Analysis of the Effect of BGP-15 in Eyedrops for Compensating Global Ischemia-Reperfusion in the Eyes of Sprague Dawley Rats. Biomedicines, 2024. 12(3).

      (27) Porciatti, V., Electrophysiological assessment of retinal ganglion cell function. Exp Eye Res, 2015. 141: p. 164-70.

      (28) Ridder, W.H. and S. Nusinowitz, The visual evoked potential in the mouse—Origins and response characteristics. Vision Research, 2006. 46(6): p. 902-913.

      (29) Liu, S., et al., An optimized procedure to record visual evoked potential in mice. Exp Eye Res, 2022. 218: p. 109011.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Participants in this study completed three visits. In the first, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a 0-100 visual analogue scale (VAS). Experimental pressure stimulations were also calibrated at an intensity to the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling which was tailored according to a functional threshold power (FTP) test completed in Visit 1. After each block of cycling lasting 10 minutes, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, researchers tested the validity of the proposed endogenous pain modulation mechanism. Further investigation into whether the intensity of exercise had an effect on pain and the neurological activation of pain-related brain centres were also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974, 1998) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the same scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increased, thus linking these brain regions with pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur whereby opioid receptors were "blocked" and thus resulted in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females compared to males who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well-linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, authors suggested that aerobic exercise in the form of a continuous cycling exercise tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response irrespective of exercise intensity. This is because results showed that pain responses did not differ significantly between HIGH and LOW intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, authors have also questioned the mechanisms (endogenous opioids) behind this effect.

      Strengths:

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. The study has incorporated effective pseudo-randomisation and conducted a rigorous set of statistical analyses to account for as many confounds as possible. I will particularly credit the authors on their analysis which explores the impact of sex and female participants' stage of menses on the study outcomes. It would be particularly interesting for future work to pursue some of these lines of research which investigate the differences in the endogenous opioid mechanism between sexes and the added interaction of stage of menses or training status.

      There are certainly many other areas that this article contributes to the literature due to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. All of which, the researchers should be credited for due to the time and effort they have spent completing this study. Indeed, their in-depth analysis of many of these areas provides ample support for the claims they make in relation to these specific questions. As such, I consider their evidence concerning the fMRI data to be very convincing (and interesting).

      Weaknesses:

      Although the authors have their own view of their results, I do, however, have a slightly different take on what the post-exercise pain ratings seem to show and their implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully assume that there is no exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition).

      In more detail, Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). Because participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35) and then in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect.

      That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100%FTP). This is a great finding in my mind as anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful/intense and therefore aversive - great news! This likely has many applications within the field of public health.
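
      As a rough check of the arithmetic in the three points above, the following Python sketch computes the expected mean rating from the calibrated targets and the relative reduction implied by the approximate values read off Figure 6A. The post-exercise means (~35 and ~45) are the reviewer's visual estimates, not reported statistics, and the names `calibrated_targets` and `percent_reduction` are illustrative. The comparison is against calibration values rather than a no-exercise control, so it cannot by itself separate an exercise effect from other influences on the ratings.

```python
# Stimulus intensities were calibrated to elicit these VAS ratings at rest.
calibrated_targets = [30, 50, 70]

# Approximate post-exercise means as read off Figure 6A by the reviewer
# (illustrative values, not the authors' reported statistics).
approx_post_exercise_mean = {"SALINE": 35.0, "NALOXONE": 45.0}


def percent_reduction(expected, observed):
    """Relative drop of the observed rating from the expected rating, in percent."""
    return 100.0 * (expected - observed) / expected


# With equal numbers of trials per intensity, the expected mean is the
# plain average of the three calibrated targets.
expected_mean = sum(calibrated_targets) / len(calibrated_targets)

for condition, observed in approx_post_exercise_mean.items():
    drop = percent_reduction(expected_mean, observed)
    print(f"{condition}: expected {expected_mean:.0f}, "
          f"observed ~{observed:.0f}, reduction ~{drop:.0f}%")
```

      Under these assumed values the SALINE reduction comes out at about 30% and the NALOXONE reduction at about 10%, which matches the reviewer's reading of the figure.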

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner. I consider the overall strength of the evidence to be solid, with the answer to the primary research question still a little ambiguous.

      Reviewer #2 (Public review):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in post-exercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low-intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly) which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Participants in this study completed three visits. In the first one, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a visual analogue scale (VAS). Experimental pressure stimulations were also calibrated at an intensity to the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling which was tailored according to a functional threshold power (FTP) test completed in Visit 1. After each block of cycling lasting 10 minutes, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, researchers tested the validity of the proposed endogenous pain modulation mechanism.

      Further investigation into whether the intensity of exercise had an effect on pain and the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the Borg (1974) scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increase, thus linking these brain regions with the percept of pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur whereby opioid receptors were "blocked" and thus resulted in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females compared to males who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well-linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, authors suggested that aerobic exercise in the form of a continuous cycling exercise tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response irrespective of exercise intensity. This is because results showed that pain responses did not differ significantly between HIGH and LOW-intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, authors have also questioned the mechanisms (endogenous opioids) behind this effect.

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. Although the authors have their own view of their results, I do, however, have a slightly different take on what the post-exercise pain rating seems to show and its implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully question whether there is an exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition). Nevertheless, there are certainly many other areas that this article contributes to the literature due to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. All of which, the researchers should be credited for due to the time and effort they have spent completing this study.

      I have provided some specific comments for the authors to consider. They are organised to correspond to each section as it is presented, and I have denoted the line I am referring to each time.

      To conclude, thank you to the authors for their work, and thank you to the editor for the opportunity to contribute to the review of this paper. I hope my comments are seen as useful and I look forward to seeing the authors' responses.

      We sincerely appreciate the reviewer's insightful comments, which highlight the strengths of our study. In response to the concerns raised, we have made several key revisions to the original manuscript to address the reviewers’ comments. As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise.

      The reviewer offers an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred for both exercise intensities, since the pain ratings provided were lower than the anticipated intensities as determined by the calibration. That this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect. Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration. We have now provided a more detailed overview of the pain ratings at different stimulus intensities after HI and LI exercise in both drug treatment conditions for heat and pressure pain ratings. We elaborate on the specific comments raised in more detail in the following sections.

      Specific Comments

      (1) Abstract

      Line 25 - "we were unable to"... personal preference but this wording is a little 'weighted' in my view. I personally do not think researchers search to prove hypotheses correct, rather we search to prove hypotheses wrong, and therefore only through repeated attempts of falsification can we surmise that something holds true.

      We agree with the reviewer that the chosen wording can be perceived as weighted and have rephrased the sentence.

      Line 33 to 35 - the "...but individual factors... might play a role" is a crucial caveat to this sentence for me. Whilst I can understand that the results of the authors' study indicate that prior assumptions about exercise-induced hypoalgesia and its opioidergic mechanisms may be questioned, I think a little more evidence is needed to finally decide whether aerobic exercise has no overall effect on experimental pain responses. (see more in the Results comments below).

      We thank the reviewer for their comment. We agree that no claims can be made regarding the effect of aerobic exercise per se on pain modulation compared to no exercise based on the current data. Furthermore, we agree that more research is needed to further advance our understanding of (non-)opioidergic mechanisms in exercise-induced pain modulation. However, based on the data presented in this study we propose that the involvement of endogenous opioids in exercise-induced hypoalgesia could be influenced by sex and fitness levels since we could show differences in opioidergic involvement between males and females of different fitness levels. Future studies should account for the fitness levels and sex of the sample investigated.

      (2) Introduction

      Line 48 - please predefine anterior cingulate cortex here.

      We thank the reviewer for detecting this and have introduced the abbreviation for the anterior cingulate cortex in the referenced line.

      Line 49 - please predefine periaqueductal gray here instead of line 52.

      We have introduced the abbreviation for periaqueductal grey in the referenced line.

      Line 47 to 54 - when discussing the descending pain modulatory systems, authors seem to be relating specifically to the intensity/magnitude of pain experiences. However, the different brain regions that are mentioned may have varying "roles" according to which dimension of pain is of focus.

      Hofbauer et al. (2001) - https://doi.org/10.1152/jn.2001.86.1.402

      Rainville et al. (1997) - https://doi.org/10.1126/science.277.5328.968

      The two above studies provide some nice earlier findings on the brain regions - some of which are mentioned by the authors in this section - associated with the processing of pain quality in addition to the intensity of pain... simply attach here if they are of interest to the authors.

      The studies by Hofbauer et al. (2001) and Rainville et al. (1997) provide interesting findings on the effect of hypnotic suggestions on pain affect and the perceived intensity of a painful stimulus. However, these studies did not investigate exercise-induced changes in brain regions of the DPMS. The studies referenced in the relevant section of the manuscript are among the few imaging studies that have investigated brain structures of the DPMS in the context of exercise and pain modulation and were therefore included in this paragraph, both to focus on their findings and to emphasise the scarcity of imaging studies investigating exercise-induced pain modulation. Given the divergent research topics of the proposed studies, we suggest not including them in this paragraph to maintain a clearer line of argument and focus on exercise-induced pain modulation in brain regions of the DPMS.

      L59 to 61 - a minor comment about the phrasing within this sentence and a recommended change is provided below for the flow of the sentence/paragraph.

      "...there are instances where administration of µ-opioid antagonists has decreased exercise-induced pain modulation (Droste et al. 1988; etc.) whereas in others there has been little effect (Droste et al. 1988; etc.)."

      We have altered the sentence based on the reviewer's suggestion to improve its flow and coherence.

      L56 to 72 - Whilst the current version of this paragraph scans well enough, I find that the narrative flits between the mechanisms being discussed and the rationale/shortcomings of current research. I think that the original content of this paragraph can be structured into:

      A- The endogenous opioid system is a likely candidate to explain how exercise elicits a hypoalgesia response.

      B- Citation(s) of the imaging studies (Boecker et al., 2008, etc.) and earlier literature which support A (e.g., Janal et al. 1984).

      C- Further support of this theory as µ-opioid antagonists like naloxone seem to counteract the endogenous opioid effect (Haier et al., 1981).

      D- Introduction of the caveats of previous research such as the studies that observed that µ-opioids did not impact the endogenous pain modulation system during exercise (e.g., Droste et al., 1991, etc.) and the range of different interventions and exercise modalities which make it difficult to draw clear conclusions of the pain modulation effect.

      To me, this structure would set out the details you have already put together in a more orderly and systematic way and also will lead nicely into your ensuing paragraph (Line 74 onwards).

      We appreciate the reviewer's constructive comments on structuring this paragraph. We agree that the proposed version eases the readability and comprehension of the paragraph and have, thus, restructured the paragraph according to the reviewer’s suggestion.

      L75 - Why are single-arm pre-post measures and designs an issue? If you can elaborate a little more this would be very insightful for a reader.

      Single-arm pre-post measurement studies involve participants being assigned to a single experimental condition, with pain assessments conducted only once before and once following an intervention. This study design presents some limitations, particularly in the context of examining exercise-induced modulation of pain (Vaegter and Jones, 2020). Such designs are potentially confounded by the effects of habituation to noxious stimuli, as highlighted by Vaegter and Jones (2020). Incorporating randomised controlled trials with multiple measurement blocks not only mitigates these limitations but also provides a clearer understanding of how individual bouts of exercise influence pain perception. We have now added this to the paper.
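
      To illustrate the habituation confound described above, the following minimal Python sketch uses purely hypothetical numbers (a baseline rating of 50, a habituation drop of 8 VAS points, and an exercise effect assumed to be zero). It is not based on data from this study; `post_rating` and all values are assumptions chosen only to show why a single-arm pre-post difference cannot be attributed to exercise without a control arm.

```python
def post_rating(baseline, habituation, exercise_effect):
    """Rating after the intervention period: both habituation to repeated
    stimulation and any exercise effect lower the rating."""
    return baseline - habituation - exercise_effect


# Hypothetical values, chosen only for illustration.
baseline = 50.0             # pre-intervention VAS rating
habituation = 8.0           # drop caused by repeated noxious stimulation
true_exercise_effect = 0.0  # assume exercise has no effect at all

# Single-arm pre-post design: the whole pre-post drop is read as "hypoalgesia".
exercise_post = post_rating(baseline, habituation, true_exercise_effect)
single_arm_estimate = baseline - exercise_post

# Controlled design: a no-exercise arm undergoes the same stimulation,
# so the habituation drop can be subtracted out.
control_post = post_rating(baseline, habituation, 0.0)
controlled_estimate = (baseline - exercise_post) - (baseline - control_post)

print(f"single-arm estimate: {single_arm_estimate}")
print(f"controlled estimate: {controlled_estimate}")
```

      In this toy setting the single-arm estimate equals the habituation drop even though exercise does nothing, whereas the controlled estimate recovers the assumed zero effect.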

      L80 - The reference for the functional threshold power assessment is provided as a number. Please could the authors change to reflect which study/studies they are referring to here (I presume it is the Borszcz and/or the McGrath studies?).

      We apologise for this oversight and have now updated the reference to be displayed correctly. The reviewer is correct in assuming that Borszcz et al. (2018) is the referenced study here.

      L88 - Did participants also receive pressure pain stimulations in addition to the thermal stimuli, as the figure suggests?

      Note: I have since read on to L102-104 and understood why pressure pain was included but not mentioned, due to the results. However, I would still recommend including pressure pain stimulations in this line, if possible, to be consistent with what Figure 1 shows and what later text in the Methods section also shows.

      We thank the reviewer for their suggestion to mention pressure pain at the referenced line to increase the clarity and consistency of the experimental paradigm. Pressure and heat pain were applied in an alternating fashion during scanning. Whilst the results of pressure pain are not included in this study, we agree with the reviewer that it should be mentioned again as part of the methods and have added this.

      L94 - I really like Figure 1. Great job.

      Could the authors please define the inter-trial interval (ITI) in the legend? And please could the authors clarify what unit the 30, 50, and 70 figures in the "18 trials per block" section refer to.

      We thank the reviewer for their positive feedback. We have now included a definition of inter-trial-interval (ITI) in the figure legend. Furthermore, we adapted Figure 1 so that the units of the stimulus intensities (30, 50, 70) on the Visual Analog Scale (VAS) are included in the figure allowing for a clearer identification.

      (3) Results

      General comment for figures ... is there a specific reason the authors chose for error bars to be represented by an SE value as opposed to an SD value?

      The reason I ask is that participant responses seem to vary (See Figure 2A and 2E-G as an example). Error bars showing SD values would perhaps do justice to the variability in participant response(s), whereas the SE may be a better representation of the variability in responses due to the assessor's methods of collection. Whilst the SE error bars are narrow (great job on that!), the individual responses are clearly varied which I speculate could be because of the interventions that have been implemented (i.e., exercise intensity).

      The use of Standard Error (SE) is more common in the cognitive neuroscience literature. However, as this reviewer noted, we have also included individual data points alongside the SE, thereby providing a comprehensive view that allows for a thorough interpretation of the data distribution.
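
      The distinction between the two kinds of error bar can be sketched in a few lines of Python. The ratings below are hypothetical and the helper `sd_and_se` is an illustrative name; the sketch only shows the standard relationship SE = SD / sqrt(n), which is why SE bars are narrower than SD bars and shrink as the sample grows while the spread of individual responses does not.

```python
import math
import statistics


def sd_and_se(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)      # spread of individual responses
    se = sd / math.sqrt(len(values))   # uncertainty of the estimated mean
    return sd, se


# Hypothetical VAS ratings from eight participants (not data from this study).
ratings = [22, 35, 41, 48, 50, 55, 63, 71]

sd, se = sd_and_se(ratings)
print(f"mean = {statistics.mean(ratings):.1f}, SD = {sd:.1f}, SE = {se:.1f}")
```

      Plotting individual data points alongside SE bars, as the authors describe, conveys both quantities at once: the points show the between-participant spread and the bars show the precision of the mean.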

      L102 to 104 - In fact, it is interesting that exercise did not impact the pressure pain ratings whereas the same cannot be said for thermal pain. In line with some of my comments below about the impact of exercise on pain intensity responses, I would be intrigued to see the results of the pressure pain ratings in more detail.

      Another note on this... Whilst the results for the pressure pain may be beyond the scope of this paper and will be reported separately, knowing of this data is tantalising for a reader. I would suggest to: A) either mention the pressure pain and include the analysis of the data; or B) not mention the pressure pain altogether and save it for the subsequent paper. Either way, I look forward to seeing further discussion on this in future work.

      We have now summarised the behavioural results of exercise on pressure pain ratings below in Supplemental Figure S1.

      There was no hypoalgesic effect evident in the behavioural pain ratings comparing HI to LI exercise in the saline (SAL) condition (β = 0.57, CI [-1.73, 2.86], SE = 1.17, t(1354) = 0.48, P = 0.63; Supplemental Figure S1A, blue bars) as well as no interaction of drug treatment and exercise intensity on pressure pain ratings (β = -1.43, CI [-4.87, 2.01], SE = 1.75, t(2756.02) = -0.82, P = 0.42; Supplemental Figure S1). Post-hoc paired t-tests (Bonferroni-corrected) confirmed there to be no significant differences between the drug treatment conditions at LI (P = 0.18) or HI (P = 0.85) and no significant difference between the exercise intensities in the SAL (P = 0.65) and NLX (P = 0.48) conditions, confirming no significant differences in drug treatment between the exercise intensities.

      Furthermore, there was no significant effect of fitness level on differences in pain ratings (LI – HI exercise) in the SAL condition (β = 3.16, CI [-1.64, 7.97], SE = 2.37, t(38) = 1.34, P = 0.19; Supplemental Figure S1B) and no significant correlation between fitness level and difference pain ratings (r = 0.25, P = 0.13). Finally, there was no significant interaction of drug treatment, exercise intensity, and sex on difference pain ratings (β = -7.97, CI [-18.67, 2.73], SE = 5.51, t(190) = -1.45, P = 0.15; Supplemental Figure S1C-D).

      Exercise did not appear to affect pressure pain ratings and we have now added this to the discussion and in the methods section. However, we think that the figure should be part of the supplements.

      L112 to 113 - Fantastic work for including this analysis in your study. Great job.

      We appreciate the reviewer’s positive feedback on conducting these crucial analyses when investigating sex and gender differences in pain.

      L186 to 189 - It is fascinating that there appears to be no effect of NALOXONE on pain ratings within female participants at a VAS rating of 30 for thermal pain as well as a much diminished hyperalgesia effect at a VAS rating of 50 compared to males. Meanwhile, at higher intensity stimulations corresponding to a VAS rating of 70, females in fact demonstrate a more pronounced hyperalgesia effect compared to males. In addition, the hyperalgesia effect of NALOXONE for males seems to "peak" at a VAS rating of 50. The mechanisms behind these findings alone would be incredibly exciting to explore... but maybe in another study.

      We agree with the reviewer that the differences in males and females are fascinating results and concur that this may hint at varying degrees of opioidergic involvement at different stimulus intensities. This finding is intriguing and potentially clinically relevant, warranting further investigation in future research, although it lies beyond the scope of the current paper.

      L189 - To double check... Figures 4A and 4B refer to the entire cohort (male and female responses combined) whereas C-E are separated by sex?

      In addition, as there are no annotations to the top of Figures 4C-E were no significant differences observed between saline and naloxone conditions per each stimulus intensity? i.e., similar tests to what are shown in Table S6 but separated for each sex.

      Without getting too carried away, there may be something here that indicates a difference between sexes concerning the opioid-driven pain modulation response on a neurological level (i.e., brain region activation).

      The reviewer is correct in assuming that Figures 4A and 4B refer to the entire cohort whilst Fig. 4C – 4E are split for males and females. The full output of the analyses for Fig. 4A and 4B are reported in Supplemental Tables S5 – S7. Furthermore, the full output of the LMER analyses for Fig. 4E is reported in Supplemental Table S10. We agree with the reviewer that additional annotations in Fig. 4C – Fig. 4E ease interpretation and have, thus, added them to the respective figures, denoting the significance of the interaction term stimulus intensity and drug treatment for females (Fig. 4C) and males (Fig. 4D), respectively. For completeness, we now report the post-hoc paired samples t-tests for females and males in the Supplemental Tables S8 and S9, respectively.

      L254 to 258 - "we could not establish an overall hypoalgesia effect of exercise...". Do the results of the exercise intensity x drug treatment provide an answer for this exact hypothesis? After checking the methods section, I cannot seem to find whether the statistical analysis has involved a comparison of the pain ratings after the high (alone), low (alone), or high and low (combined) exercise compared to ratings during control or pre-calibration as part of precalibration (i.e., pain ratings in a rested state without any exercise yet completed).

      We concur with the reviewer's assessment that the study design and statistical analyses cannot address the ‘overall’ effect of exercise compared to no exercise. Please refer back to our general response before comment 1, where we have addressed this point.

      As it seems that the analysis assesses the differences between high and low-intensity exercise, to me, the results of the exercise intensity x drug treatment analysis do not assess whether there is an exercise-induced hypoalgesia effect or not. Instead, it seems to assess whether the intensity of exercise is a differentiating factor in the expected exercise-induced hypoalgesia effect to subsequent pain intensity ratings to experimental pain stimulation. For the authors to judge whether aerobic exercise does or does not have a hypoalgesia effect, then the exercise conditions (either combined or standalone) would have to be compared to a control condition or a data set that involved pain ratings from a pre-exercise timepoint.

      We thank the reviewer for their comment. We would like to point out that we concluded there to be no hypoalgesic effect between the LI and HI exercise based on the LMER model comparing the behavioural pain ratings between the exercise conditions in the SAL condition (β = 1.19, CI [-1.85, 4.22], SE = 1.55, t(1354) = 0.77, P = 0.44; Figure 6A, blue bars and Table S9). The statistical model investigating the interaction of exercise intensity and drug treatment served to show that NLX did not modulate pain differently between the LI and HI exercise conditions.

      Given that our experiment involved different exercise levels in a randomized order, a simple pre vs post analysis is not straightforward. Nevertheless, we have set up a model where we take into account the rating time point (pain ratings provided before each exercise block (pre-pain ratings) and following each exercise block (post-pain ratings)) at each stimulus intensity (VAS 30, 50, 70) and exercise intensity (LI and HI). The model also takes into account the exercise intensity performed in the previous block, the overall block number as well as the varying subject intercepts. The analysis was completed for heat (Author response image 1A) and pressure (Author response image 1B) pain ratings in the SAL condition to establish whether there was a significant effect of exercise intensity on the changes from pre to post-pain ratings. The model for heat pain yielded a significant main effect for stimulus intensity (β = 1.43, CI [1.34, 1.52], SE = 0.05, t(2054.95) = 31.61, P < 0.001) but no significant interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.14). The model for pressure pain in the SAL condition yielded a significant main effect of stimulus intensity (β = 1.00, CI [0.92, 1.08], SE = 0.04, t(2054.99) = 24.68, P < 0.001) and block number (β = 1.14, CI [0.35, 1.94], SE = 0.41, t(2055.98) = 2.80, P = 0.005) but no interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.38).
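      The structure of this pre/post model can be written out as a rough sketch. The notation is an assumption made for illustration only: the exact coding of the fixed effects and the set of interaction terms are not specified above, so a full factorial of rating time point (T), stimulus intensity (S), and exercise intensity (E) is assumed here, with the exercise intensity of the previous block (E^prev) and the block number (B) as covariates and a random intercept u_j per subject.

```latex
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_T T_{ij} + \beta_S S_{ij} + \beta_E E_{ij} \\
 &+ \beta_{TS}\, T_{ij} S_{ij} + \beta_{TE}\, T_{ij} E_{ij} + \beta_{SE}\, S_{ij} E_{ij}
  + \beta_{TSE}\, T_{ij} S_{ij} E_{ij} \\
 &+ \beta_P\, E^{\mathrm{prev}}_{ij} + \beta_B\, B_{ij} + u_j + \varepsilon_{ij},
 \qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad
 \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

      In this notation, y_ij is the i-th pain rating of subject j, and the three-way coefficient β_TSE corresponds to the interaction of exercise intensity, rating time point, and stimulus intensity that is reported as non-significant for both heat and pressure pain.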

      Author response image 1.

      Heat (A) and Pressure (B) pain ratings in the saline (SAL) condition for pre (purple) and post (turquoise) exercise pain ratings at LI and HI exercise and all stimulus intensities (VAS 30, 50, 70). The bars depict the mean pain rating pre and post-exercise and the dots depict the subject-specific mean ratings. The error bars depict the SEM.

      Another point of consideration is that Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, therefore the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35) and then in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      • It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      • It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      • It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect. That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100%FTP). This is a winner in my mind. Anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful and therefore aversive - great news!
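      The percentages in the points above follow from simple arithmetic, which can be checked with a minimal sketch. The pooled means of roughly 35 (SALINE) and 45 (NALOXONE) are the approximate values read off Figure 6A in this comment, not exact data, and the function name is hypothetical.

```python
from statistics import mean

# Target VAS levels used for calibration (taken from the text).
target_vas = [30, 50, 70]

# If ratings matched the calibrated targets, the pooled mean would be 50.
expected_mean = mean(target_vas)


def relative_reduction(observed, expected):
    """Fraction by which the observed mean rating falls below the expected mean."""
    return (expected - observed) / expected


# Approximate pooled means as read off Figure 6A (assumed, not exact data).
sal_reduction = relative_reduction(35, expected_mean)  # SALINE
nlx_reduction = relative_reduction(45, expected_mean)  # NALOXONE

print(f"SALINE: {sal_reduction:.0%} below target, NALOXONE: {nlx_reduction:.0%} below target")
```

      This only restates the comparison against the calibrated targets; it does not by itself separate an exercise effect from other reasons why ratings might fall below the targets.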

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner.

      As a result of this interpretation of your findings, I do not think that aerobic exercise as a means to cause subsequent hypoalgesia to experimental thermal nociception can be fully discounted. On the contrary, I think your results showed in Figure 6A are evidence for it.

      The reviewer is correct in assuming that Figure 6A shows the averaged pain ratings across all stimulus intensities (VAS 30, 50, and 70) for each subject. To provide more details, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Fig. S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      The reviewer further suggests that the average pain ratings in the SAL condition are lower than the anticipated stimulus intensity, thus indicating exercise-induced hypoalgesia. While this interpretation is one possibility, there is an alternative explanation: the lower pain ratings may stem from habituation to heat pain (Greffrath et al., 2007; Jepma et al., 2014; May et al., 2012). To support this perspective, we have visualised data from other studies in our lab that have been conducted with the same thermode head and device (TSA-2), using the same calibration procedure and aiming for the same stimulus intensities (VAS 30, 50, and 70). In both studies (Author response image 2A: Study 1: Behavioural sample; Author response image 2B: Study 2: fMRI sample; Author response image 2C: Original Exercise Study), participants did not engage in an exercise task and the pain ratings at VAS 30 and VAS 50 were lower than the anticipated intensities (VAS 30: 11.1/13.4; VAS 50: 35.0/35.9). Furthermore, Wittkamp et al. (2024) showed that, despite calibrating the heat stimuli at VAS 60, participants rated the pain stimuli with M = 48.58 (SD = 13.79).

      The discrepancy between calibrated intensities and the ratings provided could be attributable to habituation effects, especially at low-intensity stimuli. Moreover, we would like to point the reviewer to the highest stimulus intensity at VAS 70 (Author response image 2C), where no habituation has taken place in any of the three data sets (including the current study). This consistency suggests that exercise-induced hypoalgesia may not be present in our findings or may be confounded by habituation effects.

      Author response image 2.

      Heat pain ratings at different intensities (30, 50, and 70 VAS) in different study samples. Bars depict the mean ratings in the saline (SAL) condition. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      The reviewer further suggests that there is evidence for endogenous opioidergic modulation since the pain ratings in the NLX condition are lower than the anticipated intensities. We fully agree but, again, would argue that the DPMS can exert its effects on painful stimuli in a default manner, i.e. irrespective of any exercise effect.

      We concur with the reviewer’s interpretation that there is no effect of exercise intensity on exercise-induced hypoalgesia since the ratings between both exercise intensities are not significantly different.

      Finally, we agree that our data does not allow for the interpretation of an ‘overall’ effect of exercise-induced hypoalgesia and would like to point out that we did not aim to claim this. Rather, the data suggests there to be no effect of LI vs. HI aerobic exercise on pain modulation. We acknowledge, however, that the phrasing involving ‘overall’ can be misleading and have revised this to focus on the comparison between LI and HI exercise, thereby enhancing precision and clarity.

      Note This is also where it would be really interesting to see the pain pressure data if it were to be included. Mainly to see whether it coheres with what the thermal stimulation stuff shows.

      We have provided the ratings for the pressure pain ratings in the SAL condition below (Author response image 3).

      Author response image 3.

      Pressure pain ratings in the SAL condition at stimulus intensity (VAS 30, 50, and 70). Bars depict the mean ratings in the saline (SAL) condition. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      L259 - As mentioned in the comment above. Could the authors distinguish what is being shown in Figure 6A? Are the data presented as the pooled mean for all stimulation intensities? If not, what data is displayed per bar/column?

      We thank the reviewer for their comment. The reviewer is correct in assuming that the bars in Figure 6A depict the pooled means across all stimulus intensities (VAS 30, 50, 70) for each drug treatment condition and exercise intensity. To allow for a more detailed comprehension of the data, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Figure S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      L278 - Can the authors please provide a reference that explains how W.kg-1 at FTP is a measure of fitness level?

      We thank the reviewer for their comment. The obtained FTP value was corrected for the weight of each participant (Watt/kg), yielding a weight-corrected fitness measure that allows for better comparison between subjects. We denoted this in the figures as W*kg-1, which is an equivalent notation.
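      As a minimal illustration of this weight correction, the sketch below divides an FTP value in Watts by body mass in kilograms. The function name and the two example participants are hypothetical and are not taken from the study data.

```python
def relative_ftp(ftp_watts: float, body_mass_kg: float) -> float:
    """Return FTP normalised by body mass, in W/kg."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return ftp_watts / body_mass_kg


# Hypothetical participants: the heavier rider has the higher absolute FTP,
# but the lighter rider has the higher weight-corrected value.
heavier = relative_ftp(240.0, 80.0)
lighter = relative_ftp(210.0, 60.0)

print(f"heavier: {heavier:.2f} W/kg, lighter: {lighter:.2f} W/kg")
```

      The example shows why the correction matters for between-subject comparison: ranking by absolute power and ranking by W/kg can differ.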

      L296 - Take the line away from Figure 7A... Does the individual data show a positive relation between pain rating changes and W.kg-1? Besides the three data points (1 on the far right of the figure and the two on the far left), I find it hard to see any real trend.

      We acknowledge the reviewer’s concern regarding the regression line and the visual clarity of the individual data points. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).
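      For readers who want to relate the reported coefficient to its test statistic, the following sketch recomputes t as the estimate divided by its standard error, using only the β and SE values quoted above. The helper name is hypothetical; the P value and confidence interval additionally depend on the degrees of freedom and are not recomputed here.

```python
def t_statistic(beta: float, se: float) -> float:
    """Wald-type t statistic: coefficient estimate divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


# Values quoted in the response for the main effect of fitness level (SAL condition).
t_fitness = t_statistic(6.45, 2.56)

print(f"t = {t_fitness:.2f}")
```

      The recomputed value rounds to the reported t(38) = 2.52.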

      We have conducted an additional LMER model in which we excluded the subjects with the highest and lowest FTP values (sub-28 with 3.19 W/kg and sub-06 with 0.76 W/kg, respectively). The LMER still yields a significant main effect of fitness level (β = 6.82, CI [1.25, 11.65], SE = 3.18, t(34) = 2.14, P = 0.039; Author response image 4) and a positive correlation between the difference ratings and fitness level approaching significance (r = 0.32, P = 0.057).
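      The logic of this sensitivity check can be sketched as follows: remove the observations with the lowest and highest fitness value, then recompute the correlation. This is an illustration under stated assumptions, not the analysis code. Apart from the two extreme FTP values quoted above, all numbers are made up, and a plain Pearson correlation stands in for the LMER.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_extremes(pairs):
    """Remove the pairs with the lowest and the highest fitness value."""
    ordered = sorted(pairs, key=lambda p: p[0])
    return ordered[1:-1]


# Hypothetical (fitness in W/kg, LI - HI rating difference) pairs.
data = [(0.76, -8.0), (1.90, -1.0), (2.20, 1.5), (2.50, 0.5), (2.80, 4.0), (3.19, 9.0)]
trimmed = drop_extremes(data)

r_all = pearson_r([f for f, _ in data], [d for _, d in data])
r_trimmed = pearson_r([f for f, _ in trimmed], [d for _, d in trimmed])

print(f"r (all) = {r_all:.2f}, r (extremes removed) = {r_trimmed:.2f}")
```

      If the association were driven only by the two extreme subjects, the correlation after trimming would drop towards zero.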

      Author response image 4.

      Fitness level on difference pain ratings (LI-HI exercise) without subjects with highest and lowest FTP (N = 37). (A) Subject-specific differences in heat pain ratings (dots) between LI and HI exercise conditions (LI – HI exercise pain ratings) and corresponding regression line pooled across all stimulus intensities in the SAL condition. Fitness level (FTP) showed a significant positive relation to heat pain ratings with a significant main effect of FTP (P = 0.039) on difference ratings.

      (4) Discussion

      L356 to 358 - Exactly. What you write here, I agree with. Your testing allowed you to judge whether there is an effect of aerobic exercise intensity on pain modulation. However, I think this has been a little conflated with the idea that there is "no overall effect of aerobic exercise on pain modulation" in other areas of the article (L358-361, Results, and Abstract). As per my previous comment, I am not sure this (no overall effect) is true.

      We agree with the reviewer and have adapted the manuscript so that the misleading phrase including ‘overall’ is removed.

      L358 to 365 - One addition to this debate about whether this is a hypoalgesia effect of aerobic exercise. In 358 - 361 (particularly the end of 361) there is a strong conclusion that there is no direct involvement of the endogenous opioid system. Then glance onto L364 to 365 and there is then an almost conflicting summary that a hypoalgesia effect driven by opioidergic regions of the brain (and ergo endogenous opioids) is in effect. If there were no direct endogenous opioid involvement, then differences between NALOXONE (blockade of the opioid mechanism) and SALINE conditions would not exist.

      We thank the reviewer for their comment. The structure of this paragraph aimed to guide the reader towards a more nuanced understanding of the possible mechanisms and caveats in exercise-induced pain modulation. Whilst our data suggest an effect of NLX on pain ratings, with significantly higher pain ratings in the NLX condition compared to the SAL condition, we could not identify an interaction between treatment and exercise intensity. This suggests that there is no significant difference in opioidergic involvement between HI and LI exercise. Our exploratory analyses, however, show that endogenous opioids are involved as an underlying mechanism dependent on sex and fitness level.

      My perspective is that an exercise-induced hypoalgesia effect has occurred (based on the data in Figure 6A) but that this effect is certainly caveated by the sex and fitness levels that this study has observed (and kudos for it).

      As mentioned above, based on the current data we cannot untangle whether the reduced pain ratings in the SAL condition are due to habituation to noxious stimuli or an actual hypoalgesic effect of exercise (or potentially a mix of both). However, we fully agree with the reviewer that exercise-induced pain modulation is influenced by fitness level and sex.

      L390 - "endogenous pain modulation through μ-opioid receptors increases with increasing pain intensity". Aside from the general discussion about whether aerobic exercise causes a post-exercise hypoalgesia effect. This finding is also interesting for the pain incurred during exercise in the form of naturally occurring muscle pain and may also be clinically relevant as it could be that the endogenous pain modulation "system" could be primed through repeated exercise as your results show that the fitness level (i.e., a close correlate of how much someone has engaged in exercise and therefore 'activated' the endogenous pain modulation system) is associated with a more pronounced post-exercise hypoalgesia effect.

      This is an interesting aspect. With regards to the pain induced by exercise itself (i.e. muscle pain), we did not gather any data on this type of pain, and interpreting this would be mere speculation. However, it is an interesting hypothesis to investigate in future studies whether the pain induced by exercise is potentially influenced by the endogenous opioid system. We agree with the reviewer’s interpretation that repeated exercise might prime the endogenous opioid system, especially in fitter individuals who engage more frequently in exercise and, thus, ‘train’ the endogenous opioid system. We have included this line of interpretation in the original manuscript, where we suggest that the mFC, a brain region with high µ-opioid receptor density, might be ‘trained’ by repeated exercise and, therefore, shows increased activation in fitter individuals after short bouts of exercise.

      L404 to 405 - "a resting baseline does not control for unspecific factors such as attentional load or distraction (Brooks et al., 2017; Sprenger et al., 2012) through exercise." I am not sure I agree. A control condition allows one to truly deduce whether exercise causes a hypoalgesia effect or not. The attentional load may be a factor, but I would argue this is distinct from endogenous pain modulation - unless there is a study that shows cognitive load alone can elicit endogenous opioids like exercise. About distraction, this would be the case if the pain measures were taken during the exercise. However, as the pain measures taken in the MRI were post-exercise and there was no added distraction related to the exercise present anymore, then I do not think any added effect of distraction due to the exercise and its effect on post-exercise pain measure is relevant any longer.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation would allow for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. It is important to note that both studies (Brooks et al., 2017; Sprenger et al., 2012) have indeed shown that the effect of cognitive pain modulation is mediated by endogenous opioids.

      L406 - I do not think a low-intensity exercise is a true "control" condition. It certainly does allow the study to compare the dose-response relationship but as the individual is exercising (even at a moderate physiological intensity) then comparison of HIGH vs LOW does not tell us whether exercise does or does not cause hypoalgesia. In contrast, the results from Figure 6A seem to show that even LOW intensity exercise has a hypoalgesia effect and this is a good thing for those who cannot exercise at high intensities (e.g., chronic populations).

      Please refer back to our general response before comment 1, where we have addressed this point.

      L410 - A small digression in relation to the exercise intensities:

      The intensity domains (moderate - heavy - severe) are not truly controlled within this study (mainly for the LOW condition), and therefore some participants could have exercised within different exercise intensity domains than others. To explain, the exercise intensity domains are distinguishable by the physiological responses associated with the boundaries of each of these domains. The FTP is believed to be a demarcation point between heavy and severe intensity domains (though kinesiologists debate the validity of this). Other concepts similar to FTP are Critical Power or the Respiratory Compensation Point. Ultimately, the boundary between heavy and severe intensity domains is characterised by the highest possible intensity by which a steady-state in oxygen kinetics (V̇ O2) occurs (Burnley & Jones, 2018). If this is expressed as a power output (Watts) and then a percentage of this power output is used to prescribe exercise intensity, then the physiological response is not always as expected. The reason is that for some people the gaseous exchange threshold (the demarcation point between the moderate and heavy intensity domains) is not always the same percentage between resting and FTP/Critical Power/Respiratory Compensation Point for each person. As a result, some individuals who are prescribed an intensity of 55% FTP/Critical Power/Respiratory Compensation Point may subsequently exercise within the moderate intensity domain (most people did based on the heart rate and RPE responses) whilst some others might actually exercise more within the heavy intensity domain. A quick check of Figures 3B-C could indicate that this might have been the case for two or three participants, but that is inference and speculation as we cannot truly know unless gas parameters were taken (which is perfectly understandable that they have not been taken because this study has done so much else). 

      However, the importance of this for this study is that if some participants did indeed exercise at a slightly higher physiological intensity, this undermines the LOW condition as a "control" as the physiological stimulus between conditions (Brownstein et al., 2023). It means that the proposed differences in endogenous opioids (Vaegter et al., 2015; 2019) between exercise intensities may not have been present and therefore summarising a lack of an exercise-induced hypoalgesia effect is slightly confounded. This is one factor contributing to my scepticism about the conclusion that there is a lack of an exercise-induced hypoalgesia response.

      We thank the reviewer for their comment as it touches upon the challenges of estimating exercise intensities precisely. It is, indeed, crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers such as the Functional Threshold Power (FTP), Critical Power, and the Respiratory Compensation Point (VO2max) (Burnley & Jones, 2018). Previous research has shown that the FTP and FTP20 tests are reliable and convenient methods to estimate approximate measures of VO2max (Denham et al., 2020) and that the FTP test is a useful test for performance prediction in moderately trained cyclists (Sørensen et al., 2019).

      We acknowledge that without direct measurements of VO2max, it is challenging to determine the precise intensity domain in which each participant was operating. While the RPE and HR might suggest that some participants performed in the moderate intensity domain in the LI exercise condition, we could still ascertain there to be a significant difference in the relative power (%FTP), heart rate (HR), and rating of perceived exertion (RPE) between the LI and HI exercise conditions. In the overall sample, the consistency in relative power, heart rate, and RPE responses among participants suggests that the exercise doses were effectively communicated and adhered to; therefore, the validity of the LI exercise condition remains robust.

      While we did not include metabolic assessments in our protocol, our study focused on providing a comprehensive analysis of the exercise-induced hypoalgesia phenomenon across two distinct exercise intensities. Additionally, the rationale for selecting specific exercise intensities was grounded in the existing literature, which indicates significant differences in the hypoalgesic response between exercise intensity levels (Jones et al., 2019; Vaegter et al., 2014).

      According to the reviewer, the potential lack of difference between the exercise conditions might contribute to the fact that there was no difference in endogenous opioid release and, thus, no difference in pain ratings between the exercise conditions. However, our data still suggest that there is an influence of endogenous opioids in the HI exercise condition in males with higher fitness levels. Together with recent findings on the association of µ-opioid receptor activation and fitness levels in men (Saanijoki et al., 2022), as well as the difference in µ-opioid receptor availability between high and moderate aerobic exercise (Saanijoki et al., 2018), we would hypothesise that the release of endogenous opioids after short HI bouts of exercise depends on fitness levels (and potentially sex).

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have, therefore, included this in the discussion of the manuscript.

      L417 - For some reason I am doubting this value (r = 0.61). Could this be checked? I think it is higher in their study. r = 0.88?

      Also, as someone with a kinesiology background, I would argue this is a given anyway. The maximum power one can cycle for 20 minutes is related to the maximum power one can cycle for 60 minutes, this is expected. (That is no slight on the authors of this study, more a remark that readers could look and figure that for themselves if they needed to know).

      We thank the reviewer for their comment. We have carefully re-checked the correlation coefficient between the FTP20 and FTP60 tests in the study by Borszcz et al. (2018) and have corrected the correlation coefficient to r = 0.88. We thank the reviewer for detecting this. Whilst we agree that it seems somewhat intuitive that the FTP20 and FTP60 should correlate highly, we wanted to provide the reader with a better understanding of where the FTP20 test originated and why it is suitable to assess aerobic fitness levels without having to maintain a steady power output for 60 minutes.

      L428 - Kudos to the authors for taking a standardised approach to this. Hopefully, my comment earlier might provide some extra food for thought about exercise intensity. I think there are several other ways future research could prescribe exercise without the need for expensive and cumbersome bits of equipment to know how hard people are exercising.

      We strongly agree with the reviewer and hope that our study can inspire future research to implement more convenient and inexpensive ways to establish aerobic (and anaerobic) fitness levels.

      L456 to 458 - Would it be possible to revisit this and check whether the pooled mean of all stimulation intensities for pain intensity ratings after pressure pain is lower than 50? If so, I think it can also be assumed that there is a slight hypoalgesia effect occurring for pressure pain too.

      We have revisited the pressure pain ratings pooled across all stimulus intensities (VAS 30, 50, and 70). Indeed, the ratings are below 50 VAS (Supplemental Figure S1A) in the SAL and NLX conditions. As mentioned before, lower pain ratings after LI exercise cannot be taken as evidence for exercise-induced analgesia.

      L495 to L499 - I find this fascinating. Great finding.

      We thank the reviewer for their positive feedback.

      (5) Methods

      L650 - "Watts"

      We have changed the sentence accordingly.

      L651 - beats per minute can also be represented as b.min-1 and cadence as revolutions.min-1.

      To allow for easier interpretation of the results in a broader readership we would like to propose to maintain the original abbreviations.

      L678 - Just to check what the authors mean by "on the second experimental day", they are actually referring to Visit 2 of 3 (first experimental visit of 2) as it is shown in Figure 1?

      We apologise for the lack of clarity. Indeed, the second experimental day refers to the third visit in the study. We have added this to the sentence to increase clarity.

      L708 - would change the end of the sentence to "and remained blinded throughout the study"

      We have changed the sentence accordingly.

      L742 - comma after "in one participant".

      We have added the missing comma.

      L746 - slight mistype... RPE in brackets instead of PRE

      We have changed the abbreviation to RPE.

      L747 - In case the authors are interested in affective measures in future studies... Hardy and Rejeski (1989) have a 9-point Likert scale rating affective valence which might be useful to check out.

      Thank you. The scale by Hardy and Rejeski (1989) is a very relevant measure of affective valence during exercise, and we will consider it in future studies.

      L755 - Four squares for the thermode to be applied were drawn on the arm but through the methods I can only seem to see that the thermode was applied to the second square during calibration. During the MRI scan, did someone move the thermode to different squares for different stimulations?

      We appreciate the reviewer's question. Indeed, the heat calibration and recalibration on the first and second day, respectively, were always completed on the same skin patch (patch 2) to allow for comparability of calibration across days. During the experimental sessions, the thermode head was repositioned in a randomised order across participants (e.g., skin patches 1-4-3-2) before each block. This was done manually before the MRI block commenced. The order of thermode head positions was kept constant within participants across experimental days (day 2 and day 3).

      L764 - ITI predefined?

      We thank the reviewer for their comment and would like to point to line 130 in the revised manuscript where the abbreviation for inter-trial-interval (ITI) was first introduced.

      (6) Other Sections + Supplementary Materials

      L891 - I apologise in advance for this comment as it is the most trivial comment you will ever receive, but there is an extra "." On this line after J.N. initials for methodology.

      We have changed the punctuation accordingly.

      Table S1 - Strictly speaking, some of the intensity denominations in this table are not exactly an "intensity".

      Iannetta et al. (2020) - https://doi.org/10.1249/mss.0000000000002147 provides a commentary on intensity domains as well as Burnley and Jones (2018) - https://doi.org/10.1080/17461391.2016.1249524

      Likewise in this table - the term "without fatigue" in the description column is not strictly true as participants will naturally fatigue but authors are referring more to a "steady state".

      We have changed the name of the column to ‘Description’ to describe the test phase as proposed by Allen and Coggan (2012) and previously implemented by McGrath et al. (2019), rather than the ‘intensity domains’ specified by Iannetta et al. (2020). Further, we have refined the wording in Table S1 and replaced the term ‘without fatigue’ with ‘steady state’.

      Once again, thank you to the authors for their great work on this project and to the editor for the chance to review this paper.

      We would like to thank this reviewer for their very insightful and important comments and for pointing out the strengths of the manuscript. We believe the suggestions will help to improve the quality of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in postexercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly), which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      We thank the reviewer for their insightful comments that contribute to improving the quality of the manuscript. In response to the identified weaknesses, we have made key revisions to enhance clarity and rigor. Regarding the lack of a resting control condition, we acknowledge that our study does not assess the overall effect of exercise versus no exercise. Our primary objective was to compare high- (HI) and low-intensity (LI) exercise on pain modulation, hypothesizing that lower intensities would have minimal effects. We revised the manuscript to eliminate misleading phrases about an "overall" effect, clearly emphasizing our aim to investigate the comparative effects of different exercise intensities. To address terminology inconsistencies, we have adopted "exercise-induced pain modulation," reflecting existing literature that recognizes both hypoalgesia and hyperalgesia associated with exercise (Vaegter and Jones, 2020). We clarified this terminology in the introduction and specified the pain modalities used in our study. We also improved methodological transparency by better describing the timing and order of exercise and drug treatment interventions. Concerning exercise intensity estimation, we acknowledge the complexities in classifying moderate, heavy, and severe domains. We added the study by Wong et al. (2023) to discuss the potential limitations of the FTP estimation protocol. Although direct measures of VO2max or blood lactate are absent in our study, our findings, including perceived exertion (RPE) scores and relative power data, support that participants were primarily in the heavy-intensity domain during HI exercise. To clarify RPE ratings, we adjusted the presentation to align with the Borg scale's intended anchor points, ensuring greater accuracy in reported exertion levels. Statistical analyses confirm significant differences in RPE between exercise intensities. 
      These revisions aim to clarify our intent and methodologies, ultimately strengthening the contribution of our research to understanding exercise-induced pain modulation.

      (1) Lines 27-33 - please present some data and accompanying statistical output in the results section of the abstract.

      We thank the reviewer for their comment. In the results section of the abstract, we report whether the findings are (not) significant using the general threshold of P < 0.05. However, we prefer not to include more detailed data and statistical outputs here, as these are thoroughly presented in the results section and do not contribute to the abstract’s primary purpose of providing a concise summary.

      (2) Line 29 - please indicate how fitness level was quantified.

      The functional threshold power (FTP) adjusted for weight served as an indication of cardiovascular fitness level. We have now included this in the abstract.

      (3) Line 35 - please include a sentence detailing the implications of your findings.

      We have now included a sentence on the implications of our findings in the abstract.

      (4) Introduction general - I appreciate that it was an exploratory analysis, however, the introduction does not particularly lay the groundwork for this (e.g., the influence of fitness level, sex, etc) - please include some background within the introduction to establish the role level of fitness/exercise/training/physical activity on pain modulation.

      A paragraph detailing the role of fitness level and sex in the context of exercise-induced pain modulation and endogenous opioid release was part of the introduction of our manuscript but has been removed as per the reviewing editor’s request (as the inclusion of sex and fitness level was not part of the preregistration). We have now re-included a shortened version of this paragraph to provide some background on these potentially crucial factors in exercise-induced pain modulation.

      (5) Lines 40-41 - reference needed.

      We thank the reviewer for detecting this and have now included references concerning the release of endogenous opioids and the term exercise-induced hypoalgesia.

      (6) Lines 48-49 - please provide the full terms for ACC and PAG (PAG has been provided on line 52, but should be presented earlier).

      We thank the reviewer for detecting this. We now introduce the abbreviations for the periaqueductal grey (PAG) and anterior cingulate cortex (ACC) in the correct lines.

      (7) Line 49 - the term exercise-induced pain is often used interchangeably (incorrectly) with many different types of pain experienced during/after exercise (e.g. muscle burn/ache, DOMS, injury etc.). Please see O'Malley et al 2024 (doi: 10.1113/EP091687).

      We thank the reviewer for their comment. Despite the distinction between different types of pain induced by exercise being important, this is less relevant for the current study. We would like to point out that the full term used is exercise-induced pain modulation, referring to the modulation of (experimental) pain through exercise. We have deliberately chosen this term as it summarises exercise-induced hypoalgesia as well as hyperalgesia. Therefore, we did not refer to pain induced by exercise and would disagree that this term has been used interchangeably with different types of pain in the current manuscript.

      (8) Line 57 - neither of these studies looked at exercise-induced pain, rather they examined experimentally induced pain (e.g. cold pressor test) or chronic pain and how exercise might exacerbate it. This leads back to the previous comment - it is important to define what is meant by exercise-induced pain (EIP) from the outset, and then remain consistent in the reference to this.

      We agree with the reviewer and have cited the studies accordingly. We would like to point out that the current study does not investigate exercise-induced pain but the modulation of experimental pain through exercise and have used the term exercise-induced pain modulation consistently in the manuscript to describe this.

      (9) Line 61 - Droste et al and Olausson et al are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Droste et al. (1991) and Olausson et al. (1986).

      (10) Line 61 - Do you mean exercise-induced hypoalgesia, or modulation of exercise-induced pain - it is not clear? EIH is introduced in Line 40 and is consistent with what the Koltyn study explored. Conversely, Koltyn induced pain using heat and pressure, rather than exercise.

      In this manuscript, we have opted for the term ‘exercise-induced pain modulation’ since previous research has shown that exercise can elicit hypoalgesia as well as hyperalgesia (for review see Vaegter and Jones (2020)). Thus, the term refers to the modulation of pain through exercise. We have now included a sentence detailing the use of the term ‘exercise-induced pain modulation’ in the first passage of the introduction. Corresponding to Koltyn et al. (2014), we have used heat and pressure stimuli to induce pain and investigate the modulating effect of different exercise intensities on these pain modalities.

      (11) Line 62 and 64 - Both the Janal study and Haier study are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Janal et al. (1984) and Haier et al. (1981).

      (12) Line 62 and 64 - define long/short distance/duration.

      We have revised the terminology from "short-duration" to "short-distance" to facilitate a more precise comparison of the exercise protocols employed in the studies by Janal et al. (1984) and Haier et al. (1981). Specifically, the long-distance run conducted by Janal et al. (1984) spanned 6.3 miles (10.3 km), while the short-distance run executed by Haier et al. (1981) covered 1 mile (1.6 km).

      (13) Line 62 - what type of pain?

      Janal et al. (1984) implemented thermal, ischemic, and cold pressor pain in their study and observed a hypoalgesic effect in response to thermal and ischemic pain that was reversed under NLX administration. We have now specified this in the text.

      (14) Line 67 - please place "i.e., the insula, ACC and prefrontal regions" in parentheses.

      Done.

      (15) Lines 67-69 - please provide clarity on the nature of the interventions being employed. For example, are you referring to interventions to reduce/overcome pain? Or are you referring to approaches to experimentally induce or increase pain during exercise? In either case, please be specific on the interventions employed, and why this variation in approach may make it challenging to draw a conclusion

      The interventions employed by several studies aimed to investigate the pharmacological underpinnings of the pain modulatory effect of exercise and were, thus, pharmacological interventions. The primary objective of these interventions is usually not to reduce/induce/decrease/increase pain but to block a specific receptor type in order to infer the role of that receptor type in pain modulation through exercise. In the context of exercise and pain specifically, the most frequently used pharmacological intervention consists of administering a µ-opioid receptor antagonist (naltrexone/naloxone (NLX)). Depending on which µ-opioid receptor antagonist is used, different administration protocols are employed (i.e., oral or intravenous administration, different doses, a bolus only without constant infusion). This variability in administration protocols can account for differing findings on the extent of opioidergic involvement in exercise-induced pain modulation. We have now refined the corresponding section to describe the interventions more precisely and clearly.

      (16) Line 69 - administration of what?

      This passage refers to the variability in the administration of µ-opioid receptor antagonists such as naloxone (NLX) or naltrexone. We have now specified this in the corresponding line.

      (17) Line 74 - EIH?

      As described above, we have chosen the term 'exercise-induced pain modulation' as an umbrella term for both exercise-induced hypoalgesia and hyperalgesia. However, the reviewer is correct that specifically studies investigating exercise-induced hypoalgesia have been criticised. Still, the proposed criticism also applies to studies detecting hyperalgesia and we would, thus, argue to retain the term ‘exercise-induced pain modulation’ here for the sake of consistency.

      (18) Line 75 - please define "single-arm pre-post measurements"

      We appreciate the reviewer's comment. Single-arm pre-post measurement studies involve participants being assigned to a single experimental condition, with pain assessments conducted only once prior to and once following the intervention. This study design has several limitations, particularly for examining exercise-induced modulation of pain. Most importantly, such designs do not account for habituation to noxious stimuli, as highlighted by Vaegter and Jones (2020). Consequently, with only one pre- and one post-intervention assessment, there is a risk of misinterpreting the outcome: a reduction in post-intervention pain ratings might erroneously be credited to the exercise intervention itself, rather than to habituation to the noxious stimuli. Randomised controlled trials with multiple measurement blocks not only mitigate these limitations but also provide a clearer understanding of how individual bouts of exercise influence pain perception.
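
      The habituation confound described above can be illustrated with a minimal, noise-free sketch. All numbers and names below (`HABITUATION`, `TRUE_EFFECT`, `post_rating`) are hypothetical and chosen only for illustration; they are not data or code from the study. The sketch assumes that exercise has no true effect and that ratings drop by a fixed amount on repeated stimulation.

```python
# Hypothetical values for illustration only (not study data).
HABITUATION = 5.0   # assumed drop in rating (VAS points) from repeated stimulation
TRUE_EFFECT = 0.0   # assumed true effect of exercise on pain: none


def post_rating(pre, true_effect, habituation=HABITUATION):
    """Post-intervention rating = baseline minus habituation minus any true effect."""
    return pre - habituation - true_effect


pre = 50.0
exercise_post = post_rating(pre, TRUE_EFFECT)  # arm that exercised
control_post = post_rating(pre, 0.0)           # arm that rested

# Single-arm pre-post estimate: habituation is mistaken for hypoalgesia.
single_arm_estimate = pre - exercise_post

# Controlled estimate: habituation is present in both arms and cancels out.
controlled_estimate = (pre - exercise_post) - (pre - control_post)

print(single_arm_estimate, controlled_estimate)
```

      Under these assumptions the single-arm design reports an apparent pain reduction although the true effect is zero, whereas the controlled comparison recovers zero.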

      (19) Line 84 - is (40) a reference?

      We apologise for this oversight and have now updated the reference by Borszcz et al. (2018) to be displayed correctly.

      (20) Line 86 - is that 10 min per block (i.e. 40 min exercise time), or 10 min in total? If the former please include "per block" at the end of the sentence (Line 87).

      The reviewer is correct in assuming that we employed 10 min of cycling per block, resulting in a total of 40 minutes of cycling. We have updated the sentence now including ‘per block’ as suggested by the reviewer.

      (21) Line 89 - when you refer to "painfulness" are you referring to the intensity of pain experienced? If so, I think "pain intensity" would be more appropriate.

      In the current study, participants were asked about the ‘painfulness’ of each stimulus based on previous studies (Horing et al., 2019; Horing & Büchel, 2022; Tinnermann et al., 2022). The term ‘painfulness’ is a composite measure of ‘pain intensity’ (sensory dimension) and ‘pain unpleasantness’ (affective dimension) (Talbot et al., 2019). Since unpleasantness is also a definitional criterion of pain (‘Terminology | International Association for the Study of Pain’, n.d.) and previous research shows a high correlation between ‘pain unpleasantness’ and ‘pain intensity’ (Granot et al., 2008; Talbot et al., 2019) we have opted for the term ‘painfulness’ as a more comprehensive measure. Inherently, these two measures are highly correlated.

      (22) Line 91-93 - the way this is written could be suggestive of this being separate to the cycling blocks. Please rephrase to confirm that this was administered prior to the commencement of the cycling blocks.

      We have refined the sentence to make it clearer that the drug treatment was administered before the cycling block commenced on each of the experimental days. We would like to further specify, that whilst the bolus dose of the treatment was administered prior to the experiment, a constant intravenous supply of SAL/NLX was maintained throughout the experiment using an infusion pump.

      (23) Methods general - why only 10 min of exercise? It is likely that there is a 'dose effect' of exercise on EIH, whereby the intensity of exercise and the duration of the exercise are important. Short-duration but high-intensity exercise can induce EIH, as can moderate duration low-intensity exercise. But, for this protocol, was the intensity high enough or long enough to meet the 'dose' needed?

      We thank the reviewer for their question. Our decision to employ 10-minute exercise blocks was rooted in both scientific evidence on exercise-induced hypoalgesia and the (clinical) applicability of the findings. Research has shown that exercise durations ranging from 8 minutes to 2 hours of aerobic exercise can induce hypoalgesia (for review see Koltyn (2002)). Specifically, several studies induce hypoalgesia at 10-15 minutes of aerobic exercise (Gomolka et al., 2019; Gurevich et al., 1994; Haier et al., 1981; Jones et al., 2019; Sternberg et al., 2001; Vaegter et al., 2015). Furthermore, many prior studies have employed exercise durations that are tailored to professional or amateur athletes which may not be practical for healthy individuals with lower fitness levels who may find it challenging to engage in longer sessions, such as an hour of running. When considering applying these findings to the clinical chronic pain population it is crucial to assess the manageability of proposed exercise protocols. We believe that 10 minutes of exercise, whilst being a relatively brief exercise duration, may still be sufficient to elicit exercise-induced hypoalgesia.

      (24) Methods general - what was the time gap between each round (i.e. after the fMRI, how long before the participant started the next cycling block?).

      After each fMRI run the participants were taken out of the MR scanner. The HR and SPO2 were measured and participants were given the chance to go to the restroom before positioning them on the bike and starting the next block. All in all, the time following the fMRI scan and before the new block commenced ranged between 5-10 minutes. We have now included this specification in the methods section.

      (25) Methods general - there is some evidence to show that the EIH effect is less consistently shown when heat is used to induce pain - was there a reason heat was used as the pain induction method here?

      We thank the reviewer for their comment. Indeed, previous meta-analyses by Naugle et al. (2012) report larger effect sizes for pressure pain (Cohen’s d = 0.69) closely followed by heat pain (d = 0.59). In light of this evidence, we included both pain modalities in the current study. Notably, we found no significant differences in pressure pain responses between LI and HI exercise. It is important to emphasise that the term "pressure pain" predominantly encompasses studies employing handheld pressure algometry, whereas our investigation utilised a pressure cuff. This methodological variation raises the possibility that our findings—and corresponding effect sizes—may not be directly comparable to prior pressure pain studies.

      (26) Methods general - please be consistent in the use of terminology. In some areas, you use the phrase "cycling block" whereas in other areas it is referred to as a "cycling run".

      We have revised the methods section to be more precise with the terms ‘run’ and ‘block’.

      (27) Line 571-573 - Please detail how participants were excluded based on scores from STAI and BDI-II.

      We apologise for the error in the text. One participant was excluded based on a BMI (body mass index) below 18. No participant had to be excluded based on the STAI or BDI-II score in the current study. We have corrected this in the manuscript.

      (28) Line 636-651 - the FTP20 test has been shown not to be a valid marker of the separation between the heavy and severe exercise intensity domains (see Wong et al 2023 - https://doi.org/10.1080/02640414.2023.2176045). Given that participants completed the high intensity cycle in 'zone 4' (91-106% of FTP), it is probable that participants could have completed this 10 min in either the heavy or the severe exercise intensity domains, with significant implications for the relative challenge this 10 min of exercise. Why was zone 4 used? What are the implications of this? Please discuss and include this as a limitation.

      We thank the reviewer for their comment as it touches upon the challenges of accurately estimating exercise intensities. It is indeed crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers.

      The study by Wong et al. (2023) is interesting; it assesses blood lactate and VO2 levels at FTP and FTP+15 Watts. Although highly relevant for the field, some of its findings should be interpreted with caution due to the small sample of 13 participants (11 male and only 2 female cyclists), which may limit generalisability. Additionally, the testing protocol used in that study to determine participants' FTP consisted of 5 minutes of self-paced pedalling at 100 Watts followed by a 20-minute maximal, self-paced time trial. This differs from the FTP20 test as implemented in the current study (see Supplemental Table S1) or in other studies (McGrath et al., 2019). The finding in Wong et al. (2023) that participants were only able to sustain cycling at FTP for an average of 33 minutes suggests that the deviating protocol overestimates FTP. Mackey and Horner (2021) propose that the validity of the FTP20 test might depend on the warm-up used before FTP20 testing and on the training status of athletes.

      However, we acknowledge that without direct measurements of VO2max or blood lactate levels, it is challenging to determine the precise intensity domain in which each participant was operating in the current study. Still, the RPE (low: M = 8.59, SD = 1.32; high: M = 14.92, SD = 1.98) suggests that participants operated in the heavy-intensity domain in the HI exercise condition. This is further supported by the relative power (%FTP) maintained in the HI (M = 105; SD = 0.05; Author response image 5, purple) and LI (M = 58; SD = 0.06; Author response image 5, green) exercise conditions (difference: t(37) = 44.58, P < 2.2e-16, d = 6.46), confirming the accuracy of the implemented FTP test as well as the maintained power throughout the cycling blocks. Thus, we would argue that participants in the current study predominantly exercised in the heavy domain during the HI exercise condition. We have included the relative power in Figure 3A, replacing the absolute power.

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have now included a discussion of the validity of the FTP20 test as a demarcation point concerning the intensity domains.

      Author response image 5.

      Raincloud plot of relative power (%FTP) during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks.
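
      For readers who want to see how a paired comparison of this kind is computed, the following is a minimal sketch using only the Python standard library. The data are hypothetical, and the response does not state which definition of Cohen's d was used, so two common variants are shown under assumed names (`cohens_d_z`, `cohens_d_rms`); neither should be read as the authors' actual computation.

```python
import math
import statistics


def paired_t(x, y):
    """Return (t, df) for a paired t-test on equal-length samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def cohens_d_z(x, y):
    """Mean of the paired differences divided by the SD of the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


def cohens_d_rms(x, y):
    """Mean difference divided by the root mean square of the two condition SDs."""
    sd_rms = math.sqrt((statistics.variance(x) + statistics.variance(y)) / 2)
    return (statistics.mean(x) - statistics.mean(y)) / sd_rms


# Hypothetical per-participant relative power (%FTP), not study data.
hi_power = [105.0, 100.0, 110.0, 98.0]
li_power = [58.0, 55.0, 62.0, 60.0]

t_stat, df = paired_t(hi_power, li_power)
print(t_stat, df, cohens_d_z(hi_power, li_power), cohens_d_rms(hi_power, li_power))
```

      Note that `cohens_d_z` always equals t divided by the square root of the number of pairs, while `cohens_d_rms` depends on the spread within each condition, so the two variants can give different values for the same data.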

      (29) Line 676 - please provide further information on each cycling run/block. Did each participant complete a total of 4 runs (i.e., a total of 40 minutes of exercise), with 2 runs completed at a high intensity and 2 runs completed at a low intensity in a randomised order (e.g., for one participant this could be 10 minutes at low, followed by 10 minutes at high, followed by 10 minutes at low, followed by 10 minutes at high)? Figure 1 details this nicely, however, it would be helpful to read in-text.

      The reviewer is correct in assuming that there were a total of 4 blocks on each experimental day. Participants completed cycling in 2 blocks at HI and in 2 blocks at LI in a pseudorandomised order. This order was kept constant across experimental days (i.e. completing the same block order on Day 2 and Day 3). We have detailed this further in the Methods section.
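
      One way such an allocation could be generated is sketched below. The response does not describe the constraints of the pseudorandomisation, so a plain seeded shuffle of two HI and two LI blocks is an assumption, and the function name `block_order` and its parameters are hypothetical.

```python
import random


def block_order(participant_id, seed=0):
    """Return a shuffled order of two HI and two LI blocks for one participant.

    Seeding the generator with the participant ID makes the order reproducible,
    so the same order can be reused on both experimental days.
    """
    rng = random.Random(f"{seed}-{participant_id}")
    order = ["HI", "HI", "LI", "LI"]
    rng.shuffle(order)
    return order


day2_order = block_order(participant_id=7)
day3_order = block_order(participant_id=7)  # identical to day 2 by construction
print(day2_order, day3_order)
```

      Deriving the order from the participant ID rather than storing it is only one design option; a stored allocation table would serve the same purpose.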

      (30) Discussion general - it is possible that EIH could be induced via different mechanisms and that these mechanisms are at least in part due to exercise intensity. For example, EIH from higher-intensity exercise might have some contribution from CPM.

      We thank the reviewer for their comment. Previous research aimed to disentangle the two seemingly similar mechanisms of exercise-induced hypoalgesia (EIH) and conditioned pain modulation (CPM) (Ellingson et al., 2014; Rice et al., 2019; Samuelly-Leichtag et al., 2018; Vaegter et al., 2014). CPM is typically induced by applying a tonic noxious stimulus that decreases pain sensitivity to another noxious stimulus applied simultaneously or shortly after at a distant body part (Graven-Nielsen & Arendt-Nielsen, 2010). Despite EIH and CPM showing distinct mechanisms, it cannot be completely ruled out that there are at least partially overlapping mechanisms driving the two phenomena (Rice et al., 2019). Due to our study design, where the time difference between cycling blocks and the applied pain was on average five minutes, it is unlikely that CPM is the driving pain modulatory mechanism in our study setup.

      (31) Line 101 - as this was preregistered, should the study design be followed and then reported?

      We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (32) Line 110 - please provide some data on the fitness levels and how this is classified as high/low.

      The FTP (relative to body weight) was used as an estimate of cardiovascular and endurance fitness (Valenzuela et al., 2018). We refrained from classifying the fitness levels dichotomously as low or high since this is a subjective measure in a sample of healthy individuals of diverse fitness levels. Instead, we utilised the FTP as a more nuanced metric for comparison.
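
      The two normalisations discussed in this response (FTP relative to body weight, and cycling power relative to FTP) amount to simple ratios, sketched below. The function names and example values are hypothetical; the 91-106% bounds for 'zone 4' are taken from the reviewer's comment (28).

```python
def relative_ftp(ftp_watts, body_mass_kg):
    """FTP normalised to body mass, in W/kg."""
    return ftp_watts / body_mass_kg


def percent_of_ftp(power_watts, ftp_watts):
    """Cycling power expressed as a percentage of FTP."""
    return 100.0 * power_watts / ftp_watts


def in_zone_4(power_watts, ftp_watts, lower=91.0, upper=106.0):
    """True if the power falls within the assumed zone 4 range (91-106% of FTP)."""
    return lower <= percent_of_ftp(power_watts, ftp_watts) <= upper


# Hypothetical participant: FTP of 200 W at a body mass of 80 kg.
print(relative_ftp(200.0, 80.0))    # W/kg
print(percent_of_ftp(210.0, 200.0)) # % of FTP during a high-intensity block
print(in_zone_4(116.0, 200.0))      # low-intensity block lies outside zone 4
```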

      (33) Lines 159-160 - in the context of the difference in intensity between the sessions. But, it is likely that the high-intensity exercise would have posed quite different relative challenge between participants.

      We thank the reviewer for their comment. As described above, we did not obtain direct measurements of VO2max or blood lactate levels, making it challenging to determine the precise intensity domain in which each participant was operating in the current study. However, all participants received the same instructions for the BORG rating scale, ensuring the comparability of RPE across participants to a certain extent.

      (34) Figure 3C - what instructions and familiarisation were given to participants regarding the 6-20 Borg scale? In Figure 3C it looks as though several participants rated the low exercise intensity at 6. This would/should be equivalent to sitting quietly, so it looks as though at least several participants did not understand how to use the RPE - please discuss.

      Indeed, three participants rated the LI exercise condition at 6 due to an error in the translation of the scale instruction. Participants were instructed that the lower anchor point of the scale (6) referred to ‘extremely light’ instead of ‘no exertion’. Thus, we have rescaled the RPE ratings where a rating of 6 now corresponds to a 7 (‘extremely light’) on the BORG scale and again calculated the paired t-test. There is still a significant difference in the RPE between exercise intensities (t(38) = 19.65, P < 2.2e-16, d = 3.69; Author response image 6). We have corrected this in the manuscript accordingly and updated Figure 3C.

      Author response image 6.

      Raincloud plot of rating of perceived exertion (RPE) on the BORG scale during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks. A rating of 6 reflects ‘no exertion’ and 20 reflects ‘maximal exertion’.
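
      The rescaling and paired comparison described in this response can be sketched as follows. This is a minimal illustration in Python, not the analysis code used for the study: the function names (`rescale_rpe`, `paired_t`) and the ratings are hypothetical, and only the rule itself (a rating of 6 is treated as 7) is taken from the response.

```python
import math
import statistics


def rescale_rpe(rating):
    """Map a rating of 6 to 7 ('extremely light'); leave other ratings unchanged."""
    return 7 if rating == 6 else rating


def paired_t(a, b):
    """Return (t, df) for a paired t-test on the differences b - a."""
    diffs = [y - x for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the paired differences
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical ratings for six participants, not the study data.
low_raw = [6, 8, 9, 6, 10, 7]
high = [15, 16, 17, 14, 18, 15]

low = [rescale_rpe(r) for r in low_raw]
t_stat, df = paired_t(low, high)
print(f"t({df}) = {t_stat:.2f}")
```

      Because the rescaling only touches ratings at the lower anchor, it can shrink the low-versus-high difference for the affected participants but cannot reverse its sign.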

      (35) Line 171 - is (37, 38) a reference?

      We apologise for this oversight and have now updated the references to be displayed correctly.

      (36) Line 176-18 - is this interaction sufficiently powered? Differences between sexes are not mentioned in the pre-registered study

      We have conducted an additional post-hoc power analysis for the interaction of drug, fitness level, and sex on differential heat pain ratings. We employed the power analysis for mixed models implemented in R (powerCurve) with 1000 simulations. This revealed that, for a target power of 0.8, a sample size of n = 27 would have been sufficient to detect this effect (Author response image 7). Although we did not preregister the factor ‘sex’, we believe that the observed results provide valuable insights that contribute to a deeper understanding of the data. We have marked these analyses as exploratory, emphasising the need for caution in their interpretation. However, we feel it is essential to report these findings to inform future studies, ensuring that such factors are adequately considered.

      Author response image 7.

      Post-hoc power analysis for behavioural effects from the linear mixed effects (LMER) model with the interaction of drug, fitness level, and sex, using the powerCurve function in R with a target power of 0.8 and 1000 simulations.
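
      The logic of a simulation-based power curve can be illustrated with a much simpler model than the mixed model used here. The sketch below is written in Python rather than R and is an assumption-laden stand-in: it simulates a one-sample design with a standardised effect size and a normal-approximation test, so `simulated_power`, its parameters, and the effect size of 0.5 are hypothetical and do not reproduce the reported analysis.

```python
import math
import random
import statistics


def simulated_power(n, effect_size, n_sims=1000, seed=0):
    """Share of simulated samples in which a two-sided test on the mean
    rejects the null at the 5% level.

    The statistic is mean / (sample SD / sqrt(n)), compared against a
    normal critical value; this is an approximation that is slightly
    liberal for small n.
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        z = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(z) > z_crit:
            hits += 1
    return hits / n_sims


# Power as a function of sample size for a hypothetical effect size of 0.5.
for n in (10, 20, 30, 40):
    print(n, simulated_power(n, effect_size=0.5))
```

      Reading off the smallest n at which the estimated power crosses 0.8 gives the kind of sample-size statement made in the response.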

      (37) Line 227 - this is not what this analysis shows. The comparison is low vs high-intensity exercise on pain modulation, not exercise vs. no exercise. You cannot conclude that aerobic exercise has no effect on pain modulation because you did not do that comparison (i.e. no baseline (without exercise) for pain).

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (38) Methods General - why was a control condition not used, or at least a baseline pain response, so that low/high-intensity exercise could be compared to a baseline? Given this, I'm not sure I agree with the study conclusions (abstract: 'These results indicate that aerobic exercise has no overall effect on pain in a mixed population sample') because you have compared high vs low-intensity exercise, not exercise vs. no exercise.

      As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise. The reviewer offers an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred for both exercise intensities, since the pain ratings provided were lower than the anticipated intensities as determined by the calibration. The fact that this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect.

      Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration.

      (39) Line 285 - or that better-trained individuals have a greater EIH response to higher intensity exercise, but both those of low and high fitness have established EIH after low intensity exercise. Given there isn't a 'no exercise' baseline, it is hard to make conclusions about EIH effect generally, only comparisons between high/low exercise intensity.

      We thank the reviewer for their comment. We agree that we cannot establish whether all participants showed a hypoalgesic response to the LI exercise with the current study design. However, our results show that participants with higher fitness levels showed increased hypoalgesia after HI exercise compared to those with lower fitness levels. We have refined the sentence accordingly.

      (40) Figure 7A - the regression line here is not that convincing.

      We acknowledge the reviewer’s concern regarding the regression line. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).
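
      As a quick consistency check on the statistics quoted in this response, the t value and an approximate 95% confidence interval can be recomputed from the reported estimate and standard error. The Python sketch below assumes a Wald-type interval and hard-codes 2.024 as the two-sided 95% critical value of Student's t with 38 degrees of freedom; whether the original interval was computed this way is an assumption.

```python
beta = 6.45     # reported estimate for fitness level
se = 2.56       # reported standard error
t_crit = 2.024  # two-sided 95% critical value of Student's t, 38 df (assumed)

t_value = beta / se
ci_low = beta - t_crit * se
ci_high = beta + t_crit * se

print(f"t = {t_value:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

      The ratio rounds to the reported t of 2.52, and the interval falls within a few hundredths of the reported bounds, which is the size of discrepancy one would expect when working from a rounded standard error.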

      (41) Line 354 - the NLX infusion was double-blind, but what are the implications of participants knowing that they completed high/low-intensity exercise - this cannot be blinded.

      The reviewer is correct that the exercise intensities cannot be blinded. To account for potential expectation effects of exercise on several psychological and physiological domains (including pain), participants completed a questionnaire on the calibration day in which they indicated the extent to which they expected acute exercise to affect several domains (Lindheimer et al., 2019). They could rate each domain on a Likert scale ranging from ‘large decrease’ (-3) to ‘large increase’ (3) with 0 denoting ‘no effect’. This format was chosen to allow measuring the direction and magnitude of expectation effects and to avoid being directive or suggestive (Lindheimer et al., 2019). Although the questionnaire included other psychological and physiological domains (i.e., stress, anxiety, energy, memory), we focused on the specific pain domains (muscle pain, joint pain, and whole-body pain) to establish participants’ expectations regarding the effect of acute exercise on pain. We tested whether the expectation ratings for each pain type were significantly different from 0 (no effect) using a one-sample t-test.

      There was no significant effect for muscle pain (t(38) = 1.78, P = 0.08, M = 0.39, SE = 0.12), joint pain (t(38) = -0.12, P = 0.90, M = -0.03, SE = 0.11), or whole-body pain (t(38) = -1.05, P = 0.30, M = -0.21, SE = 0.12), suggesting no expectation effect on these pain domains in the overall sample (Supplemental Figure S10A). Since there is variation in the data, we calculated the correlation of the expectation ratings in the different pain domains with the difference score between the pain ratings in the SAL condition (LI – HI rating; Supplemental Figure S10B). This analysis yielded no significant correlation in any of the pain domains (joint pain: r = 0.11, P = 0.49; muscle pain: r = -0.07, P = 0.68; whole-body pain: r = 0.07, P = 0.68).

      Moreover, given that we have not been able to show a difference between the exercise intensities on pain modulation, expectation effects are likely not to contribute to this null effect.
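
      The two steps of the expectation analysis described above (a one-sample t-test of the ratings against 0, followed by a correlation with the LI – HI difference score) can be sketched in Python as follows. The helper names and all values are hypothetical and serve only to show the computations; they are not the study data or the original analysis code.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t, df) for a one-sample t-test of the mean against mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expectation ratings (-3 to 3) and LI - HI pain differences.
expectation = [1, 0, -1, 2, 0, 1, -2, 0]
pain_diff = [3.0, -1.5, 0.5, 2.0, -0.5, 1.0, -2.0, 0.0]

t_stat, df = one_sample_t(expectation)
r = pearson_r(expectation, pain_diff)
print(f"t({df}) = {t_stat:.2f}, r = {r:.2f}")
```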

      (42) Line 356-358 - and this comparison (and primary hypothesis) is not blinded.

      While we agree with the reviewer that this comparison is not – and potentially cannot be – blinded, we would like to reiterate our results from the previous paragraph that indicate that such expectation effects of exercise on pain were not present in the sample and, thus, did not seem to have influenced the results. It is noteworthy that the double-blind design of our study specifically pertains to the pharmacological intervention employed.

      (43) Line 358-360 - this could be explained by both types of exercise inducing EIH via the same mechanism (which is disrupted by NLX).

      We thank the reviewer for their comment and would like to refer back to the reviewer's comment number 38 for a response to this.

      (44) Line 360-361 - this conclusion cannot be drawn, because you have only compared high vs low intensity exercise. So, the conclusion should be 'These results suggest that there is no difference between high and low aerobic exercise intensity on heat-induced pain'.

      We agree with the reviewer and have rephrased the sentence to reflect the claim accurately.

      (45) Line 396 - as previously discussed, this conclusion cannot be drawn through this study design.

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (46) Line 399 - please expand on this point - it is critical to the hypothesis and should also be included in the introduction. What intensities/duration/dose of aerobic exercise is generally established to cause EIH?

      We thank the reviewer and agree that this is a crucial aspect that requires further specification. Below we have expanded on the duration/intensities shown to elicit exercise-induced hypoalgesia and included a concise version of this detailed paragraph in the manuscript introduction.

      For aerobic exercise, different methods have been employed to determine exercise intensity levels i.e., through the VO2max, age-predicted HRmax, or incremental intensities (Koltyn, 2002). Most studies using VO2max as a measure of exercise intensity (Koltyn et al., 1996; Micalos & Arendt-Nielsen, 2016; Vaegter et al., 2014) were able to induce hypoalgesia with HI levels ranging between 65%-75% VO2max. When using the HRmax as a measure of determining exercise intensities, HI exercise at 70%-75% of the HRmax has been shown to produce greater hypoalgesia compared to moderate intensity at 50% HRmax (Naugle et al., 2014; Vaegter et al., 2014). Furthermore, previous research has suggested that HI exercise produces greater hypoalgesia compared to LI exercise (60-70% HRmax vs. light activity: M. D. Jones et al., 2019; 70% vs. 50% HRmax: Naugle et al., 2014; 75% vs. 50% VO2max: Vaegter et al., 2014).

      Furthermore, different durations can be regarded as suitable, with durations between 8 minutes and 2 hours of aerobic exercise having been shown to induce hypoalgesia (for review see Koltyn (2002)). Hoffman et al. (2004) showed a hypoalgesic response after 30 minutes but not after 10 minutes at 75% VO2max of cycling. In contrast, other studies were able to induce hypoalgesia at 10-15 minutes of HI aerobic exercise (75% VO2max: Gomolka et al., 2019; 63% VO2max: Gurevich et al., 1994; self-paced: Haier et al., 1981; 60-70% HRmax: Jones et al., 2019; 85% HRmax: Sternberg et al., 2001; 75% VO2max: Vaegter et al., 2015).

      (47) Line 400-401 - please define high intensity.

      We thank the reviewer for their comment. The referenced studies by Vaegter et al. (2014) and Jones et al. (2019) based the estimation of HI and LI exercise on an age-related target heart rate corresponding to VO2max and HRmax, respectively. In Vaegter et al. (2014), the HI condition corresponded to 75% VO2max, while the LI to 50% VO2max. In Jones et al. (2019), the HI exercise condition corresponded to 60% and 70% of HRmax, while the LI condition was defined as pedalling slowly against a light resistance of 0.5 kg of force to maintain a rating of perceived exertion (RPE) not above resting. We have included this clarification in the relevant section to elucidate the intensities of the chosen exercise conditions.

      (48) Line 403-405 - I'm not sure I follow (perhaps I have misunderstood) - pain induction was completed after exercise in the MRI scanner, so there was no distraction effect of exercise in either condition. A baseline could have been established in the same way and there would be exactly the same conditions, just without prior exercise.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation allows for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. Nevertheless, it is important to note that previous studies (Brooks et al., 2017; Sprenger et al., 2012) have shown that cognitive pain modulation is mediated by endogenous opioids. Therefore, tasks with different attentional loads potentially influence post-task pain ratings. Although we agree with the reviewer that the effect of distraction or attentional load would be minimal in the MR scanner, there could still be an effect of different cognitive loads from exercise vs. no exercise. We therefore focus the discussion on investigating the dose-response relationship between different exercise intensities, where an ‘active’ control condition might contribute to a more nuanced understanding of exercise-induced pain modulation.

      (49) Line 403-411 - this is fine (although I do not agree that this was the best methodological decision), however, it does limit the conclusions that can be drawn (as previously mentioned). That is, you cannot conclude that no EIH occurred, only that there was no difference between low and high-intensity exercise in post-exercise pain response.

      We agree with the reviewer that the comparison of HI vs. LI exercise does not allow for an interpretation of the overall effect of exercise as opposed to no exercise on pain modulation. The comparison of HI and LI exercise allows the investigation of a dose-response relationship of these distinct exercise intensities. While LI exercise might not be a 'pure' control condition in the traditional sense, it is valuable for exploring the complexities of exercise and pain interaction.

      (50) Line 419-422 - sorry I do not follow - you say that moderate intensity exercise most reliably induces EIH but then select exercise intensities that are likely to be in the heavy or severe intensity domain? Please also include in this discussion the limitations of FTP20 as a threshold marker (see Wong et al) and the implications on the results/conclusions.

      We thank the reviewer for their comment. In the referenced sentence, we have defined the HI exercise as described in the reviews. Specifically, Wewege and Jones (2020) reported hypoalgesia to be greater after higher-intensity exercise, although the intensity was not further specified. Naugle et al. (2012) noted that HI exercise (i.e., 75% of VO2max) produced greater hypoalgesia, while Koltyn (2002) indicated that hypoalgesia occurs at intensities ranging from 60% to 75% of VO2max but more reliably at 75% VO2max or higher. Consequently, we have removed the term ‘moderate’, as it does not accurately reflect what has been reported in the reviews and could be misleading. Moreover, we have clarified the specific criteria for what is considered high (or higher) intensity exercise in the referenced reviews.

      We kindly ask the reviewers to refer back to the previous comment (reviewer comment number 28) regarding the discussion of the intensity domains and the FTP20 test as demarcation point for these intensity domains.

      (51) Line 422-425 - indeed, pacing is an important element of this test, which inexperienced cyclists have difficulty with when they are not provided with proper familiarisation.

      We agree with the reviewer that the FTP20 test has mainly been validated and employed in experienced cyclists and requires further validation in non-athletes of both sexes. However, since we used an extensive warm-up period and several paced steps (intervals, 5-minute time-trial) as well as recovery periods (Supplemental Table S1) based on McGrath et al. (2019), we propose that participants were thoroughly familiarised with the elements of pacing before the 20-minute FTP estimation took place. On average, participants showed a variation of M = 21.80 Watts (SE = 1.44 Watts) during the 20-minute paced FTP20 test (Supplemental Figure S11A). Interestingly, our data suggest that participants with a higher FTP showed higher variation of power output (Watts) during the 20-minute FTP test compared to individuals with lower fitness levels (Supplemental Figure S11B).

      (52) Line 425-427 - please remove this, the RPE difference between exercise bouts is not evidence that participants cycled at FTP.

      We thank the reviewer for their comment. However, we would propose to include the rating of perceived exertion (RPE) since it shows that the exercise intensities have been perceived as significantly different by the participants. This behavioural measure of exertion is potentially important for a broader audience to understand the exercise implementation beyond physiological markers.

      (53) Line 432 - high vs. low-intensity aerobic exercise

      We have changed the sentence accordingly to support the claim of the study that there was no difference in exercise-induced pain modulation between HI and LI aerobic exercise.

      (54) Line 447-449 - this seems contradictory to the first line of this paragraph (430-432) - i.e. that the heterogenous sample may have caused the null finding. Why deliberately select a participant sample that is likely to lead to a null effect?

      In the current study, we aimed to include participants of diverse fitness levels and both sexes to verify the findings on exercise-induced pain modulation in a broader population. We consider this important concerning translational aspects of EIH. Indeed, our heterogeneous sample may have ‘caused’ the observed null effect, but at the same time, it suggests that more homogenous (sometimes composed solely of male athletes) samples employed in many earlier studies might have skewed the understanding of exercise-induced pain modulation and thus unintentionally suggested a (non-existing) generalisation of this effect to the general population.

      (55) Line 532-456 - although Koltyn found electrical pain to have the greatest effect?

      The review by Naugle et al. (2012) reported effect sizes for heat (Cohen's d = 0.59) and pressure pain intensity (d = 0.69) following aerobic exercise but did not provide effect sizes for electrical pain intensity. They noted that the effect size for electrical pain intensity after isometric exercise was d = 0.40, which is lower than that for heat and pressure pain. While Koltyn (2002) stated that electrical and pressure stimuli induce exercise-induced hypoalgesia more consistently than thermal pain, the review did not clarify whether this applies to pain threshold, intensity, or tolerance, nor did it provide effect sizes. Given that electrical, pressure, and heat pain are the most commonly used methods to induce quantifiable pain in the context of exercise studies (Vaegter and Jones, 2020), we based our decision to use heat and pressure pain primarily on Naugle et al.'s findings.

      (56) Line 468-469 - why leave out content that was pre-registered (i.e. difference between pressure and heat pain) but includes analysis that wasn't (i.e. sex differences)? If a study is going to be pre-registered, then isn't it important to follow that design?

      We thank the reviewer for this comment. We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (57) Line 532-525 - and how could this have been accounted for?

      We apologise for any confusion, as we are unsure about the specific reference the reviewer is making based on the provided line numbers. We believe the question relates to how the potential effects of endocannabinoids were considered in the current study design, and we've addressed that in our response. In human studies, it is not possible to centrally block endocannabinoids, which makes it difficult to directly estimate their role in exercise-induced pain modulation in humans. Measuring endocannabinoids in the blood might not adequately capture changes in endocannabinoid levels in the brain throughout the different exercise intensity conditions. Despite these limitations, exploring the role of endocannabinoids in exercise-induced pain modulation presents a promising avenue for future research that could enhance our understanding of pain mechanisms and improve pain management strategies.

      (58) Limitations General - please include the other limitations discussed in this review.

      Done.

      (59) Line 530 - please amend this conclusion, in line with previous comments.

      Done.

      We would like to thank the reviewer for critically evaluating the manuscript and providing insightful comments. We appreciate the reviewer recognising the strengths of our work and believe that their suggestions will contribute to improving the quality of the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by Ghafari et al. addresses a question that is highly relevant for the field of attention as it connects structural differences in subcortical regions with oscillatory modulations during attention allocation. Using a combination of magnetoencephalography (MEG) and magnetic resonance imaging (MRI) data in human subjects, inter-individual differences in the lateralization of alpha oscillations are explained by asymmetry of subcortical brain regions. The results are important, and the strength of the evidence is convincing. Yet, clarifying the rationale, reporting the data in full, a more comprehensive analysis, and a more detailed discussion of the implications will strengthen the manuscript further.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors re-analysed the data of a previous study in order to investigate the relation between asymmetries of subcortical brain structures and the hemispheric lateralization of alpha oscillations during visual spatial attention. The visual spatial attention task crossed the factors of target load and distractor salience, which made it possible to also test the specificity of the relation of subcortical asymmetries to lateralized alpha oscillations for specific attentional load conditions. Asymmetry of globus pallidus, caudate nucleus, and thalamus explained inter-individual differences in attentional alpha modulation in the left versus right hemisphere. Multivariate regression analysis revealed that the explanatory potential of these regions' asymmetries varies as a function of target load and distractor salience.

      Strengths:

      The analysis pipeline is straightforward and follows in large parts what the authors have previously used in Mazzetti et al (2019). The authors use an interesting study design, which allows for testing of effects specific to different dimensions of attentional load (target load/distractor salience). The results are largely convincing and in part replicate what has previously been shown. The article is well-written and easy to follow.

      We thank the reviewer for their interest in our study.

      Weaknesses:

      While the article is interesting to read for researchers studying alpha oscillations in spatial attention, I am somewhat sceptical about whether this article is of high interest to a broader readership. Although I read the article with interest, the conceptual advance made here can be considered mostly incremental. As the authors describe, the present study's main advance is that it does not include reward associations (as in previous work) and includes different levels of attentional load. While these design features and the obtained results indeed improve our general understanding of how asymmetries of subcortical structures relate to lateralized alpha oscillations, the conceptual advance is somewhat limited.

      We thank the reviewer for their constructive comment. We would like to highlight that this is the first study to show a relationship between the asymmetry of subcortical structures and attention-modulated alpha oscillations in a task that did not involve any reward associations, reward processing being the most studied role of the basal ganglia. We also believe there is value in having a second study linking the volumetric asymmetry of subcortical structures to the modulation of alpha oscillations, as this surprising finding also has important clinical implications (see below). We edited the manuscript as below to explain the advances made in this study:

      Introduction (Line 112): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 301): “It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders that are mainly known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations is relatively increased when they are dopamine-depleted compared with when they are on dopaminergic medication (46).”

      While the analysis of the relation of individual subcortical structures to alpha lateralization in different attentional load conditions is interesting, I am not convinced that the present analysis is suited to draw strong conclusions about the subcortical regions' specificity. For example, the Thalamus (Fig. 5) shows a significant negative beta estimate only in one condition (low-load target, non-salient distractor) but not in the other conditions. However, the actual specificity of the relation of thalamus asymmetry to lateralized alpha oscillations would require that the beta estimate for this one condition is significantly higher than the beta estimates for the other three conditions, which has not been tested as far as I understand.

      We thank the reviewer for this constructive comment. We agree with the reviewer that we should compare the beta values across the conditions. We therefore decided to better harness the multivariate nature of our analysis. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all four task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for the single condition in which it was observed. We tested this global null hypothesis using a MANOVA test and found the following, which we have added to the manuscript:

      Results (Line 250): “To ascertain whether each predictor contributes to all conditions, we conducted statistical tests on the results of our MMR using the null hypothesis that a given regressor does not impact all dependent variables. We found that while, with marginal significance, caudate nucleus can predict variability across all four of the task conditions (F(26,4) = 2.82, p-value = 0.046), the predictive relationships of thalamus (F(26,4) = 2.43, p-value = 0.073) with condition 1, and globus pallidus (F(26,4) = 2.29, p-value = 0.087) with conditions 2 and 3 hold only for these conditions. In sum, this demonstrates that when the task is easiest (condition 1), the thalamus is related to alpha modulation. When the task is most difficult (condition 4), the caudate nucleus relates to the alpha modulation; however, its contributions are substantial enough to predict outcomes across all conditions. For the conditions with medium difficulty (conditions 2 and 3) the globus pallidus is related to the alpha band modulation.”

      Method (Line 599): “To examine the specificity of each regressor for lateralized alpha in each condition, we statistically assessed the results of the MMR against the null hypothesis that a particular predictor does not contribute to all dependent variables, employing a MANOVA test in RStudio (version 2022.02.2) (80).”

      Discussion (Line 337): “Thalamus, Globus Pallidus, and Caudate nucleus play varying roles across different load conditions.”

      Discussion (Line 361): “Although these findings highlight the varying contributions of different regions, they do not imply a lack of evidence for correlations between these subcortical structures and other load conditions.”

      Discussion (Line 379): “Additionally, we refrained from directly comparing the contributions of subcortical structures to different conditions due to low statistical power. […] In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. To ensure sufficient statistical power for doing so, each factor may need to be addressed in a separate experiment.”
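
      The multivariate test referred to in the passages above asks whether a predictor's coefficients are zero for every dependent variable at once. The authors ran this test as a MANOVA in R; the Python sketch below is only a simplified illustration of the underlying statistic for one predictor and two dependent variables, using Wilks' lambda and its exact F conversion for a single-degree-of-freedom hypothesis. The function name and the data are hypothetical.

```python
import statistics


def wilks_lambda_one_predictor(x, y1, y2):
    """Wilks' lambda, F statistic, and (df1, df2) for dropping a single
    predictor x from a regression with intercept on two dependent variables."""
    n = len(x)
    mx, m1, m2 = statistics.mean(x), statistics.mean(y1), statistics.mean(y2)
    dx = [v - mx for v in x]
    d1 = [v - m1 for v in y1]
    d2 = [v - m2 for v in y2]

    # Sums of squares and cross-products around the means.
    sxx = sum(a * a for a in dx)
    s11 = sum(a * a for a in d1)
    s22 = sum(a * a for a in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    sx1 = sum(a * b for a, b in zip(dx, d1))
    sx2 = sum(a * b for a, b in zip(dx, d2))

    # Residual SSCP matrix of the full model (intercept + x).
    e11 = s11 - sx1 * sx1 / sxx
    e22 = s22 - sx2 * sx2 / sxx
    e12 = s12 - sx1 * sx2 / sxx

    det_full = e11 * e22 - e12 * e12     # error SSCP with x in the model
    det_reduced = s11 * s22 - s12 * s12  # error SSCP with intercept only
    lam = det_full / det_reduced

    q = 2                 # number of dependent variables
    df_error = n - 2      # n minus intercept and slope
    df2 = df_error - q + 1
    f_stat = (1 - lam) / lam * df2 / q   # exact for a 1-df hypothesis
    return lam, f_stat, (q, df2)


# Hypothetical asymmetry index (x) and alpha modulation in two conditions.
x = [-1, 0, 1, -1, 0, 1]
y1 = [-1.1, 0.2, 0.9, -0.8, -0.1, 1.2]
y2 = [1, 1, 1, -1, -1, -1]
lam, f_stat, dfs = wilks_lambda_one_predictor(x, y1, y2)
print(f"Wilks' lambda = {lam:.3f}, F{dfs} = {f_stat:.2f}")
```

      A lambda close to 1 means the predictor explains little in any dependent variable; a lambda close to 0 means it explains much of the joint variability. A significant global test does not by itself say which condition drives the effect, which is why the per-condition coefficients are reported alongside it.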

      Reviewer #3 (Public Review):

      Summary:

      In this study, Ghafari et al. explored the correlation between hemispheric asymmetry in the volume of various subcortical regions and lateralization of posterior alpha-band oscillations in a spatial attention task with varying cognitive demands. To this end, they combined structural MRI and task MEG to investigate the relationship between hemispheric differences in the volume of basal ganglia, thalamus, hippocampus, and amygdala and hemisphere-specific modulation of alpha-band power. The authors report that differences in the thalamus, caudate nucleus, and globus pallidus volume are linked to the attention-related changes in alpha band oscillations with differential correlations for different regions in different conditions of the design (depending on the salience of the distractor and/or the target).

      Strengths:

      The manuscript contributes to filling an important gap in current research on attention allocation which commonly focuses exclusively on cortical structures. Because it is not possible to reliably measure subcortical activity with non-invasive electrophysiological methods, they correlate volumetric measurements of the relevant subcortical regions with cortical measurements of alpha band power. Specifically, they build on their own previous finding showing a correlation between hemispheric asymmetry of basal ganglia volumes and alpha lateralization by assessing a task without an explicit reward component. Furthermore, the authors use differences in saliency and perceptual load to disentangle the individual contributions of the subcortical regions.

      We appreciate the reviewer’s interest in our study.

      Weaknesses:

      The theoretical bases of several aspects of the design and analyses remain unclear. Specifically, we missed statements in the introduction about why it is reasonable, from a theoretical perspective, to expect:

      (i) a link between volumetric measurements and task activity;

      We thank the reviewer for this constructive feedback. We have now addressed this concern in the revised manuscript.

      Discussion (Line 293): “It has been demonstrated that extensive navigation experience enlarges the size of the right hippocampus (40). Furthermore, in terms of neurological disorders, it is well established that shrinkage (atrophy) in specific regions is a predictor of a number of neurological and psychiatric conditions including Parkinson’s disease, dementia, and Huntington’s disease. […] It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases when they are dopamine-depleted compared with when they are on dopaminergic medication (46).”

      (ii) a specific link with hemispheric asymmetry in subcortical structures (While focusing on hemispheric lateralization might circumvent the problem of differences in head size, it would be better to justify this focus theoretically, which requires for example a short review of evidence showing ipsilateral vs contralateral connections between the relevant subcortical and cortical structures);

      We thank the reviewer for this helpful comment, which led us to clarify the manuscript. We addressed this issue in the revised manuscript, and we have also added papers that directly investigate the asymmetry of subcortical regions in relation to neurological disorders:

      Introduction (Line 102): “We utilized the hemispheric laterality of subcortical structures and alpha modulation to overcome issues related to individual variations in oscillatory power and head size.”

      Discussion (Line 314): “Employing hemispheric lateralization was motivated by the organizational characteristic of structural asymmetry in healthy brain (47). Additionally, considering the effects of aging (48) and neurodegenerative disorders, such as Alzheimer's Disease (49), on brain symmetry influenced this approach. Furthermore, computing lateralization indices for individuals addresses the challenge of accommodating variations in both head size and the power of oscillatory activity.”

      Discussion (Line 374): “Furthermore, in this study, our emphasis has been on assessing the size of subcortical structures. Future investigations could explore subcortical white matter connectivities and hemispheric asymmetries. This approach has previously been conducted on superior longitudinal fasciculus (SLF) (61,62) and holds potential for examining cortico-subcortical connectivity in the context of oscillatory asymmetries.”
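      The claim in the quoted passages that a lateralization index sidesteps differences in head size can be illustrated with a short sketch. The formula below, a right-minus-left difference divided by the sum, is an assumption chosen to match the sign convention described in this response (positive values mean larger right volume); the function name and the example volumes are hypothetical.

```python
def lateralization_index(right: float, left: float) -> float:
    """Normalized right-minus-left difference, bounded in [-1, 1].

    Assumed form: (right - left) / (right + left). Positive values mean
    the right-hemisphere measure is larger, negative values the left.
    """
    total = right + left
    if total <= 0:
        raise ValueError("measures must be positive")
    return (right - left) / total


# Hypothetical volumes (mm^3) for two people whose structures differ only
# by an overall scale factor, as a stand-in for a difference in head size.
small_head = lateralization_index(right=7200.0, left=7000.0)
large_head = lateralization_index(right=7200.0 * 1.2, left=7000.0 * 1.2)

# The common scale factor cancels, so both indices are the same.
print(small_head, large_head)
```

      Because numerator and denominator scale together, any multiplicative difference between individuals drops out, which is the property the response relies on for both volume and oscillatory power.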

      (iii) effects not only in basal ganglia and thalamus, but also hippocampus and amygdala (a justification of selection of all ROIs);

      We thank the reviewer for this comment. We assessed the hippocampus and amygdala because they are automatically segmented in the FIRST algorithm. As our analysis showed they did not show a relation to the modulation of alpha oscillations, these regions also provide a useful control for our approach. Therefore, we included all subcortical structures in the model and evaluated their predictive impact. This is now addressed in the revised manuscript.

      Method (Line 477): “FIRST is an automated model-based tool that runs a two-stage affine transformation to MNI152 space, to achieve a robust pre-alignment of thalamus, caudate nucleus, putamen, globus pallidus, hippocampus, amygdala, and nucleus accumbens based on individual’s T1-weighted MR images.”

      Method (Line 576): “The absence of a relationship between modulations of alpha oscillations and the hippocampus and amygdala was expected as these regions typically are not associated with the allocation of spatial attention and thus adds validity to our approach.”

      (iv) effects that depend on distractor versus target salience (a rationale for the specific two-factor design is missing);

      We thank the reviewer for this comment, which helped us clarify the manuscript. The two-factor design investigates how the allocation of attentional resources relates specifically to mechanisms of excitability and suppression. For this reason, both the salience of the distractor (associated with suppression) and the perceptual load of the target (associated with excitability) had to be manipulated. We clarified the rationale in the revised version as below:

      Introduction (Line 96): “We analyzed MEG and structural data from a previous study (27), in which spatial cues guided participants to covertly attend to one stimulus (target) and ignore the other (distractor). To investigate the relationship between the allocation of attentional resources and mechanisms of neural excitability and suppression, the target load and the visual saliency of the distractor were manipulated using a noise mask. This load/salience manipulation resulted in four conditions that affect the attentional demands of target and distractor.”

      (v) effects in the absence of reward (why it is important to show that the effect seen previously in a task with reward is seen also in a task without reward);

      We thank the reviewer for this request for clarification. We addressed this question in the introduction and discussion as below:

      Introduction (Line 107): “By examining their role in a task without explicit reward, we aim to elucidate the generalizability of the contributions of subcortical structures to spatial attention modulation. Such a finding would implicate a role for the basal ganglia in cognition beyond the well-studied realm of the estimation of choice values (33). Specifically, in a prior study (28), we observed that the contributions of the basal ganglia were most pronounced when the items in question were associated with a reward. Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 333): “This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations beyond reward valence and to the context of attention.”

      (vi) effects on rapid frequency tagging.

      We thank the reviewer for this constructive comment. We have now included this analysis and added the results to the revised manuscript.

      Results (Line 224): “It is worth noting that neither the behavioural nor the rapid invisible frequency tagging (RIFT) measures showed significant relationships with LVs and HLM(α) (Supplementary material, Figure 1 and Table 3).”

      Discussion (Line 396): “We did not find any association between the power of RIFT signal and the size asymmetry of subcortical structures. Since the Bayes factors were less than 0.1, we conclude that our RIFT null findings are robust, suggesting a dissociation between how alpha oscillations and neuronal excitability indexed by RIFT relate to subcortical structures.”

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).”
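      The RIFT steps quoted above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the normalized-difference form of the modulation index, the reading of "highest MI" as largest absolute MI, and all function names are assumptions made for the example.

```python
import numpy as np


def modulation_index(power_att_right, power_att_left):
    """Per-sensor MI: attend-right versus attend-left power.

    Assumed form: (right - left) / (right + left).
    """
    r = np.asarray(power_att_right, dtype=float)
    l = np.asarray(power_att_left, dtype=float)
    return (r - l) / (r + l)


def hlm(mi_right_sensors, mi_left_sensors, n_roi=5):
    """Sum of the mean MI over the n_roi strongest sensors on each side.

    Assumption: "highest MI" is read as largest absolute MI, so that the
    ROI captures the strongest modulation regardless of its sign.
    """
    def roi_mean(mi):
        mi = np.asarray(mi, dtype=float)
        top = np.argsort(np.abs(mi))[-n_roi:]  # indices of strongest sensors
        return float(mi[top].mean())

    return roi_mean(mi_right_sensors) + roi_mean(mi_left_sensors)


# Toy participant with ten sensors per side, five of which are modulated.
mi_right = [-0.1] * 5 + [0.0] * 5
mi_left = [0.2] * 5 + [0.0] * 5
hlm_value = hlm(mi_right, mi_left)
print(hlm_value)
```

      Each participant contributes one such value, which can then be related to the lateralized volumes in the same way as HLM(α).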

      Supplementary Materials (Line 839): “Figure 1. Lateralization volume of thalamus, caudate nucleus and globus pallidus in relation to hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) on the right and behavioural asymmetry on the left. A and E, The beta coefficients for the best model (having three regressors) associated with a generalized linear model (GLM) where lateralization volume (LV) values were defined as explanatory variables for HLM(RIFT) (A) and behavioural asymmetry (E). Error bars indicate standard errors of mean (SEM). B and F, Partial regression plot showing the association between LVTh and HLM(RIFT) (B, p-value = 0.59) and behavioural asymmetry (F, p-value = 0.38) while controlling for LVGP and LVCN. C and G, Partial regression plot showing the association between LVGP and HLM(RIFT) (C, p-value = 0.16) and behavioural asymmetry (G, p-value = 0.80) while controlling for LVTh and LVCN. D and H, Partial regression plot showing the association between LVCN and HLM(RIFT) (D, p-value = 0.53) and behavioural asymmetry (H, p-value = 0.74) while controlling for LVTh and LVGP. Negative (or positive) LV indices denote greater left (or right) volume for a given substructure; similarly negative HLM(RIFT) values indicate stronger modulation of RIFT power in the left compared with the right hemisphere, and vice versa; positive behavioural asymmetry value shows higher accuracy when the target was on the right as compared with left, and vice versa for negative behavioural asymmetry values. The dotted curves in B, C, D, F, G, and H indicate 95% confidence bounds for the regression line fitted on the plot in red.”

      Author response image 1.

      Second, the results are not fully reported. The model space and the results from the model comparison are omitted. Behavioral data and rapid frequency tagging results are not shown. Without having access to the data or the results of the analyses, the reader cannot evaluate whether the null effect corresponds to the absence of evidence or (as claimed in the discussion) evidence of absence.

      We thank the reviewer for this constructive suggestion. In the revised manuscript, we incorporated the model space, model comparisons, BIC values from the models, behavioral and rapid frequency tagging analysis methods, and their respective results. Additionally, we computed Bayes factors for our null findings to enhance the interpretability of our results.

      Results (Line 199): “This model predicted the HLM(α) values significantly in the GLM (F(3,29) = 7.4824, p = 0.0007, adjusted R² = 0.376) as compared with an intercept-only null model (Figure 4A).”

      Although the beta estimate of LVGP only showed a positive trend, removing it from the regression resulted in worse models (AIC and BIC tables in supplementary material).
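      The model comparison described here, in which every combination of lateralized-volume regressors is fitted and ranked by AIC and BIC, could look roughly like the sketch below. It assumes an ordinary least-squares fit with Gaussian errors and uses simulated data; the structure labels, effect sizes, and helper names are hypothetical and do not reproduce the authors' values.

```python
import itertools

import numpy as np


def ols_aic_bic(X, y):
    """AIC and BIC of an OLS fit with intercept, under Gaussian errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1] + 1  # regression coefficients plus error variance
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik


def rank_models(X, y, names):
    """Fit every non-empty subset of regressors; sort by AIC (lowest first)."""
    rows = []
    for size in range(1, len(names) + 1):
        for combo in itertools.combinations(range(len(names)), size):
            aic, bic = ols_aic_bic(X[:, list(combo)], y)
            rows.append((tuple(names[i] for i in combo), aic, bic))
    return sorted(rows, key=lambda row: row[1])


# Simulated data: 33 participants, seven structures, three true predictors.
rng = np.random.default_rng(0)
names = ["Th", "CN", "Pu", "GP", "Hp", "Am", "NAc"]
X = rng.normal(size=(33, len(names)))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 2.0 * X[:, 3] + 0.1 * rng.normal(size=33)

ranking = rank_models(X, y, names)
best_names, best_aic, best_bic = ranking[0]
print(len(ranking), best_names)
```

      With seven candidate regressors the model space contains 127 non-empty subsets, so reporting the full AIC and BIC table, as the reviewers requested, is feasible.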

      Supplementary materials (Line 827): “Table 1. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for all possible combinations of regressors (Lateralized Volume of subcortical structures). The selected model, with lowest AIC, is marked in green.”

      Author response table 1.

      Author response table 2.

      Author response table 3.

      Bayes factors for the correlation between the hemispheric laterality of subcortical structures and both the hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) and behavioural asymmetry (BA). The Pearson correlation between each subcortical structure and both HLM(RIFT) and behavioural asymmetry was calculated. The likelihood of the data under the alternative hypothesis (presence of a correlation) was then compared to the likelihood under the null hypothesis (absence of a correlation). As demonstrated in the table, all Bayes factors were below or very close to 1, indicating evidence for the null hypothesis.
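      The response does not state how the Bayes factors were computed. As one way to make the idea concrete, the sketch below uses the BIC approximation to the Bayes factor for a Pearson correlation, which compares a regression with one predictor against an intercept-only model. This is a swapped-in approximation for illustration, not necessarily the authors' method, and the function name is hypothetical.

```python
import math


def bf10_bic_approx(r: float, n: int) -> float:
    """Approximate BF10 for a Pearson correlation r from n observations.

    Uses BF10 ~ exp(-(BIC_1 - BIC_0) / 2), where model 1 is y ~ 1 + x and
    model 0 is y ~ 1, so that BIC_1 - BIC_0 = n * ln(1 - r^2) + ln(n).
    Values below 1 favour the null; values above 1 favour a correlation.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    delta_bic = n * math.log(1.0 - r * r) + math.log(n)
    return math.exp(-0.5 * delta_bic)


# With 33 participants, a weak correlation yields a Bayes factor below 1,
# while a strong one yields a Bayes factor well above 1.
weak = bf10_bic_approx(r=0.1, n=33)
strong = bf10_bic_approx(r=0.6, n=33)
print(weak, strong)
```

      Under this approximation the smallest attainable BF10 for a given sample size is 1/sqrt(n), reached at r = 0, which is worth keeping in mind when reading how strongly a small sample can support the null.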

      For the results of the frequency tagging signal, we have now included this analysis and added the results to the revised manuscript. We refer the reviewer to our response to weakness (vi) from reviewer #3.

      Third, it remains unclear whether the MMS is the best approach to analyzing effects as a function of target and distractor salience. To address the question of whether the effects of subcortical volumes on alpha lateralization vary with task demands (which we assume is the primary research question of interest, given the factorial design), we would like to evaluate some sort of omnibus interaction effect, e.g., by having target and distractor saliency interact with the subcortical volume factors to predict alpha lateralization. Without such analyses, the results are very hard to interpret. What are the implications of finding the differential effects of the different volumes for the different task conditions without directly assessing the effect of the task manipulation? Moreover, the report would benefit from a further breakdown of the effects into simple effects on unattended and attended alpha, to evaluate whether effects as a function of distractor (vs target) salience are indeed accompanied by effects on unattended (vs attended) alpha.

      The reviewer is correct that we did not directly compare between task conditions when we assessed the predictive relationship between basal ganglia lateralization and alpha lateralization. We opted for the multivariate regression approach as this allowed us to simultaneously model the predictive relationship between our continuous predictors and HLM alpha in each condition, allowing us to be most efficient with our level of statistical power (N=33). Indeed, directly comparing between task conditions within one model would result in an extra 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). This approach would be underpowered given our sample size, and the ensuing results are likely to be unreliable.
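      The regressor count given in the paragraph above can be reproduced with a few lines of arithmetic. The sketch assumes dummy coding of the task conditions with one reference level; the function name is hypothetical.

```python
def n_regressors(n_conditions: int, n_regions: int) -> dict:
    """Count the columns of a single model with region x condition interactions.

    Assumes conditions are dummy coded against one reference condition.
    """
    terms = {
        "intercept": 1,
        "condition contrasts": n_conditions - 1,
        "region main effects": n_regions,
        "region x condition": n_regions * (n_conditions - 1),
    }
    terms["total"] = sum(terms.values())
    return terms


# Four task conditions and three subcortical regions, as in the response.
counts = n_regressors(n_conditions=4, n_regions=3)
print(counts)
```

      The interaction block grows with the product of regions and condition contrasts, which is why the single-model approach becomes costly with only 33 participants.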

      However, we statistically analysed our regression results. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and reported the findings in response to weakness two from reviewer #1.

      Discussion (Line 384): “In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. In order to ensure sufficient statistical power for doing so possibly each factor needs to be addressed in different experiments.”

      The fourth concern is that the discussion section is not quite ready to help the reader appreciate the implications of key aspects of the findings. What are the implications for our understanding of the roles of different subcortical structures in the various psychological component processes of spatial attention? Why does the volumetric asymmetry of different subcortical structures have diametrically opposite effects on alpha lateralization? Instead, the discussion section highlights that the different subcortical structures are connected in circuits: "Globus pallidus also has wide projections to the thalamus and can thereby impact the dorsal attentional networks by modulating prefrontal activities." If this is true, then why does the effect of the GP dissociate from that of the thalamus? Also, what is it about the current behavioural paradigm that makes the behavioral readout insensitive to variation in subcortical volume (or alpha lateralization?)?

      We thank the reviewer for this feedback. These are indeed all good points, and we hope that our findings will inspire further research to address these issues. In the revised manuscript we now write:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (57).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      Discussion (Line 388): “Moreover, our failure to identify a relationship between the lateralized volume of subcortical structures and behavioural measures should be addressed in studies that are better designed to capture performance asymmetries (63). Individual preferences toward one hemifield, which were not addressed in the current study design, could potentially strengthen the power to detect correlations between structural variations in the subcortical structures and behavioural measures.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comment:

      Between-subject correlation/regression analyses always rely on the assumption that the underlying dependent measures are reliable. While the reliability of asymmetries of subcortical structures can be assumed, the reliability of lateralized alpha oscillations during spatial attention can be questioned. It would be helpful if the authors could test the reliability of alpha lateralization, for instance by calculating HLM(a) in the first and second half of the experiment and correlating the resulting HLM(a) values (split-half reliability).

      We thank the reviewer for their insightful comment. We acknowledge that the between-subject regression relies on the reliability of alpha lateralization. Nonetheless, a previous study has demonstrated consistent results regarding HLM(α). We have further elaborated on these aspects in the discussion section:

      Discussion (Line 328): “Furthermore, our regression analysis outcomes align with the findings of Mazzetti et al. (28) underscoring the significant predictive influence exerted by the lateralized volume of globus pallidus on the modulation of hemispheric lateralization in alpha oscillations during spatial attention tasks. This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations within the context of attention.”
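      The split-half check suggested by the reviewer is straightforward to set up. The sketch below correlates per-participant HLM(α) values from the two halves of the experiment and, as an optional extra step not mentioned in the review, applies the Spearman-Brown correction to estimate full-length reliability. The function name and the toy values are hypothetical.

```python
import numpy as np


def split_half_reliability(first_half, second_half):
    """Return (Pearson r between halves, Spearman-Brown corrected reliability)."""
    a = np.asarray(first_half, dtype=float)
    b = np.asarray(second_half, dtype=float)
    r = float(np.corrcoef(a, b)[0, 1])
    # Spearman-Brown prophecy formula for doubling the test length.
    corrected = 2 * r / (1 + r)
    return r, corrected


# Toy HLM values for four participants, computed separately per half.
hlm_first = [1.0, 2.0, 3.0, 4.0]
hlm_second = [1.0, 3.0, 2.0, 4.0]
r_halves, r_full = split_half_reliability(hlm_first, hlm_second)
print(r_halves, r_full)
```

      In practice each half would need enough trials per attention condition to yield a stable modulation index, so an odd/even trial split may be preferable to a first/second-half split if fatigue effects are a concern.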

      Reviewer #3 (Recommendations For The Authors):

      We recommend that a revised version of the manuscript

      • Clarifies the theoretical basis for the 6 key design & analysis choices that we have outlined above;

      We thank the reviewer for their precision. We addressed the concerns outlined above in the previous section.

      • Also clarifies the task description (perhaps referring to target and distractor salience instead of target load versus distractor salience might help);

      Thank you for this constructive comment. We used the terms ‘load’ for the target and ‘salience’ for the distractor because the noise manipulation of the faces reduces the salience of the image, which makes distractors less distracting (easier) but targets more perceptually loaded (harder). The explanation of these terms is made clear in the revised manuscript.

      Method (Line 447): “Over trials, the perceptual load of targets was manipulated using a noise mask; noisy targets are harder to detect than clear targets and therefore incur greater perceptual load in their detection. The saliency of distractor stimuli was also manipulated using a noise mask; noisy distractor stimuli are less salient than clear distractors and therefore less disruptive to performance on the detection task. The noise mask was created by randomly swapping 50% of the stimulus pixels (Figure 1B). This manipulation resulted in four target-load/distractor-saliency conditions: (1) target: low load, distractor: low saliency (i.e., clear target, noisy distractor), (2) target: high load, distractor: low saliency (i.e., noisy target, noisy distractor), (3) target: low load, distractor: high saliency (i.e., clear target, clear distractor), (4) target: high load, distractor: high saliency (i.e., noisy target, clear distractor) (Figure 1B and C).”
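      The noise mask described in the quoted method, in which 50% of the stimulus pixels are randomly swapped, could be generated along the following lines. This is a sketch under assumptions: the image is treated as a single-channel array, "swapping" is implemented as permuting the values of a randomly chosen half of the pixels among themselves, and the function name is hypothetical.

```python
import numpy as np


def noise_mask(image, fraction=0.5, seed=None):
    """Shuffle a random subset of pixels among themselves.

    Assumes a single-channel (grayscale) image. The set of pixel values is
    preserved; only the positions of the selected pixels change.
    """
    rng = np.random.default_rng(seed)
    img = np.asarray(image)
    flat = img.reshape(-1).copy()
    n_swap = int(round(fraction * flat.size))
    idx = rng.choice(flat.size, size=n_swap, replace=False)
    # Right-hand side is evaluated first, so values are exchanged safely.
    flat[idx] = flat[rng.permutation(idx)]
    return flat.reshape(img.shape)


# Toy 10 x 10 "stimulus" with a unique value per pixel.
stimulus = np.arange(100).reshape(10, 10)
noisy = noise_mask(stimulus, fraction=0.5, seed=0)
print(int((noisy != stimulus).sum()), "pixels moved")
```

      Because the pixel values themselves are kept, overall luminance and contrast statistics stay the same between clear and noisy stimuli, and only the spatial structure is degraded.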

      • Fully reports all the data, including those of the model comparisons, the behavioural results, and the rapid frequency tagging results;

      We thank the reviewer for this constructive comment. We refer the reviewer to our response to the second comment and to comment (vi) from reviewer #3.

      • Reports interaction effects to directly test the modulating role of task demands in the link between volume and alpha, and break down the alpha lateralization indices into their simple effects on the ipsilateral and contralateral hemispheres;

      Task demands have been addressed in our response to weakness two from reviewer #1.

      Regarding the second part of the comment, in our study, to compare the lateralized modulation of alpha oscillations between the right and left hemispheres, we computed hemispheric lateralization modulation. This involved dividing trials into attention right and attention left. Subsequently, we calculated the lateralization index separately for sensors on the right and left. Specifically, this entailed computing ipsilateral – contralateral for sensors on the right and contralateral – ipsilateral for sensors on the left side of the brain. We addressed this concern in the methods section as below:

      Method (Line 537): “As MI(α) consistently represents power of alpha in attention right versus attention left conditions, it entails the comparison between ipsilateral and contralateral alpha modulation power for sensors located on the right side of the head. The same comparison applies inversely for sensors situated on the left side of the brain.”

      • Clarifies in the discussion section the specific implications of the results for our understanding of the link between distinct subcortical structures and distinct component processes of spatial attention.

      We thank the reviewer for their constructive comment. This point is addressed in response to the fourth concern of reviewer #3.

      More detailed specific recommendations are provided below:

      • Line 40ff: In this paragraph, the theoretical framework concerning the function of the subcortical regions of interest is described. Here, the authors jump back and forth between the role of the basal ganglia and the role of the thalamus. For clarity, we would advise to describe the functions of these two structures one after the other. And include a justification for assessing the hippocampus and the amygdala.

      We appreciate the reviewer’s precision in this comment. We now describe these structures one after the other in the revised manuscript, as below:

      Introduction (Line 44): “For instance, it has been shown that the pulvinar plays an important role in the modulation of neocortical alpha oscillations associated with the allocation of attention (9). Studies in rats and non-human primates have shown that both the thalamus and superior colliculus are involved in the control of spatial attention by contributing to the regulation of neocortical activity (9-11). Notably, when the largest nucleus of the thalamus, the pulvinar, was inactivated after muscimol infusion, the monkey’s ability to detect colour changes in attended stimuli was lowered. This behavioral deficit occurred when the target was in the receptive field of V4 neurons that were connected to lesioned pulvinar (12). The basal ganglia play a role in different aspects of cognitive control, encompassing attention (13,14), behavioural output (15), and conscious perception (16). Moreover, the basal ganglia contribute to visuospatial attention by linking with cortical regions like the prefrontal cortex via the thalamus.”

      Justification for assessing the hippocampus and the amygdala has been addressed in response to weakness (iii) from reviewer #3.

      • The authors mention they defined symmetric clusters of 5 sensors in each hemisphere that showed the highest modulation, but it is not clear how this number of sensors was determined a priori.

      We thank the reviewer for their comment. We edited the revised manuscript as below:

      Method (Line 536): “Ten sensors were selected to ensure sufficient coverage of the region exhibiting alpha modulation as judged from prior work (62).”

      • In line 141, the abbreviation HLM is first mentioned but the concept of "hemispheric lateralization modulation of alpha power" is only mentioned in the following section. For the ease of the reader, the abbreviation could be mentioned together with this concept at the beginning of this paragraph.

      We thank the reviewer for their attention to this point. In the revised manuscript HLM(α) is now introduced together with its concept.

      Results (Line 153): “Next, we computed the hemispheric lateralization modulation of alpha power (HLM(α)) in each individual.”

      • In line 188 of the results section, it is mentioned that the table including the AIC values for model comparisons is in the supplementary material, however, we could not locate this table.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the supplementary materials to the end of the manuscript for convenience.

      • Figure 4 is missing the panel headers A, B, C, and D.

      We thank the reviewer for their precision. This figure is now fixed.

      Author response image 2.

      • In lines 205 and 206, behavioral and rapid frequency tagging analysis are mentioned. For the behavioral analysis, the method is described, but no results are provided. For the rapid frequency tagging, neither the methods nor the results are described. To evaluate the strength of this (non)-evidence, we would advise to elaborate on these analysis steps and report the results in the supplementary material.

      We thank the reviewer for this constructive comment. A brief explanation of the analysis method for the rapid frequency tagging signal has been added to the revised manuscript.

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).” For a more detailed answer, we refer the reviewer to the second comment from reviewer #3.

      • For the paragraph starting at line 209, we would recommend referring to Figure 1.

      We thank the reviewer for their suggestion. This paragraph is now referring to Figure 1.

      Results (Line 229): “To relate load and salience conditions of the task to the relationship between subcortical structures and the alpha activity, we combined low-load or high-load targets with high-saliency or low-saliency distractors to manipulate the perceptual load appointed to each trial (Method section, Figure 1).”

      • Figure 5 as well as the report of the beta weights in this section shows a difference in the direction of the effect for the thalamus compared to the globus pallidus and caudate nucleus which is not discussed in this section.

      We thank the reviewer for bringing this important point to our attention. We addressed this comment in the discussion section as below:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (54).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      • Comment 2 on line 80 is addressed in the paragraph following 264 by describing volumetric changes in basal ganglia in neurodegenerative disorders such as PD or Huntington's. Still, the link of how a decrease in volume in this region could be causally linked to changes in alpha-band power could be better supported.

      We thank the reviewer for their constructive feedback. We are here highlighting the significant correlation between subcortical structures and changes in attention modulated alpha oscillation. We added a few more references to the discussion supporting the relationship between size and function in relation to neurological disorders. We also edited the manuscript to make this point clearer as below:

      Introduction (Line 113): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, independent of any reward or value associations.”

      Discussion (Line 305): “Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (42). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases when they are dopamine-depleted compared with when they are on dopaminergic medication (43).”

      • Related to the previous comment on behavioral and rapid frequency tagging results, these are difficult to evaluate without mention of the methods and/or results.

      We thank the reviewer for this comment. We refer the reviewer to our response to the second comment from reviewer #3.

      • The authors show differential effects of target load and distractor saliency; however, we missed the description of how these two variables differ conceptually as they are both described as contributing to task difficulty and it is not described why we would expect differential effects for these concepts (or in other words, how the authors explain the differential effects).

      We thank the reviewer for their comment. Directly comparing between task conditions within one model would result in an extra 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). Given our sample size, this study is underpowered to directly compare alpha lateralisation in contralateral versus ipsilateral conditions. For a more detailed answer please refer to our response to weakness two from reviewer #1.
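
      The regressor count quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the values (four task conditions, three regional regressors) are read off the terms listed in the response rather than taken from the authors' analysis code.

```python
# Count the regressors needed to compare task conditions within one model,
# following the breakdown given in the response. All names are illustrative.
n_conditions = 4   # task conditions (assumed from the "4-1" term)
n_regions = 3      # regional regressors (assumed from the "3" term)

intercept = 1
condition_terms = n_conditions - 1                    # dummy-coded condition differences
region_terms = n_regions                              # one main-effect term per region
interaction_terms = n_regions * (n_conditions - 1)    # region x condition interactions

total_regressors = intercept + condition_terms + region_terms + interaction_terms
print(total_regressors)
```

      Each additional condition adds one condition term plus one interaction term per region, which is why the model grows quickly relative to a modest sample size.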

      • Line 364ff: Based on the description of the experimental design, it is not clear to us whether participants only had to report on the change in gaze for the stimulus in the cued hemifield.

      We thank the reviewer for this comment, which prompted us to clarify the experimental design as below:

      Method (Line 440): “Then followed a 1000 ms response interval where participants were asked to respond with their right or left index finger whether the gaze direction of the cued face shifted left or right.”

      • Line 47ff: As mentioned above, the AIC table is not included. Further, as it is mentioned that BIC values led to similar results (indicating that they are not identical), it would be valuable to report both AIC and BIC values.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the BIC values and attached the supplementary materials to the end of the manuscript for convenience.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This article by Zhai et al, investigates sterol transport in bacteria. Synthesis of sterols is rare in bacteria but occurs in some, such as M capsulatus where the sterols are found primarily in the outer membrane. In a previous paper the authors discovered an operon consisting of five genes, with two of these genes encoding demethylases involved in sterol demethylation. In this manuscript, the authors set out to investigate the functions of the other three genes in the operon. Interestingly, through a bioinformatic analysis, they show that they are an inner membrane transporter of the RND family, a periplasmic binding protein, and an outer membrane-associated protein, all potentially involved with lipid transport, so providing a means of transporting the lipids to the outer membrane. These proteins are then extensively investigated through lipid pulldowns, binding analysis on all three, and X-ray crystallography and docking of the latter two.

      Strengths

      The lipid pulldowns and associated MST binding analysis are convincing, clearly showing that sterols are able to bind to these proteins. The structures of BstB and BstC are high resolution with excellent maps that allow docking studies to be carried out. These structures are distinct from sterol-binding proteins in eukaryotes.

      We thank the reviewer for their favorable impression of this work.

      Weaknesses

      While the docking and molecular dynamics studies are consistent with the binding of sterols to BstB and BstC, this is not backed up particularly well. The MST results of mutants in the binding pocket of BstB have relatively little effect, and while I agree with the authors this may be because of the extensive hydrophobic interactions that the ligand makes with the protein, it is difficult to make any firm conclusions about binding.

      We agree with the reviewer that at this point, there is no experimental evidence to define the sterol binding site in BstB. While in the manuscript we allude to the extensive hydrophobic interactions as being especially stabilizing and difficult to eliminate with one or two mutations, we are now also aware that hydrogen-bonding interactions with the polar head of the sterols are quite important (see data on BstC, where disruption of that interaction significantly reduces the equilibrium affinity for sterols). Our MD simulations show that at least 3 protein amino acids can participate in H-bonding with the sterols. Moreover, recent work from our lab shows that even ligand site waters can extend an H-bonding network around the polar head of the lipid (Zhai et al., ChemBioChem 2023, 24, e202300156), thereby enabling H-bonding with amino acids that are further away from the ligand site. It is therefore difficult to predict which mutations will sufficiently destabilize the binding. While this question is one we will tackle in future studies focused on obtaining high-resolution substrate-bound structures of BstB or homologs, the findings reported here are still relevant and timely and, we posit, will spur the discovery of functional homologs, including some in organisms that are more tractable.

      The authors also discuss the possibility of a secondary binding site in BstB based on a slight cavity in domain B next to a flexible loop. This is not backed up in any way and seems unlikely.

      The reviewer is correct in that the evidence for this second binding site is weak. While the crystallographic structure shows a highly hydrophobic region and the binding studies suggest cooperativity exists in the binding of the 4-methylsterol substrate, the docking studies do not strongly support binding at that site. As such, we have clarified in the manuscript that a second hydrophobic cavity is observed, but that its role in ligand interaction remains unexplored.

      Reviewer #2 (Public Review):

      Summary:

      In eukaryotes, sterols are crucial for signaling and regulating membrane fluidity, however, the mechanism governing cholesterol production and transport across the cell membrane in bacteria remains enigmatic. The manuscript by Zhai et al. sheds light on this topic by uncovering three potential cholesterol transport proteins. Through comprehensive bioinformatics analysis, the authors identified three genes bstA, bstB, and bstC encoding proteins which share homology with transporters, periplasmic binding proteins, and periplasmic components superfamily, respectively. Furthermore, the authors confirmed the specific interaction between these three proteins and C-4 methylated sterols and determined the structures of BstB and BstC. Combining these structural insights with molecular dynamics simulation, they postulated several plausible substrate binding sites within each protein.

      Strengths:

      The authors have identified 3 proteins that seem likely to be involved in sterol transport between the inner and outer membrane. The structures are of high quality, and the sterol binding experiments support a role for these proteins in sterol transport.

      We thank the reviewer for this positive view of our work.

      Weaknesses:

      While the authors' model is very plausible, direct evidence for a role of BstABC in transport, or that the 3 proteins function together in a single pathway, is limited.

      The reviewer is correct that we were unable to demonstrate that the three proteins work together to transport 4-methylsterols. This is not for lack of trying. We first attempted gene deletion studies, and as mentioned in the manuscript (with more details now provided in the experimental section), this appeared to be lethal. We then attempted in vitro exchange experiments, in which the proteins would be used to transfer sterols from sterol-loaded “heavy” liposomes to sterol-free “light” liposomes – such exchange assays are frequently performed with eukaryotic sterol transporters (see Chung et al., Science 2015, https://doi.org/10.1126/science.aab1370). These assays were not successful because 1) sterols incorporated poorly into liposomes made with E. coli polar lipids and yielded leaky liposomes; 2) liposomes prepared with the TLE of M. capsulatus proved more stable, but no appreciable exchange was observed; we reasoned that this might be due to the absence of an energy source for BstA, the RND component for which we have expressed and purified only the soluble periplasmic domain. Given the technical difficulty of these in vitro transport experiments, we will continue to pursue in vivo demonstration of function as new homologs are identified.

      Reviewer #3 (Public Review):

      Summary:

      The work in this manuscript builds on prior efforts by this team to understand how sterols are biosynthesized and utilized in bacteria. The study reports a new function for three genes encoded near sterol biosynthesis enzymes, suggesting the resulting proteins function as a sterol transport system. Biochemical and structural characterization of the two soluble components of the pathway establishes that both proteins can bind sterols, with a preference for 4-methylated derivatives. High-resolution x-ray structures of the apoproteins reveal hydrophobic cavities of the appropriate size to accommodate these substrates. Docking and molecular dynamics simulations confirm this observation and provide specific insights into residues involved in substrate binding.

      Strengths:

      The manuscript is comprehensive and well-written. The annotation of a new function in a set of proteins related to bacterial sterol usage is exciting and likely to enable further study of this phenomenon - which is currently not well understood. The work also has implications for improving our understanding of lipid usage in general among bacterial organisms.

      We thank the reviewer for this synopsis of our work.

      Weaknesses:

      The authors might consider moving some of the bioinformatics figures to the main text, given how much space is devoted to this topic in the results section.

      We have taken this advice and moved Figure S1 to the main manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. In the analysis of the MST data, the authors quote Hill coefficients. How reliable are these numbers? For BstB, for instance, it seems unlikely that more than one molecule would bind. Can the analysis be done without needing to include Hill coefficients?

      We used fits that did and did not invoke cooperativity – see below. We are certain that both BstA and BstB are better fit with cooperativity invoked.

      Author response image 1.
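
      To illustrate what fitting with and without cooperativity involves, the sketch below compares a simple one-site binding model (Hill coefficient fixed at 1) with a Hill model on synthetic data. It is not the authors' MST analysis: the titration points, the Kd of 10 and the Hill coefficient of 2 are hypothetical values, and a brute-force grid search stands in for a proper nonlinear fit.

```python
def hill(conc, kd, n=1.0):
    """Fraction bound at ligand concentration `conc` (Hill equation)."""
    return conc**n / (kd**n + conc**n)

def sse(data, kd, n):
    """Sum of squared errors between observed and predicted fraction bound."""
    return sum((y - hill(x, kd, n)) ** 2 for x, y in data)

def grid_fit(data, kds, ns):
    """Brute-force least squares over a parameter grid; returns (sse, kd, n)."""
    return min((sse(data, kd, n), kd, n) for kd in kds for n in ns)

# Synthetic, noise-free titration generated with Kd = 10 and n = 2 (hypothetical).
concs = [1, 2, 5, 10, 20, 50, 100]
data = [(c, hill(c, 10.0, 2.0)) for c in concs]

kds = [k / 2 for k in range(2, 81)]      # candidate Kd values, 1.0 to 40.0
ns = [1 + i / 10 for i in range(31)]     # candidate Hill coefficients, 1.0 to 4.0

fit_simple = grid_fit(data, kds, [1.0])  # no cooperativity: n fixed at 1
fit_hill = grid_fit(data, kds, ns)       # cooperativity allowed: n free

print("one-site fit (sse, Kd, n):", fit_simple)
print("Hill fit     (sse, Kd, n):", fit_hill)
```

      On real, noisy data the model with the extra parameter will always fit at least as well, so the comparison would need a penalized criterion (for example an F-test or AIC) before concluding that cooperativity is required.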

      2. In looking at the maps associated with the structures, which were included in the review package, I see that two citric acid molecules fit beautifully into the density where currently PEG has been modelled. This needs to be fixed and some comments may be appropriate in the manuscript.

      We thank the reviewer for calling our attention to this. Citric acid has now been added to the model, and we reason that these are present in the structure because citric acid was used in the crystallization condition. The revised model is now present in the PDB.

      3. It is not necessary to show the two molecules in the asymmetric unit in Figure 4 given that it is not a dimer. This doesn't add anything to the manuscript.

      We now show a single molecule of BstC in Figure 4 (now Figure 5).

      4. I wouldn't consider the loops shown in Figure S4 as disordered. They have slightly higher B-values but are not completely mobile.

      We did not refer to these loops as disordered. In the text, we say they “exhibit poor electron densities, suggesting conformational sampling of more than one state (Fig. S4A).”

      Reviewer #2 (Recommendations For The Authors):

      pg 7, "hinting at an astounding distinction": I might suggest a word other than astounding that conveys how statistically unlikely, unusual, etc. this result is.

      Thank you – we have removed “astounding”.

      pg 7, paragraph 2: Here the authors show that in the SSN analysis, BstB proteins cluster separately and suggest this implies a distinction in function. However, they also show that PhnD homologs do not cluster separately (distributed across multiple clusters), yet presumably have similar functions. I am not familiar with SSN, but it seems to me that the second statement about PhnD implies that the first statement about BstB might not be valid, i.e., if PhnD doesn't cluster based on function, on what basis can we conclude that BstB does? On what basis does clustering occur in the SSN analysis? Might it be driven by things other than function? This comment also concerns the final paragraph of this section.

      The reviewer is correct in that PhnD homologs occupy separate clusters of the SSN. Many of these homologs were crystallized with phosphate-like compounds, but it is possible that they have non-overlapping substrate scopes and are therefore functionally distinct. As for the basis of clustering, the SSN is fully sequence-based. What has been observed is that proteins with highly similar sequences can have similar functions – but this is not always true.
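
      As a rough illustration of what "fully sequence-based" clustering means, the sketch below links sequences whose pairwise score meets a threshold and reports the connected components as clusters. It is a toy under stated assumptions: the sequences are invented, and a per-position identity fraction stands in for the alignment scores that real SSN tools use.

```python
from itertools import combinations

def identity(a, b):
    """Fraction of matching positions between two equal-length sequences (toy score)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def ssn_clusters(seqs, threshold):
    """Link sequences scoring at or above `threshold`; return connected components."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))

# Hypothetical toy sequences, not real BstB or PhnD sequences.
seqs = {
    "A1": "MKTLLVAGGA",
    "A2": "MKTLLVAGGS",
    "B1": "MRSQWEPDDK",
    "B2": "MRSQWEPDDR",
}
clusters = ssn_clusters(seqs, threshold=0.8)
print(clusters)
```

      Nothing in this procedure uses functional information, and the clusters change with the threshold, which is why separate clusters suggest, but do not establish, a difference in function.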

      pg 8, paragraph 1: The authors suggest that BstABC may be essential. This is probably not a critical claim and it might be simplest to just remove it, but if it is mentioned, the authors should probably explain what was attempted that failed, so a reader can assess the strength of the evidence supporting essentiality. For example, I don't see anything in the methods about genetic manipulations of M. capsulatus, so currently, this falls within the realm of "Data not shown".

      We have provided additional information about the experimental techniques used to do this. This statement was included so that it is understood that the reason for the experimental failure is unlikely to be technical in nature, as we have successfully deleted some sterol related genes while others remain intractable.

      Fig. 2A: It is unclear to me what is being plotted here, perhaps more experimental detail is required in the form of labels and/or legend. Is this a quantification of each sterol in each fraction separated by GC? There are essentially no methods provided for the GC-MS experiments. A reference is provided, but I think providing detailed methods for these specific experiments will provide a higher degree of scientific rigor. I am not sure what is standard for GCMS, but perhaps showing spectra in the supplement that establish the identity of the bound molecules as species I and II would be appropriate?

      Additional experimental details have been provided and the figure legend has been revised for clarity. Moreover, we now clearly state that the chromatograms shown were used to identify lipids on the basis of their retention times, by reference to spectra previously published in Wei et al., 2016.

      pg 10-11, comparison with PhnD structure: Perhaps it is worth mentioning a 3rd possible explanation for the relative opening/closing of the cleft is simply crystal packing? I don't think it necessarily has to imply anything about a difference in function. Also, the focus seems to be on this pairwise comparison, but perhaps more insights could be gleaned from an analysis that included a wider range of homologs, especially if any are thought to bind hydrophobic substrates.

      This could be true, and we have included a statement to that effect. We are unaware of homologs shown to bind to large, hydrophobic molecules.

      I think that BstB is shown upside-down in sup movies relative to other figures. If it isn't changed, perhaps adding some labels would help orient the reader.

      We have rotated the movies to be more consistent with the figures.

      Fig. S7: No units are indicated for Kds (uM?).

      Thank you – this has been fixed.

      pg 11, paragraph 2. "adjacent to three residues: Glu118, Tyr120 and Asn192": The residue number used in the text doesn't seem to match the numbering in the PDB file. I think these residues correspond to Glu98, Tyr100, and Asn172 in the PDB file.

      We regret this error. The correct numbering for both structures is now present in the deposited PDB files (7T1M for BstB and 7T1S for BstC).

      pg 12, final paragraph: The authors present binding data for BstB variants with mutations in the putative sterol binding pocket identified in the structural and MD analyses. However, these mutants had no effect on binding. The authors rationalize this in terms of the size of the interface and hydrophobic nature (which indeed, may be correct and is very plausible), and it is worth noting that many of their mutations are to Ala and would largely preserve the hydrophobic nature of the cleft. However, these mutants raise questions about where sterols actually bind. No experimental evidence is presented that substrates bind in the cleft, it is only hypothesized based on structural homology, MD simulations, etc. These mutations formally provide evidence against the hypothesis being tested; I think that has to be discussed a bit more directly, alongside the caveats the authors already discuss about hydrophobicity, etc.

      This is a valid point by the reviewer, and it is one we have attempted to address with our statement in the manuscript and in our response to reviewer 1. We have modified the relevant text to more clearly state that there is as of yet no experimental evidence for the binding of sterols to the cavity identified via molecular docking.

      pg 13: Presumably this is not the full-length lipoprotein, but has been truncated/mutated in some way? Some statement of roughly what was purified/crystallized should be stated.

      The SI methods on protein purification state that the genes of BstB and BstC without their respective signal peptides were obtained.

      pg 13, last paragraph "TN1 exhibits hybrid hydrophobicity, with the sides horizontal to cavities being hydrophobic while the vertical sides are more hydrophilic". I don't really follow the horizontal vs vertical sides. Perhaps this could be described in a different way.

      Noted and changed to “TN1 is closer to the N-terminal face of the structure, while CA1 and CA2 are proximal to the C-terminal face and form two open hydrophobic pockets; TN1 exhibits a mixture of hydrophobic and hydrophilic amino acids (Fig. 4B and Fig. S9B, Table S4).”

      pg 15-16, "Comparison to eukaryotic sterol transporters": Perhaps this would be better suited for the discussion section? Could also be streamlined; it is mostly discussing and comparing eukaryotic sterol binding domains to each other, not to BstABC.

      Given that BstB and BstC are the first identified proteins (and putative transporters) for bacterial sterol engagement, we thought a careful description of the existing sterol transporters (which are all eukaryotic) was warranted.

      Reviewer #3 (Recommendations For The Authors):

      I have just two minor suggestions for the authors if they wish to comment on or address them.

      1. Do the three proteins (BstA/B/C) form any sort of complex? Perhaps this property was not assessed - but it seemed possible that the B and C components might constitute a shuttle for the membrane-bound transporter?

      This is an important observation – the unliganded versions of these proteins show no appreciable affinity for each other. However, BstB (which would be expected to engage both with BstA and BstC) belongs to a family of proteins known to undergo significant conformational change upon substrate binding. It is possible that with substrate present, complexes are formed – we have yet to investigate this.

      2. In Figure S1, panel C - it appears that the label for the BstC cluster may have migrated away from the intended location. In this figure, it might also be useful to indicate in the caption the meaning of the red coloring of the nodes?

      The label is now fixed – thank you for drawing our attention to this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1

      Leanza et al. investigated the regulation of Wnt signaling factors in the bone tissue obtained from individuals with or without type 2 diabetes. They showed that typical canonical Wnt ligands and downstream factors (Wnt10b, LEF1) are down-regulated, while Wnt5a and sclerostin mRNA are upregulated in diabetic bone tissue. Further, Wnt5a and sclerostin associated with the content of AGEs, and SOST mRNA levels also correlated with glycemic control and disease duration.

      Strengths:

      • A strength of the study is the investigation of Wnt signaling in bone tissue from humans with type 2 diabetes. Most studies measure only serum levels of Wnt inhibitors, but this study takes it further and looks into bone specifically.

      • The measurement of AGEs and its correlation to the Wnt signaling molecules is interesting and important. The correlation of sclerostin and Wnt5a with AGEs and disease duration suggests that inhibited Wnt signaling is paralleled by higher AGE levels and potentially weaker bone.

      • The methodology in terms of obtaining the bone samples and the rigorous evaluation of RNA integrity is great and provides a solid basis for further analyses.

      Weaknesses:

      • A weakness may include the rather limited number of samples. Especially for some sub-analyses (e.g. RNA analyses), only a subset of samples was used.

      • How was the sample size determined? It seems like more samples might have been necessary to obtain significant results for methods with a higher standard deviation (e.g. histomorphometry).

      We apologize for the oversight in the description of the statistical analysis and we thank the reviewer for the careful reading. For the sample size calculation for bone histomorphometry, we used the cohort of the only paper analyzing trabecular bone in T2D postmenopausal women by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for the difference between two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978. Regarding the gene expression analyses, these were performed not in a subset of patients but in all subjects recruited for this study. Based on the results of gene expression analysis on our main outcome (Wnt signaling), we demonstrated that for the SOST gene the effect size was 1.2733824, with a power of 0.9490065, confirming that the sample size was sufficient to achieve adequate statistical power.
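
      For readers who want a feel for how effect size and group size translate into power, the sketch below uses a normal approximation to the two-sample t-test. It is not a reproduction of the G*Power calculation, which uses the noncentral t distribution and will give somewhat different values at small sample sizes; the two-sided test and alpha of 0.05 are also assumptions, since the response does not state them.

```python
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Normal approximation to the power of a two-sided, two-sample t-test.

    d is the standardized effect size (Cohen's d); groups are assumed equal in size.
    """
    z = NormalDist()
    delta = d * (n_per_group / 2) ** 0.5   # noncentrality for equal group sizes
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

# Effect size and group size quoted in the response.
power = approx_power_two_sample(d=2.2776769, n_per_group=6)
print(round(power, 3))
```

      The approximation is optimistic for very small groups because it ignores the uncertainty in the variance estimate, so it should be read as a sanity check rather than a substitute for an exact calculation.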

      • Why is the number of samples different for the mRNA measurements? In most cases, there were 9, but in some 8 and in some 10?

      We sincerely thank the reviewer for the opportunity to clarify such important aspects. The number of samples used for mRNA quantification may differ between the analyzed genes for two reasons. First, for real-time PCR we used only samples with a high-quality 260/280 ratio (between 1.8 and 2.0), as stated in the method section of the manuscript (Page 8, lines 163-164). Second, we decided not to use undetermined values, i.e., those undetectable after the amplification cycles (40 cycles in total), as specified in the method section (Page 8, line 167).
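
      The two exclusion rules can be expressed as a simple per-gene filter, which also shows why the number of usable samples varies from gene to gene. The sketch below is hypothetical: the record layout, sample identifiers and values are invented for illustration and are not the study's data.

```python
def usable_samples(samples, ratio_min=1.8, ratio_max=2.0):
    """Keep samples with an acceptable A260/280 ratio and a detected Ct value."""
    return [
        s["id"]
        for s in samples
        if ratio_min <= s["a260_280"] <= ratio_max and s["ct"] is not None
    ]

# Hypothetical records for one gene; ct=None marks an undetermined value
# (no amplification detected within the 40 cycles).
samples = [
    {"id": "S1", "a260_280": 1.92, "ct": 27.4},
    {"id": "S2", "a260_280": 1.65, "ct": 29.1},   # fails the purity ratio
    {"id": "S3", "a260_280": 1.88, "ct": None},   # undetermined for this gene
    {"id": "S4", "a260_280": 2.00, "ct": 31.0},
]
kept = usable_samples(samples)
print(kept)
```

      The purity filter removes a sample for every gene, whereas an undetermined Ct removes it only for the gene concerned, so the per-gene counts can differ by one or two.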

      Overall, this study validates findings from the group that reported similar findings in 2020. This validates their methodology and shows that alterations in Wnt signaling are reproducible in human bone tissue.

      We thank the reviewer for the positive comment; we really value her/his opinion.

      COMMENTS:

      (1) The authors could provide more details on how much of the bone was analyzed for bone histomorphometry (what area?).

      We truly thank the reviewer for allowing us to explain our methodology in more depth. First, a biopsy containing trabecular bone from the femoral head was fixed in 10% neutral buffered formalin for 24 h prior to storage in 70% ethanol. Tissues were embedded in methylmethacrylate and sectioned sagittally by the Washington University Musculoskeletal Histology and Morphometry Core. Sections were stained with Goldner’s trichrome. Then, a rectangular region of interest containing trabecular bone was chosen below the cartilage-lined joint surface and primary spongiosa. This region had an average area of 45 mm². Tissue processing artifacts, such as folding and edges, were excluded from the ROI. A threshold was chosen using the BIOQUANT software to automatically select trabeculae and measure bone volume. Finally, osteoid was highlighted in the software and quantified semi-automatically using a threshold and correcting with the brush tool (as shown in the image below).

      We specify that in the methods section (Page 7, lines 146-152).

      Author response image 1.

      (2) Could the number of samples used for histomorphometry be increased? That may also lead to more significant results.

      We sincerely appreciate this suggestion from the reviewer, but unfortunately all available samples for histomorphometry have been analyzed and we are not able to increase the number of recruited participants at this time. Recruitment of people with T2D undergoing hip replacement is extremely difficult given the limited number of those approved for elective surgery and compliant with our inclusion criteria. Moreover, given the long time needed to process bone samples for gene expression and histology analysis, it would take several months to achieve a substantial increase in recruited subjects. However, we have previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.

      (3) It would have been interesting to assess the biomechanical behavior of the bone specimens. While it is known that BMD is often higher in patients with T2D, the resistance to fractures is lower. Ideally, bone strength measures could be correlated with Wnt molecule expression and AGEs.

      We agree with the reviewer that the assessment of biomechanical parameters in our cohort increases the importance of this study, giving more insight into the effect of downregulation of Wnt signaling on bone strength. Thus, we followed the reviewer's suggestion and performed bone compression tests on trabecular bone cores. We found a significant decrease in bone plasticity in T2D compared to controls [Young’s Modulus 21.6 (13.46-30.10 MPa) vs. 76.24 (26.81-132.9 MPa); p=0.0025]. We added the results of the bone compression test in a new paragraph (Page 8, lines 191-194). In order to assess the validity of our results, we performed a post-hoc power calculation using G*Power 3.1.9.7. We demonstrated that the effect size was 1.4716626, with a power of 0.9730784, confirming that the sample size was sufficient to achieve adequate statistical power. We added the methods in the related section and the biomechanical data in table 3; we modified the manuscript accordingly (modifications are shown in track changes). Moreover, we also performed correlation analysis between Wnt target genes, AGEs and biomechanical parameters, showing significant correlations as reported in the added paragraph in the results section (Page 11, Lines 225-233).

      REVIEWER #2

      This study reports the levels of expression of selected genes implicated in Wnt signaling in trabecular bone from femur heads obtained after surgery from post-menopausal women with (15 women) or without (21 women) type 2 diabetes. They found higher expression levels of SOST and WNT5A, and lower expression levels of LEF-1 and WNT10B in tissues from subjects with T2D, correlating with glycemia and advanced glycation products. No significant differences in bone density were observed. Overall, this is a cross-sectional, observational study measuring a limited set of genes found to vary with glycemia in postmenopausal women undergoing hip surgery.

      Strengths:

      The study demonstrates the feasibility of measuring gene expression in post-surgical trabecular bone samples, and finds differences associated with glycemia despite a relatively small number of subjects. It can form the basis for further research on the causes and consequences of changes in elements of the WNT signaling pathway in bone biology and disease.

      Weaknesses:

      The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations.

      We thank the reviewer for the comment. In response to his/her concerns, we have increased the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway. We measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript to reflect this new analysis and updated the figure 1 panel (Page 10, lines 210-213). Unfortunately, in this paper we were not able to perform experiments on cellular or physiological properties. However, in order to analyze the biological effect of the analyzed genes on the phenotype, we measured bone strength by performing compression tests on trabecular bone cores (Page 10, lines 201-203 and table 3) and used biomechanical parameters for correlation analysis with targeted genes, showing significant correlations between bone strength and Wnt genes. We added a new paragraph to the results section and a new figure panel to the main manuscript (Page 11, lines 225-233 and figure 4).

      COMMENTS:

      (1) The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. Given the authors' success in obtaining good-quality RNA from trabecular bone, a more comprehensive exploration would greatly improve the quality of the study.

      We agree with the reviewer that broadening the transcriptional landscape related to Wnt signaling is of interest for this work, and we thank the reviewer for this opportunity. We were able to increase the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway, using the same cohort of patients in which we performed the other analyses. We also measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript to reflect this new analysis and updated the figure panel (Page 10, lines 210-213 and Figure 1).

      (2) The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations. Can the authors perform immunohistochemistry to associate the changes in gene expression with protein expression?

      We sincerely thank the reviewer for drawing attention to such an important aspect. We have partially replied to this comment in the previous paragraph. Regarding immunohistochemistry analysis, it is not possible to further use the available samples. This is mainly due to the fact that non-decalcified bones were embedded in plastic to allow for separate analysis of newly formed osteoid and mineralized bone. This process leads to poor antigen preservation and unsuitable detection of most targets. Moreover, antibodies for Wnt are also unreliable due to the secreted nature of the protein. Overall, this approach is unlikely to work efficiently. Similarly, RNAscope is not possible due to the resin. Optimization and validation of these analyses will need to be saved for a future study with fresh specimens.

      REVIEWER #3

      The manuscript by Leanza and colleagues explores the regulation of Wnt signaling and its association with advanced glycation end products (AGEs) accumulation in postmenopausal women with type 2 diabetes (T2D). The paper provides valuable insights into the potential mechanisms underlying bone fragility in individuals with T2D. Overall, the manuscript is well-structured, and the methodology is sound. I would suggest some minor revisions to improve clarity.

      Strengths:

      The study addresses an important and clinically relevant question concerning the mechanisms underlying bone fragility in postmenopausal women with T2D.

      The study's methodology appears sound, and the inclusion of postmenopausal women with and without T2D undergoing hip arthroplasty adds to the clinical relevance of the findings. Additionally, measuring gene expression and AGEs in bone samples provides direct insights into the study's objectives.

      The manuscript presents data clearly, and the results are well-organized.

      Weaknesses:

      Title. The title could be more specific to better reflect the content of the study. Also, the abstract should concisely summarize the study's main findings, providing some figures.

      We thank the reviewer for this suggestion, and we modified the title giving specific information on the main findings of this study. The new title is “Bone canonical Wnt signaling is downregulated in type 2 diabetes and associates with higher Advanced Glycation End-products (AGEs) content and reduced bone strength”. Moreover, we added as suggested a graphical abstract summarizing our study results.

      Introduction: the introduction would benefit from the addition of a clearer, more focused statement of the research questions or hypotheses guiding this study.

      We thank the reviewer for this opportunity. We reformulated the hypothesis of this study based on our data and new findings as follows: “we hypothesized that T2D and AGEs accumulation downregulate Wnt canonical signaling and negatively affect bone strength” (page 6, lines 116-117).

      Methods: more information is needed on the histomorphometry analysis. Surgical samples from 8 T2D and 9 non-diabetic subjects were used for histomorphometry analysis. How did these subjects compare with the other subjects in the T2D and control groups? Were they representative? How were they selected?

      We thank the reviewer for the opportunity to clarify this important point. The number of subjects included in the different analyses of the paper differs for multiple reasons. In particular, we used only bone specimens with enough trabecular bone material to perform histomorphometry analysis. Therefore, the samples used in the histomorphometry analysis belong to the same subjects enrolled in the study and analyzed for the other experiments of this paper. In addition, we had previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.

      COMMENTS:

      (1) In the Abstract, values and p-values for comparisons, and Spearman's rho and p-values for correlations should be provided. Most adverbs (thus, accordingly, importantly) could be omitted to improve conciseness and clarity.

      We thank the reviewer for this precise and careful comment. In keeping with the abstract style of the journal, we initially reported only the main findings. We have now modified the Abstract to provide values and p-values as requested. We defer to the wishes of the editor as to the format in which the abstract should be reported.

      (2) Result presentation: 25th and 75th percentile should be provided rather than the interquartile range, to better reflect data distribution.

      We thank the reviewer for the opportunity to better clarify this part of the results section. We changed the manuscript accordingly.

      (3) Estimated glomerular filtration rate should be calculated and provided as a marker of renal function, rather than serum creatinine values.

      We thank the reviewer for the comment, and we modified the manuscript accordingly, adding the eGFR values in Table 1 and in the Results section.

      (4) The manuscript should include a statement confirming compliance with the Declaration of Helsinki, considering that human subjects were involved in the study.

      We thank the reviewer for the comment. The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of Campus Bio-Medico University approved the present study. Informed consent was obtained from all subjects involved in the study (Page 6, lines 134-137).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech. 

      Strengths: 

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications. 

      Weaknesses: 

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results. 

      For the Experimental Procedure, we will provide a more detailed description about stimuli, and the comprehension test, and upload the audio files and corresponding transcriptions as the supplementary dataset. 

      For statistics/analyses, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state resembling the baseline state described in Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states were characterized by network activities varying largely from zero. In addition, we have re-generated the null distribution for behavior-brain state correlations using circular shift. The results are largely consistent with the previous findings. We have also made some other adjustments to the analyses and added some new analyses as recommended by the reviewer. We will revise the manuscript to incorporate these changes.

      For the interpretation/rationale: We will add a more detailed interpretation for the association between state occurrence and semantic coherence. Briefly speaking, higher semantic coherence may allow for the brain to better accumulate information over time.

      State #2 seems to be involved in the integration of information at shorter timescales (hundreds of milliseconds) while State #3 seems to be involved in the longer timescales (seconds). 

      We greatly appreciate the reviewer for the insightful comments and constructive suggestions.  

      Reviewer #2 (Public review): 

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension. 

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions. 

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      In the revision, we generated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence.

      We note that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will replace Figure 3 and the related supplementary figures with new ones, in which the null distribution is generated via circular shift. Furthermore, we will expand our discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations. 

      We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculated the correlation between its network activity pattern and the three predefined states from the main analysis, and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 
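
      The matching step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names (`pearson`, `match_states`) and the nine-network activity values are hypothetical, and Pearson's correlation is assumed as the similarity measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def match_states(candidates, references):
    """Label each candidate state with the reference state it resembles most.

    candidates, references: dicts mapping a state name to its activity
    pattern (one value per network). Returns {candidate: (label, r)}.
    """
    labels = {}
    for name, pattern in candidates.items():
        scores = {ref: pearson(pattern, ref_pattern)
                  for ref, ref_pattern in references.items()}
        best = max(scores, key=scores.get)  # reference with the highest correlation
        labels[name] = (best, scores[best])
    return labels

# Hypothetical activity patterns over nine networks (illustrative values only).
references = {
    "State #1": [0.5, 0.4, -0.1, -0.3, -0.6, -0.7, 0.1, 0.0, -0.2],
    "State #2": [-0.2, -0.1, 0.3, 0.2, 0.1, -0.1, -0.2, 0.0, 0.1],
    "State #3": [-0.8, -0.6, -0.1, 0.1, 0.3, 0.4, 0.0, -0.1, 0.2],
}
candidates = {
    "A": [0.4, 0.5, 0.0, -0.2, -0.5, -0.6, 0.0, 0.1, -0.1],
    "B": [-0.7, -0.5, 0.0, 0.2, 0.2, 0.5, 0.1, -0.2, 0.1],
}
labels = match_states(candidates, references)
```

      Note that this greedy rule can assign two candidates to the same reference state, in which case a tie-breaking rule based on the ranking of similarity is still needed.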

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of speech comprehension task with the resting and the incomprehensible speech condition, there was some degree of overlap or "confusion."

      In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned as State #3 based on the ranking of similarity. This strategy was also applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015).

      Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes. 

      The temporal window hypothesis provides a more fitting explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation.

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) The "Participants and experimental procedure" section deserves more details. I've checked Liu et al. (2020), and the dataset contained 43 participants aged 20-75 years, whereas this study contained data from 64 young adults and 30 old adult samples. The previous dataset seems to have two stories, whereas this study seems to have three. Please be specific, given that the dataset does not seem the same. Could the authors also include more descriptions of what the auditory stories were? For example, what were the contents, and how were they recorded? 

      The citation is partially incorrect. The dataset of young adults is shared with our work published in (2022). The 64 participants listened to one of three stories told by a female college student in Mandarin, recounting her real-life experience of hiking, a graduate admission interview, and her first time taking a flight, respectively. The sample of older adults is from our work published in (2020), which includes 30 older adults and additionally 13 young adults. The stimuli in this case were two stories told by an older woman in a Chinese dialect, describing her experience in Thailand and riding a warship, respectively. Since we aim to explore whether the main results can be replicated on a different age group, we excluded the 13 young adults from the analysis. 

      All the stories were recorded during fMRI scanning using a noise-canceling microphone (FOMRI-III; Optoacoustics Ltd, Or-Yehuda, Israel) positioned above the speaker’s mouth. The audio recordings were subsequently processed offline with Adobe Audition 3.0 (Adobe Systems Inc., USA) to further eliminate MRI scanner noise.

      In the revised manuscript, we have updated the citation, and provided a more detailed description of the stimuli in the supplementary material. We have also uploaded the audio files along with their corresponding transcriptions to GitHub.

      (2) I am curious about individual differences in comprehension scores. Did participants have less comprehension of the audio-narrated story because the story was a hard-to-comprehend narrative or because the audio quality was low? Could the authors share examples of comprehension tests? 

      We believe two factors contribute to the individual differences in comprehension scores. First, the audio quality is indeed moderately lower than in daily-life story-listening conditions. This is because those stories were recorded and played during fMRI scanning. Although noise-canceling equipment was used, there was still some noise accompanying the speech, which may have made speech perception and comprehension more difficult than usual.

      Second, the comprehension test measured how much information about the story (including both main themes and details) participants could recall. Specifically, participants were asked to retell the stories in detail immediately after the scanning session. Following this free recall, the experimenters posed a few additional questions drawn from a pre-prepared list, targeting information not mentioned in their recall. If participants experienced lapses of attention or did not store the incoming information into memory promptly, they might fail to recall the relevant content. In several studies, such a task has been called a narrative recall test. However, memory plays a crucial role in real-time speech comprehension, while comprehension affects the depth of processing during memory encoding, thereby influencing subsequent recall performance. To align with prior work (e.g., Stephens et al., 2010) and our previous publications, we chose to refer to this task as narrative comprehension. 

      In the revised manuscript, we have provided a detailed description of the comprehension test (Lines 907-933) and shared the examples on GitHub. 

      (3) Regarding Figure 3, what does it mean for a state occurrence to follow semantic coherence? Is there a theoretical reason why semantic coherence was measured and related to brain state dynamics? A related empirical question is: is it more likely for the brain states to transition from one state to another when nearby time points share low semantic similarity compared to chance? 

      We analyzed semantic coherence and sound envelope as they capture different layers of linguistic and acoustic structure that unfold over varying temporal scales. Changes in the sound envelope typically occur on the order of milliseconds to a few hundred milliseconds, changes in word-level semantic coherence span approximately 0.24 ± 0.15 seconds, and changes in clause-level semantic coherence extend to 3.2 ± 1.7 seconds. Previous theory and empirical studies suggest that the timescales of information accumulation vary hierarchically, progressing from early sensory areas to higher-order areas (Hasson et al., 2015; Lerner et al., 2011). Based on this work, we anticipate that the three brain states, which are respectively associated with the auditory and sensory motor network, the language network and the DMN, would be selectively modulated by these speech properties corresponding to distinct timescales. 

      Accordingly, when a state occurrence aligns with (clause-level) semantic coherence, it suggests that this state is engaged in processing information accumulated at the clause level (i.e., its semantic relationship). Higher coherence facilitates better accumulation, making it more likely for the associated brain state to be activated. 

      We analyzed the relationship between state transition probability and semantic coherence, but did not find significant results. Here, the transition probability was calculated as Gamma(t) – Gamma(t-1), where Gamma refers to the state occurrence probability. The lack of significant findings may be because brain state transitions are driven primarily by more slowly changing factors. Indeed, we found that the average dwell time of the three states ranges from 9.66 to 15.29 s, which reflects much slower temporal dynamics than the relatively rapid shifts in acoustic/semantic properties. 

      In the revised version, we have updated the Introduction to clarify the rationale for selecting the three speech properties and to explore their relationship with brain dynamics (Lines 111-118).

      (4) When running the HMM, the authors iterated K of 2 to 10 and K = 4, 10, and 12. However, the input features of the model consist of only 9 functional networks. Given that the HMM is designed to find low-dimensional latent state sequences, the choice of the number of latent states being higher than the number of input features sounds odd to me - to my speculation, it is bound to generate almost the exact same states as 9 networks and/or duplicates of the same state. I suggest limiting the K iterations from 2 to 8. For replication with Yeo et al.'s 7 networks, K iteration should also be limited to K of less than 7, or optionally, Yeo's 7 network scheme could be replaced with a 17-network scheme. 

      We understand your concern. However, the determination of the number (K) of hidden states is not directly related to the number of features (in this case, the number of networks), but rather depends on the complexity of the time series and the number of underlying patterns. Given that each state corresponds to a distinct combination of the features, even a small number of features can be used to model a system with complex temporal behaviors and multiple states. For instance, for a system with n features, assuming each is a binary variable (0 or 1), there are maximally 2^n possible underlying states. 

      In our study, we recorded brain activity over 300 time points and used the 9 networks as features. At different time points, the brain can exhibit distinct spatial configurations, reflected in the relative activity levels of the nine networks and their interactions. To accurately capture the temporal dynamics of brain activity, it is essential to explore models that allow for more states than the number of features. We note that in other HMM studies, researchers have also explored more states than the number of networks to find the best number of hidden states (e.g., Ahrends et al., 2022; Stevner et al., 2019). 

      Furthermore, Ahrends et al. (2022) suggested that “Based on the HCP-dataset, we estimate as a rule of thumb that the ratio of observations to free parameters per state should not be inferior to 200”, where the number of free parameters per state is [K*(K-1) + (K-1) + K*N*(N+1)/2]/K. According to this, there should be at least 10,980 observations when the number of states (K) is 10 (the maximal number in our study) and the number of networks (N) is 9. In our group-level HMM model, there were 64 (valid runs) * 300 (TR) = 19,200 observations for young adults, and 50 (valid runs) * 210 (TR) = 10,500 observations for older adults. Aside from the older adults' data being slightly insufficient (4.37% less than the suggestion), all other hyperparameter combinations in this study meet the recommended number of observations. 
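
      The arithmetic behind these figures can be checked with a short sketch. The function names (`free_params_per_state`, `required_observations`) are hypothetical and introduced only for illustration; the expression and the ratio of 200 are taken from the paragraph above.

```python
def free_params_per_state(k, n):
    """Free parameters per state, following the expression quoted above.

    k: number of hidden states; n: number of features (networks).
    """
    total = k * (k - 1) + (k - 1) + k * n * (n + 1) / 2
    return total / k

def required_observations(k, n, ratio=200):
    """Observations needed so that observations / free parameters per state >= ratio."""
    return ratio * free_params_per_state(k, n)

needed = required_observations(k=10, n=9)  # 200 * 54.9 free parameters per state
young = 64 * 300                           # valid runs * TRs, young adults
older = 50 * 210                           # valid runs * TRs, older adults
shortfall = (needed - older) / needed      # fraction by which the older-adult data fall short
```

      With K = 10 and N = 9 this gives 54.9 free parameters per state, hence 10,980 required observations, and a shortfall of about 4.37% for the older-adult dataset.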

      (5) In Figure 2, the authors write that the states' spatial maps were normalized for visualization purposes. Could the authors also show visualization of brain states that are not normalized? The reason why I ask is, for example, in Song, Shim, & Rosenberg (2023), the base state was observed which had activity levels all close to the mean (which is 0 because the BOLD activity was normalized). If the activity patterns of this brain state were to be normalized after state estimation, the base state would have looked drastically different than what is reported. 

      We derived the spatial maps of the states using unnormalized activity patterns, with the BOLD signals Z-score normalized to a mean of zero. Under the speech comprehension task, the three states exhibited relatively large fluctuations in network activity levels. The activity ranges were as follows: [-0.71 to 0.51] for State #1, [-0.26 to 0.30] for State #2, and [-0.82 to 0.40] for State #3. For the resting state, we observed a state resembling the baseline state as described in Song, Shim, & Rosenberg (2023), with activity values ranging from -0.133 to 0.09. 

      In the revision, we have replaced the states' spatial maps with versions showing unnormalized activity patterns. 

      (6) In line 297, the authors speculate that "This may be because there is too much heterogeneity among the older adults". To support this speculation, the authors can calculate the overall ISC of brain state dynamics among older adults and compare it to the ISC estimated from younger adults.  

      We analyzed the overall ISC of brain state dynamics, and found the ISC was indeed significantly lower among the older adults than that among the younger adults. We have revised this statement as follows:

      These factors can diminish the inter-subject correlation of brain state dynamics— indeed, ISCs among older adults were significantly lower than those among younger adults (Figure S5)—and reduce ISC's sensitivity to individual differences in task performance (Line 321-326).

      Other comments: 

      (7) In Figure 4, the authors showed a significant positive correlation between head movement ISC with the best performer and comprehension scores. Does the average head movement of all individuals negatively correlate with comprehension scores, given that the authors argue that "greater task engagement is accompanied by decreased movement"? 

      We examined the relationship between participants' average head movement across the comprehension task and their comprehension scores. There was no significant correlation (r = 0.041, p = 0.74). In the literature (e.g., Ballenghein et al., 2019), the relationship between task engagement and head movement was also assessed at the moment-by-moment level, rather than by using time-averaged data.

      Real-time head movements reflect fluctuations in task engagement and cognitive state. In contrast, mean head movement, as a static measure, fails to capture these changes, and thus is not effective in predicting task performance.

      (8) The authors write the older adults sample, the "independent dataset". Technically, however, this dataset cannot be independent because they were collected at the same time by the same research group. I would advise replacing the word independent to something like second dataset or replication dataset. 

      We have replaced the phrase “independent dataset” with “replication dataset”. 

      (9) Pertaining to a paragraph starting in line 586: For non-parametric permutation tests, the authors note that the time courses of brain state expression were "randomly shuffled". How was this random shuffling done: was this circular-shifted randomly, or were the values within the time course literally shuffled? The latter approach, literal shuffling of the values, does not make a fair null distribution because it does not retain temporal regularities (autocorrelation) that are intrinsic to the fMRI signals. Thus, I suggest replacing all non-parametric permutation tests with random circular shifting of the time series (np.roll in Python).  

      In the original manuscript, the time course was literally shuffled. In the revised version, we circular-shifted the time course randomly (circshift.m in Matlab) to generate the null distribution. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence (Line 230-235). 
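
      A minimal sketch of a circular-shift null distribution is given below. It is not the MATLAB code used in the revision: the function names, the use of Pearson's correlation as the test statistic, and the simulated time courses are assumptions for illustration. `circular_shift` mirrors the behavior of `np.roll` on a one-dimensional sequence, using only the Python standard library.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def circular_shift(series, k):
    """Rotate a sequence right by k samples, wrapping the end around to the start."""
    k %= len(series)
    return list(series[-k:]) + list(series[:-k]) if k else list(series)

def circular_shift_null(state_tc, feature_tc, n_perm=1000, seed=0):
    """Null correlations obtained by circularly shifting the state time course.

    Shifting (rather than shuffling) keeps the autocorrelation of the
    state time course intact while breaking its alignment with the stimulus.
    """
    rng = random.Random(seed)
    n = len(state_tc)
    null = []
    for _ in range(n_perm):
        k = rng.randrange(1, n)  # random non-zero shift
        null.append(pearson(circular_shift(state_tc, k), feature_tc))
    return null

# Simulated data (illustrative only): an autocorrelated "state" time course
# and an unrelated "feature" time course, 300 time points each.
rng = random.Random(1)
n_tr = 300
feature = [rng.gauss(0, 1) for _ in range(n_tr)]
state = [0.0] * n_tr
for t in range(1, n_tr):
    state[t] = 0.9 * state[t - 1] + rng.gauss(0, 1)

null = circular_shift_null(state, feature, n_perm=1000)
```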

      (10) The p value calculation should be p = (1+#(chance>=observed))/(1+#iterations) for one-tailed test and p = (1+#(abs(chance)>=abs(observed)))/(1+#iterations) for two-tailed test. Thus, if 5,000 iterations were run and none of the chances were higher than the actual observation, the p-value is p = 1/5001, which is the minimal value it can achieve. 

      Have corrected. 
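
      For reference, the p-value formula stated in comment (10) can be written as a short function. The name `permutation_p` is hypothetical; the formula itself is the one given by the reviewer.

```python
def permutation_p(observed, null, two_tailed=False):
    """Permutation p-value: (1 + #extreme) / (1 + #iterations).

    The +1 terms count the observed statistic as one draw from the null,
    so the p-value can never be exactly zero.
    """
    if two_tailed:
        extreme = sum(abs(v) >= abs(observed) for v in null)
    else:
        extreme = sum(v >= observed for v in null)
    return (1 + extreme) / (1 + len(null))

# With 5,000 iterations and no null value reaching the observation,
# the p-value takes its minimum possible value, 1/5001.
p_min = permutation_p(0.5, [0.0] * 5000)
```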

      (11) State 3 in Figure S2 does not resemble State 3 of the main result. Could the authors explain why they corresponded State 3 of the Yeo-7 scheme to State 3 of the nine-parcellation scheme, perhaps using evidence of spatial overlap? 

      The correspondence of states between the two schemes was established using evidence of state expression time course. 

      To assess temporal overlap, we calculated Pearson’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network parcellation scheme in terms of state expression probabilities. The time courses of the 64 participants were concatenated, resulting in 19200 (300*64) time points for each state. The one that the candidate state most closely resembled was set to be its corresponding state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. As demonstrated in the confusion matrix, each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two schemes.

      We also assessed the spatial overlap between the two schemes. First, a state activity value was assigned to each voxel across the whole brain (including a total of 34,892 voxels covered by both parcellation schemes). This is done for each brain state. Next, we calculated Spearman’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network scheme in terms of whole-brain activities. The pattern of spatial overlap is consistent with the pattern of temporal overlap, such that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively.

      Author response image 1.

      We noted that the networks between the two schemes are not well aligned in their spatial location, especially for the DMN (as shown below). This may lead to the low spatial overlap of State #3, which is dominated by DMN activity. Consequently, establishing state correspondence based on temporal information is more appropriate in this context. We therefore only reported the results of temporal overlap in the manuscript. 

      We have added a paragraph in the main text for “Establishing state correspondence between analyses” (Lines 672-699). We have also updated the associated figures (Fig. S2, Fig. S3 and Fig. 5).

      Author response image 2.

      (12) Line 839: gamma parameter, on a step size of? 

      (16) Figure 3. Please add a legend in the "Sound envelope" graph what green and blue lines indicate. The authors write Coh(t) and Coh(t, t+1) at the top and Coh(t) and Coh(t+1) at the bottom. Please be consistent with the labeling. Shouldn't they be Coh(t-1, t) and Coh(t, t+1) to be exact for both? 

      Have corrected. 

      (17) In line 226, is this one-sample t-test compared to zero? If so, please write it inside the parentheses. In line 227, the authors write "slightly weaker"; however, since this is not statistically warranted, I suggest removing the word "slightly weaker" and just noting significance in both States 1 and 2.  

      We have corrected this.

      (18) In line 288, please fix "we also whether". 

      We have corrected this.

      (19) In Figure 2C, what do pink lines in the transition matrix indicate? Are they colored just to show authors' interests, or do they indicate statistical significance? Please write it in the figure legend.   

      Yes, the pink lines indicate a meaningful trend, showing that the between-state transition probabilities are significantly higher than those in permutation.

      We have added this information to the figure legend. 

      Reviewer #2 (Recommendations for the authors):

      (1) It is unclear how the correspondence between states across different conditions and datasets was computed. Given the spatial autocorrelation of brain maps, I recommend reporting the Dice coefficient along with a spin-test permutation to test for statistical significance.  

      The state correspondence between different conditions and between the two datasets is established using evidence of spatial overlap. The spatial overlap between states was quantified by Pearson’s correlation using the activity values (derived from the HMM) of the nine networks. For each candidate state identified in other analyses (for the Rest, MG and older-adult datasets), we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis (for the young-adult dataset), and assigned it to the state it most closely resembled. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.
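
      As an illustration of this matching rule, the following minimal sketch labels each candidate state by its most correlated predefined state. The nine-network activity values are made up for the example, and `match_states` is a hypothetical helper, not a function from the authors' code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def match_states(candidates, references):
    """Return, for each candidate, the index of its most correlated reference
    state together with the full list of correlations."""
    matches = []
    for cand in candidates:
        corrs = [pearson(cand, ref) for ref in references]
        best = max(range(len(corrs)), key=lambda j: corrs[j])
        matches.append((best, corrs))
    return matches

# Made-up activity values over nine networks for the three predefined states
references = [
    [0.8, 0.6, 0.5, -0.4, -0.5, -0.3, -0.2, -0.3, -0.2],   # State #1
    [-0.5, -0.4, -0.3, 0.7, 0.6, 0.5, -0.2, -0.2, -0.2],   # State #2
    [-0.3, -0.2, -0.2, -0.3, -0.1, -0.2, 0.6, 0.4, 0.3],   # State #3
]
# Made-up candidate states from another dataset, in arbitrary order
candidates = [
    [-0.4, -0.3, -0.3, 0.6, 0.7, 0.4, -0.3, -0.2, -0.2],
    [-0.2, -0.3, -0.1, -0.2, -0.2, -0.3, 0.5, 0.5, 0.3],
    [0.7, 0.7, 0.4, -0.5, -0.4, -0.2, -0.3, -0.2, -0.2],
]

best = [index for index, _ in match_states(candidates, references)]
labels = [f"State #{index + 1}" for index in best]
```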

      For the comparison between the young and older adults’ datasets (as shown below), the largest spatial overlap occurred along the diagonal of the confusion matrix, with high correlation values. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two datasets. As the HMM is modelled at the level of networks, which lack precise coordinates, we did not apply the spin-test to assess the statistical significance of the overlap. Instead, we extracted the state activity patterns from the 1000 permutations (wherein the original BOLD time courses were circularly shifted and an HMM was fitted) for the older-adult dataset. Applying the same state-correspondence strategy, we generated a null distribution of spatial overlap. The real overlap of the three states was greater than 97.97%, 95.34% and 92.39% of the instances from the permutation, respectively (as shown below).

      Author response image 3.

      For the comparison of main task with the resting and the incomprehensible speech condition, there was some degree of confusion: there were two candidate states showing the highest similarity to State #2. In this case, we labeled the most similar candidate as State #2. The other candidate was then assigned to the predefined state with which it had the second-highest correlation. We used a prime symbol (e.g., State #3') to denote cases where such confusion occurred. These findings support our conclusion that the tripartite-organization of brain states is not a task-free, intrinsic property.
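
      The tie-breaking rule described here can be sketched as below. This is one possible reading of the procedure under stated assumptions: the correlation matrix is invented, `assign_labels` and `label_name` are hypothetical helpers, and the sketch does not handle the case where the second-best state is itself already taken.

```python
def assign_labels(corr):
    """corr[i][j] is the correlation between candidate i and predefined state j.

    Returns one (state_index, primed) pair per candidate. When several
    candidates are closest to the same state, the most similar one keeps that
    label and each of the others falls back to its second-highest state,
    flagged as primed.
    """
    ranked = [sorted(range(len(row)), key=lambda j: -row[j]) for row in corr]
    by_state = {}
    for i, order in enumerate(ranked):
        by_state.setdefault(order[0], []).append(i)

    labels = [None] * len(corr)
    for state, cands in by_state.items():
        winner = max(cands, key=lambda i: corr[i][state])
        labels[winner] = (state, False)
        for i in cands:
            if i != winner:
                labels[i] = (ranked[i][1], True)
    return labels

def label_name(label):
    """Format a label as in the text, e.g. State #3' for a primed match."""
    state, primed = label
    return f"State #{state + 1}" + ("'" if primed else "")

# Invented correlations: candidates 2 and 3 are both closest to State #2
corr = [
    [0.85, 0.30, 0.10],
    [0.20, 0.90, 0.40],
    [0.05, 0.70, 0.55],
]
names = [label_name(label) for label in assign_labels(corr)]
```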

      When establishing the correspondence between the Yeo-7 network and the nine-network parcellation schemes, we primarily relied on evidence from temporal overlap measures, as a clear network-level alignment between the two parcellation schemes is lacking. Temporal overlap was quantified by calculating the correlation of state occurrence probabilities between the two schemes. To achieve this, we concatenated the time courses of 64 participants, resulting in a time series consisting of 19,200 time points (300 time points per participant) for each state. Each of the three candidate states from the Yeo-7 network scheme was best matched to State #1, State #2, and State #3 from the main analyses, respectively. To determine the statistical significance of the temporal overlap, we circularly shifted each participant’s time course of state expression obtained from the Yeo-7 network scheme 1000 times. Applying the same strategy to find the matching states, we generated a null distribution of overlap. The real overlap was much higher than the instances from permutation.

      Author response image 4.
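
      A minimal sketch of the temporal-overlap test follows, assuming that each participant's time course is shifted by an independent random offset before concatenation. The toy data (4 participants, 60 time points, 200 shifts) stand in for the 64 participants, 300 time points and 1000 shifts reported above, and all function names are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def circular_shift(series, k):
    """Rotate a list by k positions, wrapping the end around to the start."""
    k %= len(series)
    return list(series) if k == 0 else series[-k:] + series[:-k]

def overlap_with_null(subjects_a, subjects_b, n_perm=1000, seed=0):
    """Correlate concatenated time courses and build a circular-shift null.

    subjects_a, subjects_b: per-participant time courses of one state's
    occurrence probability under the two schemes.
    """
    rng = random.Random(seed)
    flat_b = [v for s in subjects_b for v in s]
    real = pearson([v for s in subjects_a for v in s], flat_b)
    null = []
    for _ in range(n_perm):
        shifted = []
        for s in subjects_a:
            # shift each participant separately so autocorrelation is preserved
            shifted += circular_shift(s, rng.randrange(1, len(s)))
        null.append(pearson(shifted, flat_b))
    return real, null

# Toy data: smoothed noise as a stand-in for autocorrelated state probabilities
toy_rng = random.Random(1)

def smooth_noise(n, width=5):
    raw = [toy_rng.random() for _ in range(n)]
    return [sum(raw[(t + d) % n] for d in range(width)) / width for t in range(n)]

subjects_a = [smooth_noise(60) for _ in range(4)]
subjects_b = [[v + toy_rng.gauss(0, 0.03) for v in s] for s in subjects_a]

real, null = overlap_with_null(subjects_a, subjects_b, n_perm=200)
frac_below = sum(r < real for r in null) / len(null)
```

      Shifting within participants, rather than shuffling time points, keeps the temporal autocorrelation of each time course intact, so the null reflects chance alignment between otherwise realistic series.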

      In the revision, we have provided a detailed description of how the state correspondence is established and reported the statistical significance of this correspondence (Line 671-699). The associated figures have also been updated (Fig. 5, Fig. S2 and Fig. S3).

      (2) Please clarify if circle-shifting was applied to the state expression time course when generating the null distribution for behavior-brain state correlations reported in Figure (3). This seems important to control for the temporal autocorrelation in the time courses.  

      We have updated the results by using circular shifting to generate the null distribution. The results are largely consistent with the previous ones obtained without circular shifting (Line 230-242).

      (3) Figure 3: What does the green shaded area around the sound envelope represent? In the caption, specify whether the red line in the null distributions indicates the mean or median R between brain state expression and narrative features. It would also be beneficial to report this value in the main text. 

      The green shaded area indicates the original amplitude of the speech signal, while the blue line indicates the smoothed, low-frequency contour of amplitude changes over time (i.e., the speech envelope). We have updated the figure and explained this in the figure caption.

      The red line in the null distributions indicates the R between brain state expression and narrative features for the real data. We have reported the mean R of the permutations in the main text.

      (4) The manuscript is missing a data availability statement (https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-datasharing-policies). 

      We have added a statement of data availability in the revision, as follows: 

      “The raw and processed fMRI data are available on OpenNeuro: https://openneuro.org/datasets/ds005623. The experimental stimuli, behavioral data and main scripts used in the analyses are provided on Github. ”

      (5) There is a typo in line 102 ("perceptual alalyses"). 

      We have corrected this.

      We sincerely thank the two reviewers for their constructive feedback, thorough review, and the time they dedicated to improving our work.

      References:

      Ahrends, C., Stevner, A., Pervaiz, U., Kringelbach, M. L., Vuust, P., Woolrich, M. W., & Vidaurre, D. (2022). Data and model considerations for estimating time-varying functional connectivity in fMRI. NeuroImage, 252, 119026.

      Ballenghein, U., Megalakaki, O., & Baccino, T. (2019). Cognitive engagement in emotional text reading: concurrent recordings of eye movements and head motion. Cognition and Emotion. 

      Fernandino, L., Tong, J.-Q., Conant, L. L., Humphries, C. J., & Binder, J. R. (2022). Decoding the information structure underlying the neural representation of concepts. Proceedings of the National Academy of Sciences, 119(6), e2108091119. https://doi.org/10.1073/pnas.2108091119

      Hasson, U., Chen, J., & Honey, C. J. (2015). Hierarchical process memory: memory as an integral component of information processing. Trends in Cognitive Sciences, 19(6), 304-313. 

      Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8), 2906-2915. https://doi.org/10.1523/JNEUROSCI.3684-10.2011

      Liu, L., Li, H., Ren, Z., Zhou, Q., Zhang, Y., Lu, C., Qiu, J., Chen, H., & Ding, G. (2022). The “two-brain” approach reveals the active role of task-deactivated default mode network in speech comprehension. Cerebral Cortex, 32(21), 4869-4884. 

      Liu, L., Zhang, Y., Zhou, Q., Garrett, D. D., Lu, C., Chen, A., Qiu, J., & Ding, G. (2020). Auditory–Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cerebral Cortex, 30(3), 942-951. https://doi.org/10.1093/cercor/bhz138

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Xia et al. investigated the mechanisms underlying Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). The authors observed that abnormal osteogenesis and adipogenesis are associated with decreased β-catenin in the necrotic femoral head of GONFH patients, and that the inhibition of β-catenin signalling leads to abnormal osteogenesis and adipogenesis in GONFH rats. Of interest, the deletion of β-catenin in Col2-expressing cells rather than in Osx-expressing cells leads to a GONFH-like phenotype in the femoral head of mice.

      Strengths:

      A strength of the study is that it sets up a Col2-expressing cell-specific β-catenin knockout mouse model that mimics the full spectrum of osteonecrosis phenotype of GONFH. This is interesting and provides new insights into the understanding of GONFH. Overall, the data are solid and support their conclusions.

      Reviewer #1 (Recommendations For The Authors):

      1) Fig. 1I should be quantified and presented as bar graphs to make it consistent with other data, and the significance should be shown.

      Reply: Thanks for your comments. We have provided the quantitative bar graph in the new version.

      2) Fig. 2H, beta-catenin, ALP and FABP4 should be labeled below the X axis. Moreover, the pattern of Fig. 2H is different from other bar graphs and the dots for individual samples are missing, so I could not judge the N values for the experiments. N values should also be provided for Fig. 3.

      Reply: Thanks for your comments. We have added the labels for beta-catenin, ALP and FABP4 below the X axis in Fig. 2H. The style of the quantitative bar graphs was changed to show the N values in each experiment.

      3) Fig. 4 shows the fate mapping of Col2+ cells and Osx+ cells in the femoral head. In this regard, the authors presented images for Col2-expressing cells at all the indicated time points, i.e. 1, 3, 6, and 9 months, but only presented images for Osx-expressing cells for 1 month while those for 3, 6, and 9 months are missing.

      Reply: Thanks for your comments. Here, we show that the distribution of Osx+ cells in the femoral head was totally different from that of Col2+ cells at 3 and 6 months of age, further indicating that they are two different progenitor cell lineages.

      Author response image 1.

      4) Some experiments may need to be described in more detail, e.g., ABH/Orange G staining, biomechanical testing, μCT analysis, etc.

      Reply: Thanks for your comments. We have provided more information on the experimental procedures.

      5) This study proposed that Col2-expressing cells play a key role in the progression of GONFH, did the authors use Col2+ cells for the in vitro experiments?

      Reply: In vitro experiments cannot reflect the location of Col2-expressing cells in the femoral head; we therefore applied an in vivo lineage-tracing approach. After lineage tracing for as long as 9 months, we thoroughly demonstrated the self-renewal ability and osteogenic commitment of Col2+ cells, as well as the change in their spatial distribution in the femoral head with age. Conditional knockout of β-catenin caused Col2+ cells to trans-differentiate into adipogenic cells instead of osteogenic cells, which directly clarified the mechanism by which Col2+ cells lead to a GONFH-like phenotype in mice.

      6) A few typo errors, such as Line 13, "contribute" should be "contributes"; Line 118, "reveled" should be "revealed".

      Reply: We have revised the grammar errors in the new manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported a study to uncover that β-catenin inhibition disrupting the homeostasis of osteogenic/adipogenic differentiation contributes to the development of Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). In this study, they first observed abnormal osteogenesis and adipogenesis associated with decreased β-catenin in the necrotic femoral head of GONFH patients, but the exact pathological mechanisms of GONFH remain unknown. They then performed in vivo and in vitro studies to further reveal that glucocorticoid exposure disrupted osteogenic/adipogenic differentiation of bone marrow stromal cells (BMSCs) by inhibiting β-catenin signaling in glucocorticoid-induced GONFH rats, and specific deletion of β-catenin in Col2+ cells shifted BMSCs commitment from osteoblasts to adipocytes, leading to a full spectrum of disease phenotype of GONFH in adult mice.

      Strengths:

      This innovative study provides strong evidence supporting that β-catenin inhibition disrupts the homeostasis of osteogenic/adipogenic differentiation that contributes to the development of GONFH. This study also identifies an ideal genetically modified mouse model of GONFH. Overall, the experiment is logically designed, the figures are clear, and the data generated from humans and animals is abundant supporting their conclusions.

      Weaknesses:

      There is a lack of discussion to explain how the Wnt agonist 1 works. There are several types of Wnt ligands. It is not clear if this agonist only targets Wnt1 or other Wnts as well. Also, why Wnt agonist 1 couldn't rescue the GONFH-like phenotype in β-cateninCol2ER mice needs to be discussed.

      Reply: Thanks for your constructive comments. Wnt agonist 1 is a cell-permeable activator of the Wnt signaling pathway that induces β-catenin-dependent transcriptional activity (PMID: 25514428, 18624906). In the present study, we aimed to demonstrate that activation of β-catenin signaling could alleviate the GONFH phenotype in rats; thus, only the expression of β-catenin and its downstream targets (RUNX2, ALP, PPAR-γ, FABP4) was examined after Wnt agonist 1 intervention. Conditional knockout of β-catenin in Col2+ cells led to a GONFH-like phenotype in mice. Wnt agonist 1 could not rescue this GONFH-like phenotype, as it did not activate β-catenin signaling. We have discussed these points in the new version.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors are trying to delineate the mechanism underlying the osteonecrosis of the femoral head.

      Strengths:

      The authors provided compelling in vivo and in vitro data to demonstrate Col2+ cells and Osx+ cells were differentially expressed in the femoral head. Moreover, inducible knockout of β-catenin in Col2+ cells but not Osx+ cells lead to a GONFH-like phenotype including fat accumulation, subchondral bone destruction, and femoral head collapse, indicating that imbalance of osteogenic/adipogenic differentiation of Col2+ cells plays an important role in GONFH pathogenesis. Therefore, this manuscript provided mechanistic insights into osteonecrosis as well as potential therapeutic targets for disease treatment.

      Weaknesses:

      However, additional in-depth discussion regarding the phenotype observed in mice is highly encouraged.

      Reply: Thanks for your comments. Inducible knockout of β-catenin in Col2+ cells but not Osx+ cells led to a GONFH-like phenotype. Lineage tracing data showed that Col2+ cells and Osx+ cells were different cell populations, and we have discussed the potential mechanism underlying the different phenotypes of β-cateninCol2ER mice and β-cateninOsxER mice.

      1) Why did the authors use dexamethasone in the cellular experiments but methylprednisolone to induce the GONFH rat model?

      Reply: Thanks for the comments. Here, we applied a dexamethasone (DEX)-treated BMSC model in vitro and a methylprednisolone (MPS)-induced rat model in vivo for the GONFH study based on the published literature (PMID: 37317020, 29662787, 29512684, 35126710, 32835568).

      2) Both bone damage and fat accumulation were observed in 3-month-old and 6-month-old β-cateninCol2ER mice, but the femoral head collapse (the feature of GONFH at the late stage) only occurred in the older β-cateninCol2ER mice. This interesting observation needs to be discussed.

      Reply: Thanks for the comments. Poor mechanical support caused by bone damage is the key to femoral head collapse. Despite similar trabecular bone loss and fat accumulation in the 3-month-old and 6-month-old β-cateninCol2ER mice, the older mice additionally presented extensive subchondral bone destruction. Intact subchondral bone provides good mechanical support for femoral head morphology; therefore, femoral head collapse occurred only in the older β-cateninCol2ER mice.

      3) In the Materials and Methods, detailed information on the reagents should be provided.

      Reply: We have provided detailed information on the important reagents.

      4) As shown in Figure 4, β-cateninOsxER mice at 3 months of age did not show differences in lipid droplet area and empty lacunae rate, but there was a decrease in bone area. The authors should at least provide some necessary discussion of this phenomenon.

      Reply: Thanks for your comments. In the present study, we found few lipid droplets and empty lacunae but a significant decrease in bone mass in the femoral heads of β-cateninOsxER mice. Previous studies showed that specific knockout of β-catenin in Osx-expressing cells promoted osteoclast formation and activity, leading to bone mass loss (PMID: 29124436, 34973494). We have discussed this phenomenon in the new version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas. 

      We thank this reviewer for appreciating the quality of our spatial data. We do not know what caused the technical problem (grayscale version of PDF) for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells. 

      The small size of an individual fly is one of the most challenging aspects of performing spatial transcriptomics. While the resolution of Molecular Cartography is rather high (< 200 nm), in the brain challenges remain as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with the current imaging techniques. We agree that for a full atlas either expansion microscopy, 3D techniques or other super-resolution techniques will be required. 

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      (1) The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      We thank the reviewer for this comment, as it reminded us that we need to be clearer in the text about how we chose the genes to investigate. The statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head; we now show this in the new Figure 1 – figure supplement 1B, D). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single cell atlas: the tinman-positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single cell FCA data tinman expression is often not detected in the cardiomyocytes (tinman is detected in 273 cells in the entire FCA (mean expression of 1.44 UMI in positive cells), and in 71 cells out of 273 cardiac cells (26%)).

      (2) Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      (3) The authors show that their approach reveals interesting patterns of mRNA distribution (e.g alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      (4) Lastly, the authors compare three different approaches --- their own described in this manuscript, Tangram, and SpaGE - which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      (1) Experimental setup. It is not clear how many and, for some of the data, the sex of the flies that were analyzed. It appears that for the body data, only one male was analyzed. For the heads, methods say male and female heads, but nothing is annotated in the figures. As such, it remains unclear how robust these data are, given such a limited sample from one sex. As such, the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just the methods). What also adds to the confusion is the use of "n" in para 2 of the results. " ... we performed coronal sections at different depths in the head (n=13)..." 13 sections in total from 1 head or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      While we agree that sex differences indeed present an interesting opportunity for study with spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology so that this level of detail can be investigated in the future. In the revised version, we have provided an additional supplementary table with a more detailed description of the head sections (Table S3). We have added the number of animals (12 for the head sections, mixed sex; and 1 male for the body sections) to the main text. We would like to point out that we verified the specificity of our MC method on all 5 body sections (Figure 2A, TpnC4 & Act88F and text) and not only on one. Furthermore, we would also like to state that the idea of “a Rosetta stone” was mentioned as a future prospect that clearly goes beyond our presented work. We have rewritten the discussion to make this clearer and to avoid any overstatements.

      (2) Probes selected: Information from the methods section should be put into the main text so that it is clear what and why the gene lists were selected. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      As stated above, while there is a slight bias to higher expressed genes (as expected for marker genes), we have also used low expressed genes like salm, CG32121, tinman (body) or sens (head). This is now shown in new Figure 1 – figure Supplement 1B, D. This shows that our method is more sensitive than single-cell data, as all cardiomyocytes can be identified by tinman expression and not only some are positive, as is the case in the FCA data. In fact, the method cannot resolve too highly expressed genes due to optical crowding of the signal leading to a worse quantification. For this reason, ninaE was removed from the analysis (as mentioned in Spatial transcriptomics allows the localization of cell types in the head and brain and in Methods).

      As mentioned in the Methods, the probes are designed on gene level targeting all isoforms, but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this by designing isoform-specific probes.

      (3) Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (eg, figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful.

      High-resolution imaging (pixel size = 138 nm) of a large field of view (>1 mm) for spatial transcriptomics uses a stitching method to combine several individual images to reconstruct a large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images, as seen in our images, too. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). The specificity of the decoded probes was found to lie between 99.45 and 99.9% there, matching the results we found for specific detection of TpnC4 and Act88F (99.4 and 99.8%).

      (4) The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      We can compare the numbers, but the different methodologies make the interpretation of such a comparison difficult. FCA used single nucleus sequencing, so only nuclear pre-mRNAs are detected. The total amount of counts per single cell sample strongly depends on how many cells were sequenced in an experiment. MC detects all mRNAs present in the section. Here, the size of the sample and hence the size or the number of cells analyzed determines how many mRNAs are detected. In Author response image 1, we have compared our MC results versus FCA data, comparing the genes investigated here in MC per section vs per sequencing experiment. Numbers for MC are slightly lower for the brain (not all cell types are on all sections) and much higher for the larger body samples. However, we feel a direct comparison is questionable, so we prefer to not include this figure in our manuscript.

      Author response image 1:

      Barplots showing total number of mRNA molecules detected in Molecular Cartography (MC, Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Individual black dots show individual experiments, counts are only shown for the chosen gene panel for each sample. Bar shows the mean, with error bars representing the standard error.

      (5) Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006), Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299). We present these new data in the new Figure 2 – figure supplement 1.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data, if the authors would confirm these data using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" (Act88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, decolorization of the probes?

      We thank the reviewer for the interest in the localization patterns in muscle tissue. We show that Act88F and TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single-molecule signal in flight muscles only), giving us confidence in the specificity of the MC method. Following the suggestion of the reviewer, we have adapted an HCR-FISH method to Drosophila adult body sections for the revised version of the manuscript. Using this method, we were able to confirm the higher expression/localization of sls transcripts to and around the adult flight muscle nuclei, with an enrichment in nuclei close to the muscle-tendon attachment sites (new Figure 4D-F and new Figure 4 – figure supplement 1). We have also been able to confirm some complementarity in the localization patterns of Act88F and TpnC4 in longitudinal stripes in adult flight muscles; however, for Mhc we could not confirm this pattern with HCR-FISH (new Figure 5C-F and new Figure 5 – figure supplement 1). While we could confirm most of the patterns seen, we do not know the exact reason for the slight discrepancies. Thus, we now recommend that insights found with SRT be confirmed with more classical FISH methods.

      (6) The authors developed an unbiased method to identify "new cell types" which relies on coexpression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      The term “new cell types” only appeared in the old title. We agree that with the current spatial map we cannot be sure to have found “new cell types”, instead we show where unannotated/uncharacterized clusters from the scRNA-seq atlas are located, based on their gene expression. Therefore, we have updated the title in the revised version (Spatial transcriptomics in the adult Drosophila brain and body) and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, in achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. This same criticism can be said of the identification of "new cell types", just based on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims.

      We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from 1 individual we acquired 2 sections, a total of 13 sections). We show that the body samples show a high correlation with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene coexpression and expression patterns. Although obtaining sections from more animals would be valuable, we do not believe it to be necessary for our current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, while they would very likely produce similar results as we already show. Following the reviewer’s suggestion, we have tested several genes with HCR-FISH and could readily confirm the localization pattern of sls mRNA close to the terminal nuclei of the flight muscles. This pattern is likely due to a higher expression of sls in these nuclei, as a large amount of sls mRNA signal is detected within the nuclei (Figure 4). A detailed dissection of the mechanism that establishes this pattern is beyond the scope of this manuscript, which is the first one on applying spatial transcriptomics to adult Drosophila.

      The usage of the term “new cell types” was indeed ambiguous and we removed this from the revised version. We have now clarified that we map the spatial location of unannotated clusters in the brain. This may or may not include uncharacterized cell types. We now further specify that we have only inferred the location of the nuclei; thus, neuronal function and the location of their axonal processes are still unknown. As such, our data provide a starting point to identify uncharacterized cell types, since their marker genes and nuclear location are now determined. The next step to identify “new cell types” would indeed be to acquire genetic access to these cell types and characterize them in more detail. This is beyond the scope of this manuscript, and therefore we have toned down the title in the revised version and thank the reviewer for this valuable suggestion.

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      We thank this reviewer for appreciating the impact of our findings and approach to the Drosophila field and beyond. We here provide the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. In the manuscript we provide both experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment) and the data serves as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it also provides proof of principle methods on how to integrate the FCA data with these spatial data and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. For a small number of genes, we have confirmed the mRNA patterns using HCR-FISH in the revised version of this manuscript. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) All figures in the manuscript were in grayscale, which made it difficult to interpret the results because the data could only be interpreted by distinguishing different colors to visualize different transcripts. This is likely a technical problem. The manuscript must contain colored images.

      We apologize to the reviewer for this technical issue. The manuscript was uploaded in color to bioRxiv and to eLife. We therefore do not understand the reason for this problem. We are surprised that this issue was not resolved in the reviewers’ discussion since color is obviously essential to appreciate the beauty of this manuscript.

      (2) In Figure 2a, the authors comment on the subcellular localization of trypsin isoforms, but the figure does not indicate the cell borders or the apical and basal regions of the cell. These must be indicated in the figure to help readers understand the results. 

      We thank the reviewer for pointing this out; we have now indicated the outlines of the single-cell layer epithelium on the figure. While we have no marker for cell borders, we have a nuclear marker showing that it is a single cell layer. We hope this allows the reader to appreciate the subcellular localization of the trypsin isoforms.

      (3) All figures (including the data on the authors' website) contain background staining, which I assume is labeling nuclei. This is not indicated in the manuscript, and should be clarified.

      We again thank the reviewer for pointing this out; the background staining indeed labels nuclei (using DAPI). We have indicated this better in the revised version.

      (4) In Figure 5c, the authors claim that neuronal and muscular genes are grouped into the same cluster, but they do not indicate which transcripts are neuronal and which ones are muscular. This must be indicated in the figure.

      We thank the reviewer for this comment. Indeed, there was only one gene, acj6, present in the muscle cluster. So, we decided to delete this statement in the revised version.

      (5) The authors utilized and compared three different approaches to integrate single nuclei sequencing data from the Fly Cell Atlas to their spatially resolved transcriptomics (SRT) data. I was wondering if it is possible to generate a virtual expression explorer using this integrated data, similar to the dataset published in the 2017 Science article by Karaiskos et al., where they combined publicly available in situ hybridization data of fly embryos and their single-cell sequencing data. This virtual expression explorer would be useful to visualize the expression pattern of transcripts that the authors of this paper did not use for their SRT.

      We thank the reviewer for this interesting comment. Using Tangram, we indeed infer gene expression for all genes from the Fly Cell Atlas. To make this visible we have created a Scope session (https://scope.aertslab.org/#/Spatial_Fly/*/welcome), with which users can browse inferred gene expression levels (note that this is on a segmented cell level). We do notice that the inferred gene expression levels contain many false positives and should therefore be used with caution. The spatial data themselves can be browsed through the spatial portal at https://spatialfly.aertslab.org/ .

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses:

      The authors have used a new high throughput approach to examine the location of 150 RNAs in adult Drosophila heads or one body. It is unclear whether the fixation/repeated imaging etc is accurately reflecting the patterns of expression in vivo. The authors should confirm these data using low throughput established techniques for the RNA patterns in muscle for example.

      The authors should clarify their experimental approaches and include additional samples if they indeed want to establish the rosetta stone of fly adults. These data are from only a male fly (and as such is not a complete analysis of the adult fly). To be a map of the adult fly, data from both sexes need to be included.

      Unless functional data that complement the descriptive data shown here are included, the authors have to soften their conclusions. For example, while spatial transcriptomics has mapped RNA expression to a location, without some functional data, it is difficult to conclude that these are indeed "new cell types". Same with the RNA localization principles.

      Recommendations for improving the writing and presentation:

      (1) The manuscript should be heavily revised: in many places, important details are left out or should be moved from the methods to the main text. In addition, the authors often overstate their findings throughout the manuscript. As an example, it appears that the data presented is only from 1 fly, so this doesn't increase the reader's confidence in the data or the applicability of the approach. Also, it isn't clear how many flies were analyzed for the heads (one male fly too?) nor what variability is present from fly to fly. For the approach and data to be used by others, this is important to know.

      We moved some text from the methods section to the main text to be clearer. We now also state how many animals were used for the MC method. While the data for the body has been generated from 1 male only, the data for the head was generated from 12 flies; for both cases, similar slices show very similar gene expression patterns. Furthermore, in the body we used widely known and published marker genes that all showed expected expression patterns, indicating robustness. We agree that this is not a full spatial atlas of the fly; this was also not our goal, and we have removed such general statements from the revised version. We aimed to generate a spatial transcriptomics dataset, covering the entire fly (head and body) as a proof-of-principle, tackling data generation and analysis, and highlighting challenges in both.

      (2) The grammar and word choice throughout are challenging often making the text difficult to follow. This reads like an early draft of the paper.

      We apologize to the reviewer for any difficulties. We have revised the text and hope it is now easier to read, while still being accurate on the technical details of the various methods used in our manuscript.

      Minor corrections to the text and figures.

      See the weaknesses mentioned above. Also:

      Figure S1 is unreadable.

      There is no simple way to describe the expression values of 100 genes in 100 cell types on a single page. The resolution of the PDF is high enough that after zooming in, all the information can be read easily.

      Figure S2, in a, please include the axes so that the reader can better understand the sections shown.

      In b, it is unclear what the pink boxes mean. In c, the labels are barely legible.

      In Figure 1 – figure supplement 2 (head sections), we have ordered the head sections from anterior to posterior. The boxes in (B) represent boxplots. We have updated this plot for clarity to better display the number of mRNA molecules detected for each gene. We have increased the font size in (C).

      Figure S3, in a, please include axes. In b, the meaning of the pink box

      In Figure 1 – figure supplement 3 (the body sections) we have added the anterior to posterior and dorso-ventral axis, and ordered the sections that stem from the same animal. The boxes in (B) represent boxplots. We have updated this plot for clarity to better display the number of mRNA molecules detected for each gene. We have added an explanation to the figure legend.  

      Figure S4, the text in the axes of the heatmap should have a darker typeface

      We have changed it to black, thanks.

      Figure S5c, are the colors in the dendrogram supposed to match the spatial location on the right?

      The purple of the muscles is barely visible.

      Yes, they do match. Colors were modified in the revised version for better visibility.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      The authors investigated state-dependent changes in evoked brain activity, using electrical stimulation combined with multisite neural activity across wakefulness and anesthesia. The approach is novel, and the results are compelling. The study benefits from an in-depth sophisticated analysis of neural signals. The effects of behavioral state on brain responses to stimulation are generally convincing.

      It is possible that the authors' use of "an average reference montage that removed signals common to all EEG electrodes" could also remove useful components of the signal, which are common across EEG electrodes, especially during deep anesthesia. For example, it is possible (in fact from my experience I would be surprised if it is not the case) that under isoflurane anesthesia, electrical stimulation induces a generalized slow wave or a burst of activity across the brain. Subtracting the average signal will simply remove that from all channels. This does not only result in signals under anesthesia being affected more by the referencing procedure than during waking but also will have different effects on different channels, e.g. depending on how strong the response is in a specific channel.

      We thank the reviewer for the positive comments and for raising this point. We do not believe that the average reference montage is obscuring an evoked slow wave in the isoflurane-anesthetized mice. Electrical stimulation did elicit a brief activation in nearby neurons that was followed by roughly 200 ms of quiescence, but no significant changes in firing in the other regions we recorded from (Author response image 1).

      Author response image 1

      ERP and evoked population activity during isoflurane anesthesia do not show evidence of global responses. (Top). ERP (-0.2 to +0.8 s around stimulus onset) with all EEG electrode traces superimposed. Data represented is the same: red traces have been processed with the average reference montage, black traces have not. (Bottom) Population mean firing rates from the areas of interest from the same experiment as above.

      We are familiar with the work from Dasilva et al. (2021), a study similar to ours because they also performed cortical electrical stimulation in mice anesthetized with isoflurane. They show widespread evoked multi-unit activity (derived from LFP) in isoflurane-anesthetized mice in response to electrical stimulation, but critical experimental differences may underlie the conflicting results presented in our study. Both works use similar levels of isoflurane to maintain anesthesia (we use a level roughly equivalent to their “deep” level). However, our experiments use only isoflurane, whereas Dasilva et al. induced anesthesia with ketamine and medetomidine followed by isoflurane. It has been shown that isoflurane and ketamine have different effects on neural dynamics (Sorrenti et al., 2021). Typically, isoflurane causes reduced spontaneous firing rates and decreased evoked response amplitudes compared to wakefulness, whereas ketamine has been shown to increase firing rates and evoked response amplitudes (Aasebø et al., 2017; Michelson & Kozai, 2018). Perhaps a more relevant difference is the set of electrical stimulation parameters used to perturb the brain. Dasilva et al. used 1 ms pulses of 500 μA, which would have a much larger effect than the stimulation used in this work, 0.2 ms pulses of 10-100 μA.

      Additionally, we would like to clarify that the average reference montage is not impacting the main findings of this work. As the reviewer correctly pointed out, the average reference montage does change the appearance of the ERP in the butterfly plots (Top panel in Author response image 1). However, all the quantitative analyses of the EEG-ERPs are performed on the global field power, computed by taking the standard deviation across all EEG channels, which is not affected by the average reference montage.
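
      The invariance claimed above follows from the definition: the average reference subtracts, at each time point, one value (the across-channel mean) from every channel, and a standard deviation is unchanged by subtracting a constant. A minimal sketch on synthetic data illustrates this; the channel count, the number of samples, and the function names are assumptions chosen for illustration and are not taken from the manuscript.

```python
import random
import statistics

def average_reference(eeg):
    """Subtract the across-channel mean from every channel at each time point.

    `eeg` is a list of channels, each a list of samples (hypothetical layout).
    """
    n_times = len(eeg[0])
    means = [statistics.fmean([ch[t] for ch in eeg]) for t in range(n_times)]
    return [[ch[t] - means[t] for t in range(n_times)] for ch in eeg]

def global_field_power(eeg):
    """Standard deviation across channels at each time point."""
    n_times = len(eeg[0])
    return [statistics.pstdev([ch[t] for ch in eeg]) for t in range(n_times)]

random.seed(0)
n_channels, n_times = 30, 50  # assumed sizes, for illustration only

# Synthetic data: a large component shared by all channels plus channel-specific noise.
common = [5.0 * random.gauss(0, 1) for _ in range(n_times)]
raw = [[common[t] + random.gauss(0, 1) for t in range(n_times)]
       for _ in range(n_channels)]

referenced = average_reference(raw)
gfp_raw = global_field_power(raw)
gfp_ref = global_field_power(referenced)

# Individual traces change a lot, but the global field power does not.
max_trace_change = max(abs(a - b) for a, b in zip(raw[0], referenced[0]))
max_gfp_change = max(abs(a - b) for a, b in zip(gfp_raw, gfp_ref))
print(f"largest change in channel 0 trace: {max_trace_change:.3f}")
print(f"largest change in GFP:             {max_gfp_change:.2e}")
```

      The shared component dominates the individual traces and is removed by the re-referencing, yet the across-channel spread at each time point is untouched up to floating-point error.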

      Reviewer #2 (Public Review):

      […] The conclusions regarding the thalamic contributions to the ERP components are strongly supported by the data.

      The spatiotemporal complexity is almost a side point compared to what seems to be the most important point of the paper: showing the contribution of thalamic activity to some components of the cortical ERP. Scalp ERPs have long been regarded as purely cortical phenomena, just like most EEGs, and this study shows convincing evidence to the contrary.

      The data presented seemingly contradicts the results presented by Histed et al. (2009), who assert that cortical microstimulation only affects passing fibers near the tip of the electrodes, and results in distant, sparse, and somewhat random neural activation. In this study, it is clear that the maximum effect happens near the electrodes, decays with distance, and is not sparse at all, suggesting that not only passing fibers are activated but that also neuronal elements might be activated by antidromic propagation from the axonal hillock. This appears to offer proof that microstimulation might be much more effective than it was thought after the publication of Histed 2009, as the uber-successful use of DBS to treat Parkinson's disease has also shown.

      We thank the reviewer for their positive comments and thoughtful suggestions. We appreciate and agree with the reviewer’s perspective that the thalamic contribution to the cortical ERP is one of the key points of this study. We also thank the reviewer for their comment on the apparently contradictory results reported by Histed et al. (2009). This gives us the opportunity to further highlight the important contribution of our study to the field.

      First, we would like to highlight some key experimental differences between the two studies. In our study we used single pulse stimulation with currents between 10 and 100 μA, whereas Histed et al. used trains of pulses (100 ms in duration at 250 Hz) with lower current intensities (between 2 and 50 μA). We varied the depth of stimulation, targeting superficial and deep cortical layers; Histed et al. exclusively stimulated superficial cortical layers. In addition, the two studies used recording methods that are orthogonal in nature. We used Neuropixels probes that record from neurons that span all cortical layers depth-wise while Histed et al. used two-photon calcium imaging to record from a horizontal plane of neurons (again, in the superficial cortical layers).

      Because of these important methodological differences, it is more appropriate to compare the Histed et al. results to our results from superficial stimulation at comparable current intensities. In this case, we believe the two studies show similar results: stimulation activated a small fraction of neurons even hundreds of microns away from the stimulating electrode (see Figure 4A from our manuscript). However, our study adds an important observation pointing to the critical role of the depth of the stimulating electrode. We observe significant excitation of local cortical neurons (Figure 4D) and trans-synaptic activation of the thalamus only when we delivered deep stimulation (Figure 5A). This effect is likely mediated by activation of large, myelinated cortico-thalamic fibers, which are thought to be more excitable than non-myelinated horizontal fibers (Tehovnik & Slocum, 2013).

      To summarize, Histed et al. (2009) concluded that microstimulation causes a sparse activation of a distributed set of neurons with little evidence of synaptically driven activation. Instead, we showed that microstimulation can robustly activate local neurons and trans-synaptically activate distant neurons when stronger stimuli are directed to deep cortical layers. Based on this, we conclude that electrical stimulation is indeed highly effective, and is a valid tool that can be used to probe and characterize the cortico-thalamo-cortical network of any behavioral state.

      ----------

      Reviewer #1 (Recommendations for the authors):

      1. I am not clear how "putative pyramidal" or RS and "putative inhibitory" fast-spiking neurons were identified. Please provide some further details on that, including average spike wave shapes, and distribution of firing rates, and it would be interesting to know the proportion of "putative" RS and FS neurons in your recorded population. Obviously, caution is warranted here because, without further work, you cannot be sure that those are indeed pyramidal cells or interneurons! Is this subdivision necessary at all?

      We added details regarding the cell-type classification to the Results (lines 136-140) and the Methods section. This classification is common practice in cortical extracellular electrophysiology recordings given that cell-type specific analyses can reveal important differences between the two putative populations (Barthó et al., 2004; Bortone et al., 2014; Bruno & Simons, 2002; Jia et al., 2016; Niell & Stryker, 2008; Sirota et al., 2008). Based on our findings that the two populations respond to electrical stimulation in similar ways (excitation followed by a period of quiescence and rebound excitation), we agree the subdivision is not necessary to support our conclusions. However, we believe that some readers will appreciate seeing the two putative populations presented separately.

      2. I wonder how the authors know whether the animals were awake, specifically when they were not running. Did you observe animals falling asleep when head-fixed? Providing some analyses of spontaneous EEG/LFP signals in each state could add some reassurance that only wakefulness was included, as intended.

      While we cannot conclusively rule out that mice were asleep during the “quiet wakefulness” periods we analyzed, we believe they are likely to be awake for two main reasons: 1) all the experiments are performed during the dark phase of the light/dark cycle, when the mice are less likely to enter a sleep state (Franken et al., 1999); 2) the animals are not undergoing specific training to promote drowsiness or sleep. Indeed, many sleep-focused studies in head-fixed mice are performed during the light phase of the animal’s cycle to maximize the likelihood of capturing sleep states (Kobayashi et al., 2023; Turner et al., 2020; Yüzgeç et al., 2018; Zhang et al., 2022). We have added this note to the Discussion section (lines 402-406).

      Because we do not specifically record during sleep states and our recording does not include electromyography, which is commonly used in conjunction with EEG to classify sleep stages, we cannot accurately perform spectral comparison between “quiet wakefulness” and sleep states in our recordings.

      3. I was unsure about the meaning of some of the terminology, specifically "rebound", "rebound spiking", "rebound excitation" etc. Why do you call it "rebound"?

      “Rebound” is a term often used to describe a period of enhanced spiking following a period of prolonged silence or inhibition (Guido & Weyand, 1995; Roux et al., 2014). Grenier et al. list “postinhibitory rebound excitation” as an intrinsic property of cortical and thalamic neurons (1998). We added this description to the text (lines 79-80).

      Reviewer #2 (Recommendations For The Authors):

      Regarding analysis, I would make three main points:

      Regarding the CSD analysis, I think the authors have done a good job of circumventing several of the known issues of this technique, especially by using ERPs rather than ongoing activity. However, although I do not immediately have access to the literature to back up this claim, I've heard that many assumptions behind CSD require a laminar structure with electrodes positioned perpendicular to these layers. In Figure 1B it seems like the neuropixels probe is not really perpendicular to the cortical layers, and I wonder if this might be an issue. I am also wondering how to interpret the thalamic CSD, as this structure is not laminar, lacks the mass of neatly stacked neuronal dipoles present in the cortex, and does not have an orderly array of synaptic inputs and outputs. I understand that CSD analysis helps minimize the contributions of volume conduction, but in this case, I also wonder if the thalamic CSD is even necessary to back up the paper's claims.

      One-dimensional CSD is computed assuming that the electrode is inserted perpendicular to cortex. This is mainly important for the interpretation of sinks and sources, since CSD can be also computed on radial voltages (e.g., EEG [Tenke & Kayser, 2012]). In general, our Neuropixels probes do not significantly deviate from perpendicular (mean deviation from perpendicular 15.3 degrees, minimum 5.2 degrees, and maximum 36.6 degrees). The probe represented in Figure 1B deviates from perpendicular by 31.2 degrees, which is an outlier compared to the rest of the insertions. Any deviation from perpendicular would result in the “effective” cortical thickness being larger by a factor of 1/cos(angle deviation from perpendicular) and thus would not affect the relative location of sources and sinks. We have added a statement to clarify this in the text (lines 126 and 454-456).
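
      The geometric argument can be checked numerically. The sketch below evaluates the 1/cos factor for the angles quoted above and shows that a uniform stretch leaves relative depths unchanged. It assumes flat, parallel layers and a straight probe; the feature depths are hypothetical values used only for illustration.

```python
import math

def thickness_scale(deviation_deg):
    """Factor by which cortical thickness appears stretched along a tilted probe."""
    return 1.0 / math.cos(math.radians(deviation_deg))

# Angles quoted in the response.
angles = {"minimum": 5.2, "mean": 15.3, "Figure 1B probe": 31.2, "maximum": 36.6}
for label, angle in angles.items():
    print(f"{label:>15}: {angle:4.1f} deg -> thickness x {thickness_scale(angle):.3f}")

# Hypothetical true depths (mm) of three features below the pia.
true_depths = [0.2, 0.5, 0.9]
scale = thickness_scale(31.2)
along_probe = [d * scale for d in true_depths]

# Normalizing by the deepest feature removes the common stretch factor.
relative_true = [d / true_depths[-1] for d in true_depths]
relative_probe = [d / along_probe[-1] for d in along_probe]
print(relative_true)
print(relative_probe)
```

      Under these assumptions the stretch is about 4% at the mean deviation and about 25% at the maximum, while the ordering and relative spacing of sinks and sources are preserved.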

      We agree with the statement regarding CSD analysis in the thalamus. We originally included the CSD for the thalamus in Figure 2F for completeness. As the reviewer pointed out, thalamic CSD was not used to perform any subsequent analysis and is, therefore, not necessary to back up any claims. As such, we have removed the CSD plot from Figure 2F to avoid any confusion and made a comment to this effect in the legend (lines 1175-1177).

      On the merits of using the z-score normalization for spike rates vs. other strategies like standardizing to maximum firing, I am aware that both procedures have limitations, but the z-score changes the range of the firing rate from [0, +Inf] to [-Inf, +Inf]. This does not seem correct considering that negative spiking rates do not exist. The standardization to maximum rate keeps the range within [0, 1], not creating negative rates. Another point that it will be worth discussing is the reported values of the z-scored values. For example, what does it mean to be 54 standard deviations away from the mean? 6 standard deviations is already a big distance from the mean.

      For Figure 2, we chose to represent the neural firing rates as z-scores because we found it important to report the magnitude of both the increase and decrease of the evoked firing rates in the post-stimulus period relative to the pre-stimulus rate. The normalization we used helps to visualize the magnitude of the effects of electrical stimulation in neuronal activity for both directions, which is an important result of the study. Despite the differences between the two normalization methods, the normalization based on the maximum firing does not significantly change the qualitative interpretation of Figure 2 in the manuscript (Author response image 2).

      Author response image 2

      Evoked firing rates for neurons in the areas of interest in response to deep stimulation in MO during the awake state. (Left) Firing rates of all neurons normalized by the average, pre-stimulus firing rate. (Right) Firing rates of all neurons normalized by the maximum post-stimulus firing rate.

      Regarding Figure 3 and the associated text, we would like to clarify that the magnitude metric is not simply a z-score value (with units of s.d.) but rather it is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). This can help explain why we see values of ~50 s.d.∙s. We chose to z-score firing rates, LFP, and CSD to normalize across the different signals and magnitudes of the evoked responses. We often observed the largest responses in the LFP (see Figure 3A), which may be partly due to the signal naturally having a larger dynamic range than the measured neural firing rates. Then we integrated the z-score response time series to capture the dynamics of the signal over the response window, rather than a static value such as the mean or maximum z-score. After performing a thorough literature search, we found no other ways to capture and compare the magnitudes of the different signals. We have added language to clarify the magnitude metric (lines 155-156) and added the appropriate units.
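
      To make the units concrete, the following sketch integrates a z-scored trace over a response window with the trapezoid rule. The signal, the window, and the function name are hypothetical, and whether the response was rectified before integration is not stated in the response, so the signed z-score is integrated here.

```python
from statistics import mean, stdev

def integrated_z(signal, times, baseline_end, window):
    """Area under the z-scored signal inside `window`, in s.d.*seconds."""
    baseline = [s for s, t in zip(signal, times) if t < baseline_end]
    mu, sigma = mean(baseline), stdev(baseline)
    z = [(s - mu) / sigma for s in signal]
    start, stop = window
    points = [(t, v) for t, v in zip(times, z) if start <= t <= stop]
    area = 0.0
    for (t0, z0), (t1, z1) in zip(points, points[1:]):
        area += 0.5 * (z0 + z1) * (t1 - t0)  # trapezoid rule
    return area

# Hypothetical trace: baseline with mean 10 and s.d. 1, then a sustained response.
times = [i / 10 for i in range(-5, 11)]           # -0.5 s to 1.0 s in 0.1 s steps
signal = [9.0, 11.0, 9.0, 11.0, 10.0] + [60.0] * 11
magnitude = integrated_z(signal, times, baseline_end=0.0, window=(0.0, 1.0))
```

      In this toy case a response held at 50 s.d. for one second integrates to approximately 50 s.d.∙s, so a large magnitude can reflect a long response as much as a tall one.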

      In reporting the p-values, I recommend increasing the number of significant digits to four because the p-value seems to be the same for different tests in several places (e.g.: lines 207 to 218), which seems odd. I also wonder whether this could be an artifact of the z-scoring procedure. In the figures, I would like to advise the use of 1 asterisk to denote "weak evidence to reject the null hypothesis (0.05 > p > 0.01)" and two asterisks to denote "strong evidence to reject the null hypothesis (0.01 > p)", and make a note of it accordingly in the manuscript and/or figure legends.

      According to the reviewer’s suggestion, we have changed the statistics language to “* weak evidence to reject null hypothesis (0.05 > p > 0.01), ** strong evidence to reject null hypothesis (0.01 > p > 0.001), *** very strong evidence to reject null hypothesis (0.001 > p)” throughout the manuscript.

      We have also increased the number of significant digits to four throughout the manuscript. It is true that some of the p-values reported for Figure 3 (lines 169-180) are the same for different tests. This is not an artifact of the z-scoring, but rather a consequence of performing the Wilcoxon signed-rank test (an ordinal statistical test) with small sample numbers. Because the p-value depends only on the relative ordering, not the continuous distribution of values, the small sample size (N=6-14) increases the likelihood of obtaining the exact same p-value if the relative ordering of samples is the same.
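
      The point about repeated p-values can be illustrated with an exact signed-rank computation written from scratch. This is a sketch under simplifying assumptions (no zero differences, no tied magnitudes), and the two data sets are invented.

```python
from itertools import product

def exact_signed_rank_p(diffs):
    """Two-sided exact Wilcoxon signed-rank p-value (assumes no zeros or ties)."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, n * (n + 1) // 2 - w_plus)
    # Under the null hypothesis every pattern of signs is equally likely.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        if sum(r for r, keep in zip(range(1, n + 1), signs) if keep) <= w:
            as_extreme += 1
    return min(1.0, 2 * as_extreme / 2 ** n)

# Two hypothetical samples (N = 6) with very different values but identical
# ordering and signs of the paired differences.
p_a = exact_signed_rank_p([0.5, 1.2, 2.0, 3.1, 4.7, 6.0])
p_b = exact_signed_rank_p([10.0, 25.0, 31.0, 47.0, 52.0, 90.0])
```

      Both calls return 2/64 = 0.03125, the smallest two-sided p-value attainable with six pairs, because the test statistic depends only on ranks and signs.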

      Line 202: If the magnitude corresponds to z-score data, please add "s.d." after the number, as z-scored values are expressed in standard deviation units. Please update this throughout the paper.

      As stated above, the magnitude metric is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). We have added the correct units in all places.

      Line 214: Please report how the multiple comparisons correction was performed.

      We have added the test used for multiple comparisons in line 169 (formerly line 214) and in the Methods section (line 770).

      Line 462: please replace "Neuropixels activity" with "LFP and single-unit activity".

      We changed the wording to specify “LFP, and single neuron responses…” (now line 337).

      Line 475: a short explanation of the bi-stability phenomenon would be helpful for the reader.

      We added the following description: “a state characterized by spontaneous alternation between bouts of activity and periods of silence” (lines 350-351).

      Line 601: It is asserted that "Electrical stimulation directly activates local cells and axons that run near the stimulation site via activation of the axon initial segment" and the paper by Histed et al. 2009 is cited. This does not seem like an appropriate citation, as Histed et al. explicitly state that electrical microstimulation does not activate local neuronal bodies near the electrode tip. See my comment above.

      Upon further reading, we believe we are seeing evidence of direct axonal activation and subsequent antidromic activation of local cell bodies, as you suggested in your comment above and as has been proposed by many, including Histed et al. (2009) and Nowak and Bullier (1998). We edited our sentence accordingly, kept the Histed et al. citation, and added other relevant citations (lines 487-490).

      References

      • Aasebø, I. E. J., Lepperød, M. E., Stavrinou, M., Nøkkevangen, S., Einevoll, G., Hafting, T., & Fyhn, M. (2017). Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat. eNeuro, 4(4), 59–76. https://doi.org/10.1523/ENEURO.0059-17.2017

      • Barthó, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K. D., & Buzsáki, G. (2004). Characterization of Neocortical Principal Cells and Interneurons by Network Interactions and Extracellular Features. Journal of Neurophysiology, 92(1), 600–608. https://doi.org/10.1152/jn.01170.2003

      • Bortone, D. S., Olsen, S. R., & Scanziani, M. (2014). Translaminar Inhibitory Cells Recruited by Layer 6 Corticothalamic Neurons Suppress Visual Cortex. Neuron, 82, 474–485. https://doi.org/10.1016/j.neuron.2014.02.021

      • Bruno, R. M., & Simons, D. J. (2002). Feedforward Mechanisms of Excitatory and Inhibitory Cortical Receptive Fields. The Journal of Neuroscience, 22(24), 10966–10975. https://doi.org/10.1523/JNEUROSCI.22-24-10966.2002

      • Dasilva, M., Camassa, A., Navarro-Guzman, A., Pazienti, A., Perez-Mendez, L., Zamora-López, G., Mattia, M., & Sanchez-Vives, M. V. (2021). Modulation of cortical slow oscillations and complexity across anesthesia levels. NeuroImage, 224, 117415. https://doi.org/10.1016/j.neuroimage.2020.117415

      • Franken, P., Malafosse, A., & Tafti, M. (1999). Genetic Determinants of Sleep Regulation in Inbred Mice. SLEEP, 22(2). https://academic.oup.com/sleep/article/22/2/155/2731698

      • Grenier, F., Timofeev, I., & Steriade, M. (1998). Leading role of thalamic over cortical neurons during postinhibitory rebound excitation. Proceedings of the National Academy of Sciences of the United States of America, 95(23), 13929–13934. https://doi.org/10.1073/pnas.95.23.13929

      • Guido, W., & Weyand, T. (1995). Burst responses in thalamic relay cells of the awake behaving cat. Journal of Neurophysiology, 74(4), 1782–1786. https://doi.org/10.1152/JN.1995.74.4.1782

      • Histed, M. H., Bonin, V., & Reid, R. C. (2009). Direct Activation of Sparse, Distributed Populations of Cortical Neurons by Electrical Microstimulation. Neuron, 63(4), 508–522. https://doi.org/10.1016/j.neuron.2009.07.016

      • Jia, X., Siegle, J., Bennett, C., Gale, S., Denman, D. R., Koch, C., & Olsen, S. (2016). High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification. Journal of Neurophysiology, 121(5), 1831–1847. https://doi.org/10.1101/376863

      • Kobayashi, G., Tanaka, K. F., & Takata, N. (2023). Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network. The Keio Journal of Medicine, 2022-0020-OA. https://doi.org/10.2302/KJM.2022-0020-OA

      • Michelson, N. J., & Kozai, T. D. Y. (2018). Isoflurane and ketamine differentially influence spontaneous and evoked laminar electrophysiology in mouse V1. Journal of Neurophysiology, 120(5), 2232. https://doi.org/10.1152/JN.00299.2018

      • Niell, C. M., & Stryker, M. P. (2008). Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience, 28(30), 7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008

      • Nowak, L. G., & Bullier, J. (1998). Axons, but not cell bodies, are activated by electrical stimulation in cortical gray matter. II. Evidence from selective inactivation of cell bodies and axon initial segments. Experimental Brain Research, 118(4), 489–500. https://doi.org/10.1007/S002210050305

      • Roux, L., Stark, E., Sjulson, L., & Buzsáki, G. (2014). In vivo optogenetic identification and manipulation of GABAergic interneuron subtypes. Current Opinion in Neurobiology, 26, 88–95. https://doi.org/10.1016/j.conb.2013.12.013

      • Sirota, A., Montgomery, S., Fujisawa, S., Isomura, Y., Zugaro, M., & Buzsáki, G. (2008). Entrainment of Neocortical Neurons and Gamma Oscillations by the Hippocampal Theta Rhythm. Neuron, 60(4), 683–697. https://doi.org/10.1016/j.neuron.2008.09.014

      • Sorrenti, V., Cecchetto, C., Maschietto, M., Fortinguerra, S., Buriani, A., & Vassanelli, S. (2021). Understanding the Effects of Anesthesia on Cortical Electrophysiological Recordings: A Scoping Review. International Journal of Molecular Sciences, 22(3), 1286. https://doi.org/10.3390/IJMS22031286

      • Tehovnik, E. J., & Slocum, W. M. (2013). Two-photon imaging and the activation of cortical neurons. Neuroscience, 245(March), 12–25. https://doi.org/10.1016/j.neuroscience.2013.04.022

      • Tenke, C. E., & Kayser, J. (2012). Generator localization by current source density (CSD): Implications of volume conduction and field closure at intracranial and scalp resolutions. Clinical Neurophysiology, 123(12), 2328–2345. https://doi.org/10.1016/J.CLINPH.2012.06.005

      • Turner, K. L., Gheres, K. W., Proctor, E. A., & Drew, P. J. (2020). Neurovascular coupling and bilateral connectivity during NREM and REM sleep. eLife, 9, 1. https://doi.org/10.7554/ELIFE.62071

      • Yüzgeç, Ö., Prsa, M., Zimmermann, R., & Huber, D. (2018). Pupil Size Coupling to Cortical States Protects the Stability of Deep Sleep via Parasympathetic Modulation. Current Biology, 28(3), 392. https://doi.org/10.1016/J.CUB.2017.12.049

      • Zhang, X., Landsness, E. C., Chen, W., Miao, H., Tang, M., Brier, L. M., Culver, J. P., Lee, J. M., & Anastasio, M. A. (2022). Automated sleep state classification of wide-field calcium imaging data via multiplex visibility graphs and deep learning. Journal of Neuroscience Methods, 366, 109421. https://doi.org/10.1016/J.JNEUMETH.2021.109421

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Line numbers are missing.

      Added

      (2) VR classroom. Was this a completely custom design based on Unity, or was this developed on top of some pre-existing code? Many aspects of the VR classroom scenario are only introduced (e.g., how was the lip-speech synchronisation done exactly?). Additional detail is required. Also, is or will the experiment code be shared publicly with appropriate documentation? It would also be useful to share brief example video-clips.

      We have added details about the VR classroom programming to the methods section (p. 6-7), and we have now included a video-example as supplementary material.

      “Development and programming of the VR classroom were done primarily in-house, using assets (avatars and environment) sourced from pre-existing databases. The classroom environment was adapted from assets provided by Tirgames on TurboSquid (https://www.turbosquid.com/Search/Artists/Tirgames) and modified to meet the experimental needs. The avatars and their basic animations were sourced from the Mixamo library, which at the time of development supported legacy avatars with facial blendshapes (this functionality is no longer available in current versions of Mixamo). A brief video example of the VR classroom is available at: https://osf.io/rf6t8.

      “To achieve realistic lip-speech synchronization, the teacher’s lip movements were controlled by the temporal envelope of the speech, adjusting both timing and mouth size dynamically. His body motions were animated using natural talking gestures.”
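
      One way an envelope-driven mouth animation could work is sketched below. It is purely illustrative and is not the VR classroom code: the function name, the RMS envelope, the frame rate, and the test tone are all assumptions.

```python
import math

def mouth_opening(samples, sample_rate, frame_rate=30):
    """Return one mouth-opening value in [0, 1] per animation frame."""
    hop = sample_rate // frame_rate
    envelope = []
    for start in range(0, len(samples) - hop + 1, hop):
        frame = samples[start:start + hop]
        envelope.append(math.sqrt(sum(s * s for s in frame) / hop))  # RMS level
    peak = max(envelope, default=0.0) or 1.0  # avoid dividing by zero on silence
    return [level / peak for level in envelope]

# Hypothetical input: a 1-second, 200 Hz tone that grows steadily louder.
sample_rate = 3000
speech = [(n / sample_rate) * math.sin(2 * math.pi * 200 * n / sample_rate)
          for n in range(sample_rate)]
opening = mouth_opening(speech, sample_rate)
```

      Each value could then drive a blendshape weight for the corresponding frame, so that the mouth opens wider when the speech is louder and closes during pauses.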

      While we do intend to make the dataset publicly available for other researchers, at this point we are not making the code for the VR classroom public. However, we are happy to share it on an individual basis with other researchers who might find it useful for their own research in the future.

      (3) "normalized to the same loudness level using the software Audacity". Please specify the Audacity function and parameters.

      We have added these details (p. 7).

      “All sound-events were normalized to the same loudness level using the Normalize function in the audio-editing software Audacity (theaudacityteam.org, ver 3.4), with the peak amplitude parameter set to -5 dB, and trimmed to a duration of 300 milliseconds.”
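
      For readers reproducing this step outside Audacity, peak normalization to -5 dB relative to full scale amounts to a single gain factor. The sketch below assumes floating-point samples in the range [-1, 1]; the function names and sample values are hypothetical.

```python
def normalize_peak(samples, target_db=-5.0):
    """Scale samples so that the largest absolute value sits at target_db (dBFS)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)              # silence stays silence
    target = 10 ** (target_db / 20)       # -5 dB corresponds to about 0.562
    return [s * target / peak for s in samples]

def trim(samples, sample_rate, duration_ms=300):
    """Keep only the first duration_ms milliseconds."""
    return samples[: int(sample_rate * duration_ms / 1000)]

# Hypothetical sound-event with a peak of 0.8.
event = normalize_peak([0.0, 0.2, -0.8, 0.4])
n_kept = len(trim([0.0] * 44100, sample_rate=44100))
```

      Note that this equalizes peak level, which matches the parameter quoted above; it does not by itself equalize perceived loudness across sounds with different spectra.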

      (4) Did the authors check if the participants were already familiar with some of the content in the mini-lectures?

      This is a good point. Since the mini-lectures spanned many different topics, we did not pre-screen participants for familiarity with the topics, and it is possible that some of the participants had some pre-existing knowledge.

      In hindsight, it would have been good to have added some reflective questions regarding participants’ prior knowledge as well as other questions such as level of interest in the topic and/or how well they understood the content. These are elements that we hope to include in future versions of the VR classroom.

      (5) "Independent Component Analysis (ICA) was then used to further remove components associated with horizontal or vertical eye movements and heartbeats". Please specify how this selection was carried out.

      Selection of ICA components was done manually based on visual inspection of their time-course patterns and topographical distributions, to identify components characteristic of blinks, horizontal eye-movements and heartbeats. Examples of these distinct components are provided in Author response image 1 below. This is now specified in the methods section.

      Author response image 1.

      (6) "EEG data was further bandpass filtered between 0.8 and 20 Hz". If I understand correctly, the data was filtered a second time. If that's the case, please do not do that, as that will introduce additional and unnecessary filtering artifacts. Instead, the authors should replace the original filter with this one (so, filtering the data only once). Please see de Cheveigné and Nelken, Neuron, 2019 for an explanation. Also, please provide an explanation of the rationale for further restricting the cut-off bands in the methods section. Finally, further details on the filters should be included (filter type and order, for example).

      Yes, the data was indeed filtered twice. The first filter is applied as part of the preprocessing procedure, in order to remove extremely high- and low-frequency noise but retain most activity within the range of “neural” activity. This broad range is mostly important for the ICA procedure, so as to adequately separate ocular from neural contributions to the recorded signal.

      However, since both the speech tracking responses and ERPs are typically less broadband and are comprised mostly of lower frequencies (e.g., those that make up the speech-envelope), a second narrower filter was applied to improve TRF model-fit and make ERPs more interpretable.

      In both cases we used a fourth-order zero-phase Butterworth IIR filter with 1 second of padding, as implemented in the Fieldtrip toolbox. We have added these details to the manuscript.
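
      The idea behind zero-phase filtering can be shown with a much simpler filter than the one used here. The sketch below substitutes a one-pole smoother for the Butterworth design (an explicit simplification, not the Fieldtrip implementation) and applies it forward and then backward, so that the delay of the first pass is cancelled by the second.

```python
def one_pole_lowpass(x, a):
    """Causal smoother y[n] = a*x[n] + (1 - a)*y[n-1]; it delays the signal."""
    y, prev = [], 0.0
    for v in x:
        prev = a * v + (1 - a) * prev
        y.append(prev)
    return y

def zero_phase(x, a):
    """Filter forward, then filter the time-reversed result and reverse it back."""
    forward = one_pole_lowpass(x, a)
    backward = one_pole_lowpass(forward[::-1], a)
    return backward[::-1]

impulse = [0.0] * 101
impulse[50] = 1.0
single_pass = one_pole_lowpass(impulse, a=0.3)  # energy only after sample 50
two_pass = zero_phase(impulse, a=0.3)           # approximately symmetric around sample 50
```

      The symmetry of the two-pass response is only approximate because the passes start from a zero state at the ends of the data, which is one reason padding is commonly added before filtering.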

      (7) "(~ 5 minutes of data in total), which is insufficient for deriving reliable TRFs". That is a bit pessimistic and vague. What does "reliable" mean? I would tend to agree when talking about individual subject TRFs, while 5 min per participant can be enough at the group level. Also, this depends on the specific speech material. If the features are univariate or multivariate. Etc. Please narrow down and clarify this statement.

      We determined that the data in the Quiet condition (~5 min) was insufficient for performing reliable TRF analysis, by assessing whether its predictive power was significantly better than chance. As shown in Author response image 2 below, the predictive power achieved using this data was not higher than values obtained in permuted data (p = 0.43). Therefore, we did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).
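
      A chance-level comparison of this kind can be sketched as a permutation test on the correlation between predicted and recorded responses. The permutation scheme is not described in the response, so the circular shift used below is an assumption, as are the function names and the simulated data.

```python
import random
from statistics import mean

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def permutation_p(predicted, actual, n_perm=1000, seed=0):
    """One-sided p-value of the observed r against circularly shifted data."""
    rng = random.Random(seed)
    observed = pearson(predicted, actual)
    exceed = 0
    for _ in range(n_perm):
        shift = rng.randrange(1, len(actual))   # misalign prediction and response
        shifted = actual[shift:] + actual[:shift]
        if pearson(predicted, shifted) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

rng = random.Random(42)
predicted = [rng.gauss(0, 1) for _ in range(300)]
tracked = [p + rng.gauss(0, 0.5) for p in predicted]   # response that follows the prediction
unrelated = [rng.gauss(0, 1) for _ in range(300)]      # response unrelated to the prediction
r_good, p_good = permutation_p(predicted, tracked)
r_null, p_null = permutation_p(predicted, unrelated)
```

      With strongly autocorrelated signals, very small shifts leave the two series partly aligned, so a practical version would usually enforce a minimum shift or permute whole trials instead.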

      Author response image 2.

      (8) "Based on previous research by our group (Kaufman & Zion Golumbic 2023), we chose to use a constant regularization ridge parameter (λ= 100) for all participants and conditions". This is an insufficient explanation. I understand that there is a previous paper involved. However, such an unconventional choice that goes against the original definition and typical use of these methods should be clearly reported in this manuscript.

      We apologize for not clarifying this point sufficiently, and have added an explanation of this methodological choice (p.11):

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one trial are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
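
      The procedure described in this passage can be sketched in plain Python for a single electrode. This is not the mTRF toolbox: the helper names, the simulated stimulus and response, the number of lags, and the λ grid are all assumptions chosen to keep the example small.

```python
import random
from statistics import mean

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5

def lagged(stim, n_lags):
    """Design matrix: row t holds stim[t], stim[t-1], ... (zero-padded at the start)."""
    return [[stim[t - k] if t - k >= 0 else 0.0 for k in range(n_lags)]
            for t in range(len(stim))]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Ridge weights w = (X'X + lam*I)^-1 X'y."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(n)]
    return solve(A, b)

def loo_power(trials, lam, n_lags):
    """Mean Pearson r over leave-one-trial-out folds; trials = [(stim, eeg), ...]."""
    scores = []
    for held in range(len(trials)):
        X, y = [], []
        for i, (stim, eeg) in enumerate(trials):
            if i != held:
                X += lagged(stim, n_lags)
                y += eeg
        w = ridge_fit(X, y, lam)
        stim, eeg = trials[held]
        pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in lagged(stim, n_lags)]
        scores.append(pearson(pred, eeg))
    return mean(scores)

def group_lambda(participants, lambdas, n_lags):
    """Pick the single lambda with the highest predictive power averaged over participants."""
    power = {lam: mean(loo_power(trials, lam, n_lags) for trials in participants)
             for lam in lambdas}
    return max(power, key=power.get), power

# Simulated data: 3 participants x 4 trials, a short hypothetical response kernel plus noise.
rng = random.Random(1)
kernel = [0.0, 1.0, 0.5, -0.3]

def simulate_trial(n=120):
    stim = [rng.gauss(0, 1) for _ in range(n)]
    eeg = [sum(kernel[k] * stim[t - k] for k in range(len(kernel)) if t - k >= 0)
           + rng.gauss(0, 2) for t in range(n)]
    return stim, eeg

participants = [[simulate_trial() for _ in range(4)] for _ in range(3)]
lambdas = [0.01, 1, 100, 10000]
best, power = group_lambda(participants, lambdas, n_lags=6)
```

      Because one λ is selected from the group-averaged predictive power, every participant's model is fitted with the same amount of regularization, which is the comparability argument made in the quoted passage.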

      Assuming that the explanation will be sufficiently convincing, which is not a trivial case to make, the next issue that I will bring up is that the lambda value depends on the magnitude of input and output vectors. While the input features are normalised, I don't see that described for the EEG signals. So I assume they are not normalized. In that case, the lambda would have at least to be adapted between subjects to account for their different magnitude.

      We apologize for omitting this detail – yes, the EEG signals were normalized prior to conducting the TRF analysis. We have updated the methods section to explicitly state this pre-processing step (p.10).

      Another clarification, is that value (i.e., 100) would not be comparable either across subjects or across studies. But maybe the authors have a simple explanation for that choice? (note that this point is very important as this could lead others to use TRF methods in an inappropriate way - but I understand that the authors might have specific reasons to do so here). Note that, if the issue is finding a reliable lambda per subject, a more reasonable choice would be to use a fixed lambda selected on a generic (i.e., group-level) model. However selecting an arbitrary lambda could be problematic (e.g., would the results replicate with another lambda; and similarly, what if a different EEG system was used, with different overall magnitude, hence the different impact of the regularisation).

      We fully agree that selecting an arbitrary lambda is problematic (especially across studies). As clarified above, the group-level lambda chosen here for the encoding model was data-driven, optimized based on group-level predictive power.

      (9) "L2 regularization of the model, to reduce its complexity". Could the authors explain what "reduce its complexity" refers to?

      Our intention here was to state that the L2 regularization constrains the model’s weights so that it can better generalize to left-out data. However, for clarity we have now removed this statement.

      (10) The same lambda value was used for the decoding model. From personal experience, that is very unlikely to be the optimal selection. Decoding models typically require a different (usually larger) lambda than forward models, which can be due to different reasons (different SNR of "input" of the model and, crucially, very different dimensionality).

      We agree with the reviewer that treatment of regularization parameters might not be identical for encoding and decoding models. Our initial search of lambda parameters was limited to λ= 0.01 - 100, with λ= 100 showing the best reconstruction correlations. However, following the reviewer’s suggestion we have now broadened the range and found that, in fact, reconstruction correlations are further improved and the best lambda is λ= 1000 (see Author response image 3 below, left panel). Importantly, the difference in decoding reconstruction correlations between the groups is maintained regardless of the choice of lambda (although the effect-size varies; see Author response image 3, right panel). We have now updated the text to reflect results of the model with λ= 1000.

      Author response image 3.

      (11) Skin conductance analysis. Additional details are required. For example, how was the linear interpolation done exactly? The raw data was downsampled, sure. But was an anti-aliasing filter applied? What filter exactly? What implementation for the CDA was run exactly?

      We have added the following details to the methods section (p. 14):

      “The Skin Conductance (SC) signal was analyzed using the Ledalab MATLAB toolbox (version 3.4.9; Benedek and Kaernbach, 2010; http://www.ledalab.de/) and custom-written scripts. The raw data was downsampled to 16Hz using FieldTrip's ft_resampledata function, which applies a built-in anti-aliasing low-pass filter to prevent aliasing artifacts. Data were inspected manually for any noticeable artifacts (large ‘jumps’), and if present were corrected using linear interpolation in Ledalab. A continuous decomposition analysis (CDA) was employed to separate the tonic and phasic SC responses for each participant. The CDA was conducted using the 'sdeco' mode (signal decomposition), which iteratively optimizes the separation of tonic and phasic components using the default regularization settings.”

      (12) "N1- and P2 peaks of the speech tracking response". Have the authors considered using the N1-P2 complex rather than the two peaks separately? Just a thought.

      This is an interesting suggestion, and we know that this has been used sometimes in more traditional ERP literature. In this case, since neither peak was modulated across groups, we did not think this would yield different results. However, it is a good point to keep in mind for future work.

      (13) Figure 4B. The ticks are missing. From what I can see (but it's hard without the ticks), the N1 seems later than in other speech-EEG tracking experiments (where it is closer to ~80 ms). Could the authors comment on that? Or maybe this looks similar to some of the authors' previous work?

      We apologize for this and have added ticks to the figure.

      In terms of time-course, an N1 peak at around 100 ms is compatible with many of our previous studies, as well as those from other groups.

      (14) Figure 4C. Strange thin vertical grey bar to remove.

      Fixed.

      (15) Figure 4B: What about the topographies for the TRF weights? Could the authors show that for the main components?

      Yes. The topographies of the main TRF components are similar to those of the predictive power and are compatible with auditory responses. We have added them to Figure 4B.

      (16) Figure 4B: I just noticed that this is a grand average TRF. That is ok (but not ideal) only because the referencing is to the mastoids. The more appropriate way of doing this is to look at the GFP, instead, which estimates the presence of dipoles. And then look at topographies of the components. Averaging across channels makes the plotted TRF weaker and noisier. I suggest adding the GFP to the plot. Also, the colour scale in Figure 4A is deceiving, as blue is usually used for +/- in plots of the weights. While that is a heatmap, where using a single colour or even yellow to red would be less deceiving at first look. Only cosmetics, indeed. The result is interesting nonetheless!

      We apologize for this, and agree with the reviewer that it is better not to average across EEG channels. In the revised Figure, we now show the TRFs based on the average of electrodes FC1, FC2, and FCz, which exhibited the strongest activity for the two main components.

      Following the previous comment, we have also included the topographical representation of the TRF main components, to give readers a whole-head perspective of the TRF.

      We have also fixed the color-scales.

      We are glad that the reviewer finds this result interesting!
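
      The difference between averaging over all channels, averaging over a fronto-central subset, and the global field power (GFP) raised in this comment can be sketched as follows. The TRF weights are invented, and GFP is computed here as the standard deviation across electrodes at each time lag.

```python
from statistics import mean, pstdev

def channel_average(trf, channels):
    """Average TRF weights over a named subset of electrodes, per time lag."""
    n_lags = len(trf[channels[0]])
    return [mean(trf[ch][t] for ch in channels) for t in range(n_lags)]

def global_field_power(trf):
    """Standard deviation across all electrodes at each time lag."""
    n_lags = len(next(iter(trf.values())))
    return [pstdev([weights[t] for weights in trf.values()]) for t in range(n_lags)]

# Hypothetical TRF weights (3 time lags) with opposite polarity at posterior sites.
trf = {
    "FC1": [0.0, -2.0, 1.0],
    "FC2": [0.0, -2.0, 1.0],
    "FCz": [0.0, -2.6, 1.3],
    "Pz":  [0.0, 0.5, -0.2],
    "Oz":  [0.0, 1.0, -0.5],
}
fronto_central = channel_average(trf, ["FC1", "FC2", "FCz"])
grand_average = channel_average(trf, list(trf))
gfp = global_field_power(trf)
```

      In this toy example the grand average is weaker than the fronto-central average because channels of opposite polarity partly cancel, whereas the GFP stays non-negative and peaks wherever the scalp distribution is most differentiated.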

      (17) Figure 4C. This looks like a missed opportunity. That metric shows a significant difference overall. But is that underpinned by a generally lower envelope reconstruction correlation, or by a larger deviation in those correlations (so, that metric is as for the control in some moments, but it drops more frequently due to distractibility)?

      We understand the reviewer’s point here, and ideally would like to be able to address this in a more fine-grained analysis, for example on a trial-by-trial basis. However, the design of the current experiment was not optimized for this, in terms of (for example) number of trials, the distribution of sound-events and behavioral outcomes. We hope to be able to address this issue in our future research.

      (18) I am not a fan of the term "accuracy" for indicating envelope reconstruction correlations. Accuracy is a term typically associated with classification. Regression models are typically measured through errors, loss, and sometimes correlations. 'Accuracy' is inaccurate (no joke intended).

      We accept this comment and now use the term “reconstruction correlation”.

      (19) Discussion. "The most robust finding in". I suggest using more precise terminology. For example, "largest effect-size".

      We agree and have changed the terminology (p. 31).

      (20) "individuals who exhibited higher alpha-power [...]". I probably missed this. But could the authors clarify this result? From what I can see, alpha did not show an effect on the group. Is this referring to Table 2? Could the authors elaborate on that? How does that reconcile with the non-significant effect of the group? In that same sentence, do you mean "and were more likely"? If that's the case, and they were more likely to report attentional difficulties, how is it that there is no group-effect when studying alpha?

      Yes, this sentence refers to the linear regression models described in Figure 10 and in Table 2. As the reviewer correctly points out, this is one place where there is a discrepancy between the results of the between-group analysis (ADHD diagnosis yes/no) and the regression analysis, which treats ADHD symptoms as a continuum, across both groups. The same is true for the gaze-shift data, which also did not show a significant between-group effect but was identified in the regression analysis as contributing to explaining the variance in ADHD symptoms.

      We discuss this point on pages 30-31, noting that “although the two groups are clearly separable from each other, they are far from uniform in the severity of symptoms experienced”, which motivated the inclusion of both analyses in this paper.

      At the bottom of p. 31 we specifically address the similarities and differences between the between-group and regression-based results. In our opinion, this pattern emphasizes that while neither approach is ‘conclusive’, looking at the data through both lenses contributes to an overall better understanding of the contributing factors, as well as highlighting that “no single neurophysiological measure alone is sufficient for explaining differences between the individuals – whether through the lens of clinical diagnosis or through report of symptoms”.

      (21) "why in the latter case the neural speech-decoding accuracy did not contribute to explaining ASRS scores [...]". My previous point 1 on separating overall envelope decoding from its deviation could help there. The envelope decoding correlation might go up and down due to SNR, while you might be more interested in the dynamics over time (i.e., looking at the reconstructions over time).

      Again, we appreciate this comment, but believe that this additional analysis is outside the scope of what would be reliably feasible with the current dataset. However, since the data will be made publicly available, perhaps other researchers will have better ideas as to how to do this.

      (22) Data and code sharing should be discussed. Also, specific links/names and version numbers should be included for the various libraries used.

      We are currently working on organizing the data to make it publicly available on the Open Science Framework.

      We have updated links and version numbers for the various toolboxes/software used, throughout the manuscript.

      Reviewer #2:

      (1) While it is highly appreciated to study selective attention in a naturalistic context, the readers would expect to see whether there are any potential similarities or differences in the cognitive and neural mechanisms between contexts. Whether the classic findings about selective attention would be challenged, rebutted, or confirmed? Whether we should expect any novel findings in such a novel context? Moreover, there are some studies on selective attention in the naturalistic context though not in the classroom, it would be better to formulate specific hypotheses based on previous findings both in the strictly controlled and naturalistic contexts.

      Yes, we fully agree that comparing results across different contexts would be extremely beneficial and important.

      The current paper serves as an important proof-of-concept demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility, but is also the basis for formulating specific hypotheses that will be tested in follow-up studies.

      In fact, a follow-up study is already ongoing in our lab, where we are looking into this point, by testing users in different VR scenarios (e.g., classroom, café, office etc.), and assessing whether similar neurophysiological patterns are observed across contexts and to what degree they are replicable within and across individuals. We hope to share these data with the community in the near future.

      (2) Previous studies suggest handedness and hemispheric dominance might impact the processing of information in each hemisphere. Whether these issues have been taken into consideration and appropriately addressed?

      This is an interesting point. In this study we did not specifically control for handedness/hemispheric dominance, since most of the neurophysiological measures used here are sensory/auditory in their nature, and therefore potentially invariant to handedness. Moreover, the EEG signal is typically not very sensitive to hemispheric dominance, at least for the measures used here. However, this might be something to consider more explicitly in future studies. Nonetheless, we have added handedness information to the Methods section (p. 5): “46 right-handed, 3 left-handed”.

      (3) It would be interesting to know how students felt about the Virtual Classroom context, whether it is indeed close to the real classroom or to some extent different.

      Yes, we agree. Obviously, the VR classroom differs in many ways from a real classroom, in terms of the perceptual experience, social aspects and interactive possibilities. We did ask participants about their VR experience after the experiment, and most reported feeling highly immersed in the VR environment and engaged in the task, with a strong sense of presence in the virtual-classroom.

      We note that, in parallel to the VR studies in our lab, we are also conducting experiments in real classrooms, and we hope that the cross-study comparison will be able to shed more light on these similarities/differences.

      (4) One intriguing issue is whether neural tracking of the teacher's speech can index students' attention, as the tracking of speech may be relevant to various factors such as sound processing without semantic access.

      Another excellent point. While separating the ‘acoustic’ and ‘semantic’ contributions to the speech tracking response is non-trivial, we are currently working on methodological approaches to do this (again, in future studies) following, for example, the hierarchical TRF approach used by Brodbeck et al. and others.

      (5) There are many results associated with various metrics, and many results did not show a significant difference between the ADHD group and the control group. It is difficult to find the crucial information that supports the conclusion. I suggest the authors reorganize the results section and report the significant results first, and to which comparison(s) the readers should pay attention.

      We apologize if the organization of the results section was difficult to follow. This is indeed a challenge when collecting so many different neurophysiological metrics.

      To facilitate this, we have now added a paragraph at the beginning of the Results section, clarifying its structure (p. 16):

      The current dataset is extremely rich, consisting of many different behavioral, neural and physiological responses. In reporting these results, we have separated between metrics that are associated with paying attention to the teacher (behavioral performance, neural tracking of the teacher’s speech, and looking at the teacher), those capturing responses to the irrelevant sound-events (ERPs and event-related changes in SC and gaze); as well as more global neurophysiological measures that may be associated with the listeners’ overall ‘state’ of attention or arousal (alpha- and beta-power and tonic SC).

      Moreover, within each section we have ordered the analysis such that the ones with significant effects are first. We hope that this contributes to the clarity of the results section.

      (6) The difference between artificial and non-verbal humans should be introduced earlier in the introduction and let the readers know what should be expected and why.

      We have added this to the Introduction (p. 4).

      (7) It would be better to discuss the results against a theoretical background rather than majorly focusing on technical aspects.

      We appreciate this comment. In our opinion, the discussion does contain a substantial theoretical component, both regarding theories of attention and attention-deficits, and also regarding their potential neural correlates. However, we agree that there is always room for more in-depth discussion.

      Reviewer #3:

      Major:

      (1) While the study introduced a well-designed experiment with comprehensive physiological measures and thorough analyses, the key insights derived from the experiment are unclear. For example, does the high ecological validity provide a more sensitive biomarker or a new physiological measure of attention deficit compared to previous studies? Or does the study shed light on new mechanisms of attention deficit, such as the simultaneous presence of inattention and distraction (as mentioned in the Conclusion)? The authors should clearly articulate their contributions.

      Thanks for this comment.

      We would not say that this paper is able to provide a ‘more sensitive biomarker’ or a ‘new physiological measure of attention’ – in order to make those types of grand statements we would need much more converging evidence from multiple studies, using both replication and generalization approaches.

      Rather, from our perspective, the key contribution of this work is in broadening the scope of research regarding the neurophysiological mechanisms involved in attention and distraction.

      Specifically, this work:

      (1) Offers a significant methodological advancement of the field – demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility in contexts that ‘mimic’ real-life situations (rather than highly controlled computerized tasks).

      (2) Provides a solid basis for formulating specific mechanistic hypotheses regarding the neurophysiological metrics associated with attention and distraction, the interplay between them, and their potential relation to ADHD-symptoms. Rather than being an end-point, we see these results as a start-point for future studies that emphasize ecological validity and generalizability across contexts, and that will hopefully lead to improved mechanistic understanding and potential biomarkers of real-life attentional capabilities (see also response to Rev #2 comment #1 above).

      (3) Highlights differences and similarities between the current results and those obtained in traditional ‘highly controlled’ studies of attention (e.g., in the way ERPs to sound-events differ between ADHD and controls; variability in gaze and alpha-power; and more broadly about whether ADHD symptoms do or don’t map onto specific neurophysiological metrics). Again, we do not claim to give a definitive ‘answer’ to these issues, but rather to provide a new type of data that can expand the conversation and address the ecological validity gap in attention research.

      (2) Based on the multivariate analyses, ASRS scores correlate better with the physiological measures rather than the binary deficit category. It may be worthwhile to report the correlation between physiological measures and ASRS scores for the univariate analyses. Additionally, the correlation between physiological measures and behavioral accuracy might also be interesting.

      Thanks for this. The beta-values reported for the regression analysis reflect the correlations between the different physiological measures and the ASRS scores (p. 30). From a statistical perspective, it is better to report these values rather than the univariate correlation-coefficients, since these represent the ‘unique’ relationship with each factor, after controlling for all the others.

      The univariate correlations between the physiological measures themselves, as well as with behavioral accuracy, are reported in Figure 10.

      (3) For the TRF and decoding analysis, the authors used a constant regularization parameter per a previous study. However, the optimal regularization parameter is data-dependent and may differ between encoding and decoding analyses. Furthermore, the authors did not conduct TRF analysis for the quiet condition due to the limited ~5 minutes of data. However, such a data duration is generally sufficient to derive a stable TRF with significant predictive power (Mesik and Wojtczak, 2023).

      The reviewer raises two important points, also raised by Rev #1 (see above).

      Regarding the choice of regularization parameters, we have now clarified that although we used a common lambda value for all participants, it was selected in a data-driven manner, so as to achieve an optimal predictive power at the group-level.

      See revised methods section:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
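      The procedure quoted above can be illustrated with a minimal sketch. This is not the mTRF toolbox and not the authors' code: it is a plain-Python ridge regression on simulated data, and all names (`ridge_fit`, `loo_predictive_power`, `select_group_lambda`, `simulate_participant`) as well as the toy weights and λ grid are assumptions made for illustration. It fits a single response channel, whereas the manuscript evaluates each electrode separately.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_fit(X, y, lam):
    """Ridge weights w = (X'X + lam * I)^-1 X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den

def loo_predictive_power(trials, lam):
    """Leave-one-trial-out: train on all other trials, correlate on the left-out one."""
    rs = []
    for k, (X_test, y_test) in enumerate(trials):
        X_train = [row for i, (X, _) in enumerate(trials) if i != k for row in X]
        y_train = [v for i, (_, y) in enumerate(trials) if i != k for v in y]
        w = ridge_fit(X_train, y_train, lam)
        pred = [sum(wi * xi for wi, xi in zip(w, row)) for row in X_test]
        rs.append(pearson(pred, y_test))
    return sum(rs) / len(rs)

def select_group_lambda(participants, lambdas):
    """Pick one lambda for everyone: the one with the highest mean predictive power."""
    scores = {lam: sum(loo_predictive_power(t, lam) for t in participants)
              / len(participants) for lam in lambdas}
    return max(scores, key=scores.get), scores

def simulate_participant(rng, n_trials=4, n_samples=60, noise=1.0):
    """Toy data: response = stimulus features times hypothetical weights, plus noise."""
    true_w = [0.8, -0.5, 0.3]
    trials = []
    for _ in range(n_trials):
        X = [[rng.gauss(0, 1) for _ in true_w] for _ in range(n_samples)]
        y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, noise) for row in X]
        trials.append((X, y))
    return trials

rng = random.Random(0)
participants = [simulate_participant(rng) for _ in range(3)]
lambdas = [0.01, 1.0, 100.0, 10000.0]
best_lam, scores = select_group_lambda(participants, lambdas)
print(best_lam, scores[best_lam])
```

      Selecting λ on the group mean, as in `select_group_lambda`, keeps the regularization identical across participants, which is the property the quoted passage argues for.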

      Regarding whether the data in the Quiet condition were sufficient for performing TRF analysis – we are aware of the important work by Mesik & Wojtczak, and had initially used this estimate when designing our study. However, when assessing the predictive-power of the TRF model trained on data from the Quiet condition, we found that it was not significantly better than chance (see Author response image 2, ‘real’ predictive power vs. permuted data). Therefore, we ultimately did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).

      (4) As shown in Figure 4, for ADHD participants, decoding accuracy appears to be lower than the predictive power of TRF. This result is surprising because more data (i.e., data from all electrodes) is used in the decoding analysis.

      This is an interesting point – however, in our experience it is not necessarily the case that decoding accuracy (i.e., reconstruction correlation with the stimulus) is higher than encoding predictive-power. While both metrics use Pearson’s correlations, they quantify the similarity between two different types of signals (the EEG and the speech-envelope). Although the decoding procedure does use data from all electrodes, many of them don’t actually contain meaningful information regarding the stimulus, and thus could just as well hinder the overall performance of the decoding.

      (5) Beyond the current analyses, the authors may consider analyzing inter-subject correlation, especially for the gaze signal analysis. Given that the area of interest during the lesson changes dynamically, the teacher might not always be the focal point. Therefore, the correlation of gaze locations between subjects might be better than the percentage of gaze duration on the teacher.

      Thanks for this suggestion. We have tried to look into this; however, working with eye-gaze in a 3-D space is extremely complex, and we are not able to calculate reliable correlations between participants.

      (6) Some preprocessing steps relied on visual and subjective inspection. For instance, "Visual inspection was performed to identify and remove gross artifacts (excluding eye movements)" (P9); "The raw data was downsampled to 16Hz and inspected for any noticeable artifacts" (P13). Please consider using objective processes or provide standards for subjective inspections.

      We are aware of the possible differences between objective methods of artifact rejection vs. use of manual visual inspection, however we still prefer the manual (subjective) approach. As noted, in this case only very large artifacts were removed, exceeding ~ 4 SD of the amplitude variability, so as to preserve as many full-length trials as possible.
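      For readers who prefer an explicit rule, the ~4 SD criterion mentioned above could be written as a simple threshold. The sketch below is illustrative only and is not the procedure used in the manuscript, where artifacts were identified by manual inspection; the function name `flag_gross_artifacts` and the default threshold argument are assumptions.

```python
import statistics

def flag_gross_artifacts(signal, n_sd=4.0):
    """Return indices of samples deviating from the mean by more than n_sd SDs.

    Note: the mean and SD are computed on the full signal, so very large
    artifacts also inflate the SD used for the threshold.
    """
    mean = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    return [i for i, v in enumerate(signal) if abs(v - mean) > n_sd * sd]

# Toy signal: small fluctuations plus one gross deflection at the end.
toy_signal = [0.0, 1.0, -1.0] * 50 + [50.0]
print(flag_gross_artifacts(toy_signal))
```

      A rule of this kind is reproducible, but it can miss artifacts that a trained eye would catch, which is one reason manual inspection remains common.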

      (7) Numerous significance testing methods were employed in the manuscript. While I appreciate the detailed information provided, describing these methods in a separate section within the Methods would be more general and clearer. Additionally, the authors may consider using a linear mixed-effects model, which is more widely adopted in current neuroscience studies and can account for random subject effects.

      Indeed, there are many statistical tests in the paper, given the diverse types of neurophysiological data collected here. We actually thought that describing the statistics per method rather than in a separate “general” section would be easier to follow, but we understand that readers might diverge in their preferences.

      Regarding the use of mixed-effect models – this is indeed a great approach. However, it requires deriving reliable metrics on a per-trial basis, and while this might be plausible for some of our metrics, the EEG and GSR metrics are less reliable at this level. This is why we ultimately chose to aggregate across trials and use a regular regression model rather than mixed-effects.

      (8) Some participant information is missing, such as their academic majors. Given that only two lesson topics were used, the participants' majors may be a relevant factor.

      To clarify – the mini-lectures presented here actually covered a large variety of topics, broadly falling within the domains of history, science and social-science and technology. Regarding participants’ academic majors, these were relatively diverse, as can be seen in Author response table 1 and Author response image 4.

      Author response table 1.

      Author response image 4.

      (9) Did the multiple regression model include cross-validation? Please provide details regarding this.

      Yes, we used a leave-one-out cross validation procedure. We have now clarified this in the methods section which now reads:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”

      Minor:

      (10) Typographical errors: P5, "forty-nine 49 participants"; P21, "$ref"; P26, "Table X"; P4, please provide the full name for "SC" when first mentioned.

      Thanks! Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hippocampal place cells display a sequence of firing activities when the animal travels through a spatial trajectory at a behavioral time scale of seconds to tens of seconds. Interestingly, parts of the firing sequence also occur at a much shorter time scale: ~120 ms within individual cycles of theta oscillation. These so-called theta sequences are originally thought to naturally result from the phenomenon of theta phase precession. However, there is evidence that theta sequences do not always occur even when theta phase precession is present, for example, during the early experience of a novel maze. The question is then how they emerge with experience (theta sequence development). This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, may play a key role in theta sequence development.

      The authors analyzed place cells, LFPs, and theta sequences as rats traveled a circular maze in repeated laps. They found that a group of place cells were significantly tuned to a particular phase of fast-gamma (FG-cells), in contrast to others that did not show such tuning (NFG-cells). The authors then omitted FG-cells or the same number of NFG-cells, in their algorithm of theta sequence detection and found that the quality of theta sequences, quantified by a weighted correlation, was worse with the FG-cell omission, compared to that with the NFG-cell omission, during later laps, but not during early laps. What made the FG-cells special for theta sequences? The authors found that FG-cells, but not NFG-cells, displayed phase precession to slow-gamma (25 - 45 Hz) oscillations (within theta cycles) during early laps (both FG- and NFG-cells showed slow-gamma phase precession during later laps). Overall, the authors conclude that FG-cells contribute to theta sequence development through slow-gamma phase precession during early laps.

      How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The identification of FG-cells in this study is straightforward. Evidence is also presented for the role of these cells in theta sequence development. However, given several concerns elaborated below, whether the evidence is sufficiently strong for the conclusion needs further clarification, perhaps, in future studies.

      We thank the reviewer for these positive comments.

      (1) The results in Figure 3 and Figure 8 seem contradictory. In Figure 8, all theta sequences displayed a seemingly significant weighted correlation (above 0) even in early laps, which was mostly due to FG-cell sequences but not NFG-cell sequences (correlation for NFG-sequences appeared below 0). However, in Figure 3H, omitting FG-cells and omitting NFG-cells did not produce significant differences in the correlation. Conversely, FG-cell and NFG-cell sequences were similar in later laps in Figure 8 (NFG-cell sequences appeared even better than FG-cell sequences), yet omitting NFG-cells produced a better correlation than omitting FG-cells. This confusion may be related to how "FG-cell-dominant sequences" were defined, which is unclear in the manuscript. Nevertheless, the different results are not easy to understand.

      We thank the reviewer for pointing out this important problem. The apparent contradiction can be explained by the different sets of sequences included in Fig. 3 and Fig. 8, as described below.

      (1) In Fig. 3, all sequences decoded without either FG- or NFG-cells were included, defined as exFG-sequences and exNFG-sequences. Sequence development could therefore not be observed in the early phase, and the weighted correlation was low. (2) In Fig. 8, however, only sequences in which either FG- or NFG-cells fired across at least 3 slow gamma cycles were included, defined as FG-cell sequences and NFG-cell sequences. This criterion allowed us to investigate the relationship between sequence development and slow gamma phase precession, because these sequences were contributed by cells likely to show slow gamma phase precession. These definitions have been added to the “Theta sequences detection” section of the Methods (Lines 606-619).

      In the early phase, there was still no difference in weighted correlation between FG-cell sequences and NFG-cell sequences (Author response image 1A, Student’s t test, t(65)=0.2, p=0.8, Cohen's D=0.1), but the FG-cell sequences contained a high proportion of slow gamma phase precession (Fig. 8F). In the late phase, both FG-cell sequences and NFG-cell sequences exhibited slow gamma phase precession, so their weighted correlations were high with no difference (Author response image 1B, Student’s t test, t(62)=-1.1, p=0.3, Cohen's D=0.3). This result further indicates that theta sequence development requires slow gamma phase precession, especially for FG-cells during the early phase.

      Author response image 1.
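      Since the comparison above rests on the weighted correlation, a minimal sketch of that metric may be useful. It assumes the commonly used definition, in which the decoded posterior probability at each (time bin, position bin) acts as the weight in a correlation between time and position. The exact implementation in the manuscript is not shown here, so the function name and toy matrices are assumptions.

```python
import math

def weighted_correlation(prob, times, positions):
    """Correlation between time and position, weighted by decoded probability.

    prob[i][j] is the decoded probability at time bin i and position bin j.
    """
    total = mean_t = mean_x = 0.0
    for row, t in zip(prob, times):
        for p, x in zip(row, positions):
            total += p
            mean_t += p * t
            mean_x += p * x
    mean_t /= total
    mean_x /= total

    cov_tt = cov_xx = cov_tx = 0.0
    for row, t in zip(prob, times):
        for p, x in zip(row, positions):
            cov_tt += p * (t - mean_t) ** 2
            cov_xx += p * (x - mean_x) ** 2
            cov_tx += p * (t - mean_t) * (x - mean_x)
    return cov_tx / math.sqrt(cov_tt * cov_xx)

# Toy posteriors over 3 time bins x 3 position bins.
forward_sweep = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # position advances with time
reverse_sweep = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]   # position retreats with time
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]            # no sequence structure
bins = [0.0, 1.0, 2.0]
print(weighted_correlation(forward_sweep, bins, bins))
print(weighted_correlation(reverse_sweep, bins, bins))
print(weighted_correlation(flat, bins, bins))
```

      A value near 1 indicates a forward sweep within the theta cycle, whereas a value near 0 indicates no sequential structure in the decoded positions.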

      (2) The different contributions between FG-cells and NFG-cells to theta sequences are supposed not to be caused by their different firing properties (Figure 5). However, Figure 5D and E showed a large effect size (Cohen's D = 0.7, 0.8), although not significant (P = 0.09, 0.06). But the seemingly non-significant P values could be simply due to smaller N's (~20). In other parts of the manuscript, the effect sizes were comparable or even smaller (e.g. D = 0.5 in Figure 7B), but interpreted as positive results: P values were significant with large N's (~480 in Fig. 7B). Drawing a conclusion purely based on a P value while N is large often renders the conclusion only statistical, with unclear physical meaning. Although this is common in neuroscience publications, it makes more sense to at least make multiple inferences using similar sample sizes in the same study.

      We thank the reviewer for this kind suggestion. We have made multiple inferences using similar sample sizes wherever possible. For Fig. 7B, we repeated the statistical analysis with sessions as samples and found that the result remained significant. These results have been added to the revised manuscript (Lines 269-270), and Fig. 7B has been replaced accordingly.
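      The reviewer's concern about sample size can be made concrete with the standard relation between Cohen's d and the two-sample (pooled-variance) t statistic, t = d · sqrt(n1·n2 / (n1 + n2)). The sketch below is illustrative only: the group sizes are hypothetical and are not taken from the manuscript.

```python
import math

def t_from_cohens_d(d, n1, n2):
    """Two-sample (pooled-variance) t statistic implied by Cohen's d."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))

# Same effect size, hypothetical group sizes (not the manuscript's).
t_small = t_from_cohens_d(0.5, 10, 10)     # few samples per group
t_large = t_from_cohens_d(0.5, 240, 240)   # many samples per group
print(t_small, t_large)
```

      With the effect size held fixed, the t statistic grows with the square root of the sample size, which is why comparing P values across analyses with very different N's can be misleading.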

      (3) In supplementary Figure 2 - S2, FG-cells displayed stronger theta phase precession than NFG-cells, which could be a major reason why FG-cells impacted theta sequences more than NFG cells. Although factors other than theta phase precession may contribute to or interfere with theta sequences, stronger theta phase precession itself (without the interference of other factors), by definition, can lead to stronger theta sequences.

      This is a very good point. The finding that FG-cells displayed stronger theta phase precession than NFG-cells is consistent with the finding of Guardamagna et al., 2023 Cell Rep, that the theta phase precession pattern emerged with strong fast gamma. Since slow gamma phase precession occurs within theta cycles, it is hard to assess the contribution of these factors to theta sequence development without taking theta phase precession into account. It should be noted, however, that theta sequences may fail to develop even when theta phase precession is present from the very beginning of exploration (Feng et al., 2025 J Neurosci). These findings suggest that theta phase precession, together with other factors, impacts theta sequence development. However, the weight of each factor and their interactions still need to be further investigated. We have discussed this possibility in the Discussion section (Lines 361-373).

      (4) The slow-gamma phase precession of FG-cells during early laps is supposed to mediate or contribute to the emergence of theta sequences during late laps (Figure 1). The logic of this model is unclear. The slow-gamma phase precession was present in both early and late laps for FG-cells, but only present in late laps for NFG-cells. It seems more straightforward to hypothesize that the difference in theta sequences between early and later laps is due to the difference in slow-gamma phase precession of NFG cells between early and late laps. Although this is not necessarily the case, the argument presented in the manuscript is not easy to follow.

      We thank the reviewer for pointing this out. Slow gamma phase precession was first reported in our previous publication (Zheng et al., 2016 Neuron), which indicated a temporally compressed manner of coding spatial information related to memory retrieval. In this case, we would expect slow gamma phase precession to occur in all cells during late laps, because spatial information is retrieved once rats are familiar with the environment. However, during early laps, when novel information is just being encoded, there would be a balance between fast gamma and slow gamma modulation of cells for the upcoming encoding-retrieval transition. One possibility is that FG-cells support this balance by receiving modulation from both fast gamma and slow gamma, but with distinct phase-coding modes (fast gamma phase locking and slow gamma phase precession) in a temporally coordinated manner. We have discussed this possibility in the Discussion section (Lines 415-428).

      (5) There are several questions on the description of methods, which could be addressed to clarify or strengthen the conclusions.

      (i) Were the identified fast- and slow-gamma episodes mutually exclusive?

      Yes, the fast- and slow-gamma episodes are mutually exclusive. We have added descriptions in the “Detection of gamma episodes” section in the Methods part (Lines 538-550).

      (ii) Was the task novel when the data were acquired? How many days (from the 1st day of the task) were included in the analysis? When the development of the theta sequence was mentioned, did it mean the development in a novel environment, in a novel task, or purely in a sense of early laps (Lap 1, 2) on each day?

      We thank the reviewer for pointing this out. The task was not novel to the rats in this dataset, because only days with sufficiently good recording quality for sequence decoding were included in this paper, which were approximately days 2-10 for each rat. However, we still observed the process of sequence formation because of the rats’ exploratory interest during early laps. Thus, when the development of theta sequences is mentioned, it refers to early laps on each day.

      (iii) How were the animals' behavioral parameters equalized between early and later laps? For example, speed or head direction could potentially produce the differences in theta sequences.

      This is a very good point. Regarding the effect of running speed on theta sequences, we quantified the running speeds during theta sequences across trials 1-5. We found that the rats ran at a stable speed, as reported in Fig. 3F. Regarding the effect of head direction on theta sequences, we measured the angle difference between head direction and running direction. We found that the angle difference for each lap was distributed around 0, with no significant difference across laps (Fig. S3, Watson-Williams multi-sample test, F(4,55)=0.2, p=0.9, partial η²=0.01). These results indicate that the differences in theta sequences across trials cannot be explained by variability in behavioral parameters. We have added these results and the corresponding methods to the revised manuscript (Lines 172-175, Lines 507-511, with a new Fig. S3).
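      Because head direction and running direction are circular quantities, their difference has to be wrapped before it can be said to be "distributed around 0". The following sketch shows one standard way to do this; the function names and example angles are assumptions and do not reproduce the authors' analysis code.

```python
import math

def angle_difference(head_dir, run_dir):
    """Signed difference head_dir - run_dir in radians, wrapped to [-pi, pi]."""
    d = head_dir - run_dir
    return math.atan2(math.sin(d), math.cos(d))

def circular_mean(angles):
    """Mean direction of a set of angles (radians)."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))

# A head direction of 350 deg and a running direction of 10 deg differ by -20 deg,
# not by 340 deg.
diff = angle_difference(math.radians(350), math.radians(10))
print(math.degrees(diff))
```

      Wrapping avoids spurious large differences when the two directions fall on either side of the 0/360 degree boundary.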

      Reviewer #2 (Public Review):

      This manuscript addresses an important question that has not yet been solved in the field, what is the contribution of different gamma oscillatory inputs to the development of "theta sequences" in the hippocampal CA1 region? Theta sequences have received much attention due to their proposed roles in encoding short-term behavioral predictions, mediating synaptic plasticity, and guiding flexible decision-making. Gamma oscillations in CA1 offer a readout of different inputs to this region and have been proposed to synchronize neuronal assemblies and modulate spike timing and temporal coding. However, the interactions between these two important phenomena have not been sufficiently investigated. The authors conducted place cell and local field potential (LFP) recordings in the CA1 region of rats running on a circular track. They then analyzed the phase locking of place cell spikes to slow and fast gamma rhythms, the evolution of theta sequences during behavior, and the interaction between these two phenomena. They found that place cells with the strongest modulation by fast gamma oscillations were the most important contributors to the early development of theta sequences and that they also displayed a faster form of phase precession within slow gamma cycles nested with theta. The results reported are interesting and support the main conclusions of the authors. However, the manuscript needs significant improvement in several aspects regarding data analysis, description of both experimental and analytical methods, and alternative interpretations, as I detail below.

      • The experimental paradigm and recordings should be explained at the beginning of the Results section. Right now, there is no description whatsoever which makes it harder to understand the design of the study.

      We thank the reviewer for this kind suggestion.  The description of experimental paradigm and recordings has been added to the beginning of the results section (Lines 114-119).

      • An important issue that needs to be addressed is the very small fraction of CA1 cells phase-locked to slow gamma rhythms (3.7%). This fraction is much lower than in many previous studies, that typically report it in the range of 20-50%. However, this discrepancy is not discussed by the authors. This needs to be explained and additional analysis considered. One analysis that I would suggest, although there are also other valid approaches, is to, instead of just analyzing the phase locking in two discrete frequency bands, compute the phase locking with all LFP frequencies from 25-100 Hz. This will offer a more comprehensive and unbiased view of the gamma modulation of place cell firing. Alternative metrics to mean vector length that are less sensitive to firing rates, such as pairwise phase consistency index (Vinck et al., Neuroimage, 2010), could be implemented. This may reveal whether the low fraction of phase-locked cells could be due to a low number of spikes entering the analysis.

      We thank the reviewer for this constructive suggestion. A previous study, also in Long-Evans rats, showed that the proportion of slow gamma phase-locked cells was ~20% during novelty exploration but dropped to ~10% during familiar exploration (Fig. 4E, Kitanishi et al., 2015 Neuron). This suggests that the proportion of slow gamma phase-locked cells may decrease with familiarity of the environment, which is consistent with our data. In addition, we calculated the pairwise phase consistency (PPC) index to assess the effect of spike counts on the mean vector length (MVL). The trends of PPC (Author response image 2A) and MVL (Author response image 2B) across frequency bands were consistent across different subsets of cells, suggesting that the classification of cell subsets by the MVL metric was not biased by low spike counts. These results further shed light on the contribution of slow gamma phase precession of place cells to theta sequence development.

      Author response image 2.
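      The difference between the two phase-locking metrics discussed above can be illustrated with simulated spike phases. The sketch below is not the authors' analysis: it uses the textbook definitions (MVL as the length of the mean resultant vector; PPC as the mean cosine of the phase difference over all distinct spike pairs), and the spike counts and number of repeats are arbitrary assumptions.

```python
import math
import random

def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians)."""
    c = sum(math.cos(p) for p in phases)
    s = sum(math.sin(p) for p in phases)
    return math.hypot(c, s) / len(phases)

def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct spike pairs."""
    n = len(phases)
    total = 0.0
    for j in range(n - 1):
        for k in range(j + 1, n):
            total += math.cos(phases[j] - phases[k])
    return 2.0 * total / (n * (n - 1))

def simulate_mean_metrics(rng, n_spikes, n_repeats=200):
    """Average MVL and PPC for spikes with NO true phase locking (uniform phases)."""
    mvl = ppc = 0.0
    for _ in range(n_repeats):
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_spikes)]
        mvl += mean_vector_length(phases)
        ppc += pairwise_phase_consistency(phases)
    return mvl / n_repeats, ppc / n_repeats

rng = random.Random(1)
mvl_few, ppc_few = simulate_mean_metrics(rng, n_spikes=10)
mvl_many, ppc_many = simulate_mean_metrics(rng, n_spikes=100)
print(mvl_few, ppc_few)
print(mvl_many, ppc_many)
```

      In this simulation there is no true locking, yet MVL is inflated when few spikes are available, whereas PPC stays near zero on average regardless of spike count. This is the property that makes PPC a useful check on MVL-based classification.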

      • From the methods, it is not clear to me whether the reference LFP channel was consistently selected to be a different one than the one from which the analyzed spikes were taken. This is the better practice to reduce the contribution of spike leakage that could substantially inflate the coupling with faster gamma frequencies. These analyses need to be described in more detail.

      We thank the reviewer for pointing this out. In the main manuscript, we used local LFPs, i.e., LFPs recorded from the same tetrode as the cells. In addition, we selected for each rat an individual tetrode located in stratum pyramidale at the center of the drive bundle. We detected a similar proportion of FG-cells using LFPs from this tetrode compared with local LFPs (Author response image 3A-B, Chi-squared test, χ²=0.9, p=0.4, Cramér's V=0.03). We further found that the PPC of FG- and NFG-cells differed in the fast gamma band when using central LFPs (Author response image 3D), consistent with the result obtained using local LFPs (Author response image 3C). Therefore, these results suggest that the findings related to fast gamma were not due to spike leakage in the local LFPs. We have updated the description in the manuscript (Lines 553-557, 566-568).

      Author response image 3.
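      For reference, the chi-squared statistic and Cramér's V reported for proportion comparisons such as the one above can be computed from a contingency table as sketched below. The counts are hypothetical and are not the authors' data; the sketch uses the Pearson chi-squared statistic without continuity correction, which is an assumption about the test variant.

```python
import math

def chi2_and_cramers_v(table):
    """Pearson chi-squared statistic (no continuity correction) and Cramér's V."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_sums), len(col_sums)) - 1
    return chi2, math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = LFP reference (local, central),
# columns = cell type (FG-cell, NFG-cell).
counts = [[30, 170], [26, 174]]
chi2, v = chi2_and_cramers_v(counts)
print(chi2, v)
```

      Cramér's V rescales the chi-squared statistic by the sample size, so a small V indicates a weak association even when the total number of cells is large.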

      • The initial framework of the authors of classifying cells into fast gamma and not fast gamma modulated implies a bimodality that may be artificial. The authors should discuss the nuances and limitations of this framework. For example, several previous studies have shown that the same place cell can couple to different gamma oscillations (e.g., Lasztóczi et al., Neuron, 2016; Fernandez-Ruiz et al., Neuron, 2017; Sharif et al., Neuron, 2021).

      We thank the reviewer for this kind suggestion.  We have cited these references and discussed the possibility of bimodal phase-locking in the manuscript (Lines 430-433).

      • It would be useful to provide a more thorough characterization of the physiological properties of FG and NFG cells, as this distinction is the basis of the paper. Only very little characterization of some place cell properties is provided in Figure 5. Important characteristics that should be very feasible to compare include average firing rate, burstiness, estimated location within the layer (i.e., deep vs superficial sublayers) and along the transverse axis (i.e., proximal vs distal), theta oscillation frequency, phase precession metrics (given their fundamental relationship with theta sequences), etc.

      We thank the reviewer for this constructive suggestion.  In addition to the characterizations shown in Fig. 5, we also analyzed firing rate, anatomical location and theta modulation to compare the physiological properties of FG- and NFG-cells.

      In terms of firing properties, we found that the mean firing rate of FG-cells was higher than that of NFG-cells (Fig. 5A, Student's t-test, t(22) = 2.1, p = 0.04, Cohen's d = 0.9), consistent with a previous study showing that firing rates were higher during fast gamma than during slow gamma (Zheng et al., 2015 Hippocampus).  However, the spike counts of excluded FG- and NFG-cells for decoding were similar (Fig. 5B, Student's t-test, t(22) = 1.2, p = 0.3, Cohen's d = 0.5), suggesting that the differences found in theta sequences cannot be accounted for by differences in decoding quality related to spike counts.  In addition, we measured burstiness based on the distribution of inter-spike intervals, and found that the bursting probability of spikes was not significantly different between FG- and NFG-cells (Author response image 4A, Student's t-test, t(22) = 0.6, p=0.5, Cohen's d=0.3).

      In terms of theta modulation, we first compared the theta frequency associated with the firing of FG- and NFG-cells.  We detected the instantaneous theta frequency at each spike time of FG- and NFG-cells, and found that it was not significantly different between cell types (Author response image 4B, Student's t-test, t(22) = -0.5, p=0.6, Cohen's d=0.2).  In addition, we found that the proportion of cells with significant theta phase precession was greater in FG-cells than in NFG-cells (Fig. S2E).  However, the slope and starting phase of theta phase precession were not significantly different between FG- and NFG-cells (Author response image 4C, Student's t-test, t(21) = 0.3, p=0.8, Cohen's d=0.1; Author response image 4D, Watson-Williams test, F(1,21)=0.5, p=0.5, partial η<sup>2</sup>=0.02).

      In terms of anatomical location, we identified the tetrode tracks in slices for each cell.  We found that both FG- and NFG-cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramer V=0.05).  The distribution of FG-cells and NFG-cells along the transverse axis was also similar (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramer V=0.02).

      Author response image 4.

      • It is not clear to me how the analysis in Figure 6 was performed. In Figure 6B I would think that the grey line should connect with the bottom white dot in the third panel, which would be the interpretation of the results.

      We thank the reviewer for raising this good point.  The grey line was intended only as a visual guide, not as a quantitative analysis.  We have removed the grey lines from all heat maps in Fig. 6.

      Reviewer #3 (Public Review):

      [Editors' note: This review contains many criticisms that apply to the whole sub-field of slow/fast gamma oscillations in the hippocampus, as opposed to this particular paper. In the editors' view, these comments are beyond the scope of any single paper. However, they represent a view that, if true, should contextualise the interpretation of this paper and all papers in the sub-field. In doing so, they highlight an ongoing debate within the broader field.]

      Summary:

      The authors aimed to elucidate the role of dynamic gamma modulation in the development of hippocampal theta sequences, utilizing the traditional framework of "two gammas," a slow and a fast rhythm. This framework is currently being challenged, necessitating further analyses to establish and secure the assumed premises before substantiating the claims made in the present article.

      The results are too preliminary and need to integrate contemporary literature. New analyses are required to address these concerns. However, by addressing these issues, it may be possible to produce an impactful manuscript.

      We thank the reviewer for raising these important questions in the hippocampal gamma field.  We have performed a number of new analyses in response to the comments to strengthen our manuscript.

      I. Introduction

      Within the introduction, multiple broad assertions are conveyed that serve as the premise for the research. However, equally important citations that are not mentioned potentially contradict the ideas that serve as the foundation. Instances of these are described below:

      (1) Are there multiple gammas? The authors launched the study on the premise that two different gamma bands are communicated from CA3 and the entorhinal cortex. However, recent literature suggests otherwise, offering that the slow gamma component may be related to theta harmonics:

      From a review by Etter, Carmichael and Williams (2023)

      "Gamma-based coherence has been a prominent model for communication across the hippocampal-entorhinal circuit and has classically focused on slow and fast gamma oscillations originating in CA3 and medial entorhinal cortex, respectively. These two distinct gammas are then hypothesized to be integrated into hippocampal CA1 with theta oscillations on a cycle-to-cycle basis (Colgin et al., 2009; Schomburg et al., 2014). This would suggest that theta oscillations in CA1 could serve to partition temporal windows that enable the integration of inputs from these upstream regions using alternating gamma waves (Vinck et al., 2023). However, these models have largely been based on correlations between shifting CA3 and medial entorhinal cortex to CA1 coherence in theta and gamma bands. In vivo, excitatory inputs from the entorhinal cortex to the dentate gyrus are most coherent in the theta band, while gamma oscillations would be generated locally from presumed local inhibitory inputs (Pernía-Andrade and Jonas, 2014). This predominance of theta over gamma coherence has also been reported between hippocampal CA1 and the medial entorhinal cortex (Zhou et al., 2022). Another potential pitfall in the communication-through-coherence hypothesis is that theta oscillations harmonics could overlap with higher frequency bands (Czurkó et al., 1999; Terrazas et al., 2005), including slow gamma (Petersen and Buzsáki, 2020). The asymmetry of theta oscillations (Belluscio et al., 2012) can lead to harmonics that extend into the slow gamma range (Scheffer-Teixeira and Tort, 2016), which may lead to a misattribution as to the origin of slow-gamma coherence and the degree of spike modulation in the gamma range during movement (Zhou et al., 2019)."

      And from Benjamin Griffiths and Ole Jensen (2023)

      "That said, in both rodent and human studies, measurements of 'slow' gamma oscillations may be susceptible to distortion by theta harmonics [53], meaning open questions remain about what can be attributed to 'slow' gamma oscillations and what is attributable to theta."

      This second statement should be heavily considered as it is from one of the original authors who reported the existence of slow gamma.

      Yet another instance from Schomburg, Fernández-Ruiz, Mizuseki, Berényi, Anastassiou, Christof Koch, and Buzsáki (2014):

      "Note that modulation from 20-30 Hz may not be related to gamma activity but, instead, reflect timing relationships with non-sinusoidal features of theta waves (Belluscio et al., 2012) and/or the 3rd theta harmonic."

      One of this manuscript's authors is Fernández-Ruiz, a contemporary proponent of the multiple gamma theory. Thus, the modulation to slow gamma offered in the present manuscript may actually be related to theta harmonics.

      With the above emphasis from proponents of the slow/fast gamma theory on disambiguating harmonics from slow gamma, our first suggestion to the authors is that they A) address these statements (citing the work of these authors in their manuscript) and B) demonstrably quantify theta harmonics in relation to slow gamma prior to making assertions of phase relationships (methodological suggestions below). As the frequency of theta harmonics can extend as high as 56 Hz (PMID: 32297752), overlapping with the slow gamma range defined here (25-45 Hz), it will be important to establish an approach that decouples the two phenomena using an approach other than an arbitrary frequency boundary.

      We agree with the reviewer that theta harmonics could overlap with higher frequency bands, including slow gamma, as discussed in the reviews quoted above.  To rule out a contribution of theta harmonics in this study, we have added new analyses in this letter (see below).

      (2) Can gammas be segregated into different lamina of the hippocampus? This idea appears to be foundational in the premise of the research but is also undergoing revision.

      As discussed by Etter et al. above, the initial theory of gamma routing was launched on coherence values. However, the values reported by Colgin et al. (2009) lean more towards incoherence (a value of 0) rather than coherence (1), suggesting a weak to negligible interaction. Nevertheless, this theory is coupled with the idea that the different gamma frequencies are exclusive to the specific lamina of the hippocampus.

      Recently, Douchamps et al. (2024) suggested a broader, more nuanced understanding of gamma oscillations than previously thought, emphasizing their wide range and variability across hippocampal layers. This perspective challenges the traditional dichotomy of gamma sub-bands (e.g., slow vs. medium gamma) and their associated cognitive functions based on a more rigid classification according to frequency and phase relative to the theta rhythm. Moreover, they observed all frequencies across all layers.

      Similarly, the current source density plots from Belluscio et al. (2012) suggest that SG and FG can be observed in both the radiatum and lacunosum-moleculare.

      Therefore, if the initial coherence values are weak to negligible and both slow and fast gamma are observed in all layers of the hippocampus, can the different gammas be exclusively related to either anatomical inputs or psychological functions (as done in the present manuscript)? Do these observations challenge the authors' premise of their research? At the least, please discuss.

      We thank the reviewer for raising this point, which we believe remains controversial in this field.  We also thank the reviewer for providing detailed evidence on the forms in which gamma rhythms exist.  The reviewer raised two aspects of gamma: 1) whether it is reasonable to divide slow and fast gamma by specific frequency bands; 2) the presence of gamma across all hippocampal layers, which challenges the functional significance of different types of gamma rhythms.  Although the results in Douchamps et al., 2024 challenge the idea of rigid gamma sub-bands, separate slow and fast gamma components can still be seen occurring at distinct times, with the central frequency of slow gamma lower than ~60Hz and the central frequency of fast gamma higher than ~60Hz (Fig.1b of Douchamps et al., 2024).  This was also seen in the rat dataset of this reference (Fig. S3).  Since their behavioral test required both memory encoding and retrieval, it was hard to distinguish the roles of different gamma components, as they may be dynamically coordinated during complex memory processes.  Thus, although behavioral performance can be decoded from a broad range of gamma, we still cannot deny the existence of different gamma rhythms and their functional significance during different memory phases.

      (3) Do place cells, phase precession, and theta sequences require input from afferent regions? It is offered in the introduction that "Fast gamma (~65-100Hz), associated with the input from the medial entorhinal cortex, is thought to rapidly encode ongoing novel information in the context (Fernandez-Ruiz et al., 2021; Kemere, Carr, Karlsson, & Frank, 2013; Zheng et al., 2016)".

      Publications showing that CA1 place fields remain fairly intact following MEC inactivation include Ipshita Zutshi, Manuel Valero, Antonio Fernández-Ruiz, and György Buzsáki (2022), "CA1 place cells and assemblies persist despite combined mEC and CA3 silencing", and Hadas E Sloin, Lidor Spivak, Amir Levi, Roni Gattegno, Shirly Someck, Eran Stark (2024): "These findings are incompatible with precession models based on inheritance, dual-input, spreading activation, inhibition-excitation summation, or somato-dendritic competition. Thus, a precession generator resides locally within CA1."

      These publications, at the least, challenge the inheritance model by which the afferent input controls CA1 place field spike timing. The research premise offered by the authors is couched in the logic of inheritance, when the effect that the authors are observing could be governed by local intrinsic activity (e.g., phase precession and gamma are locally generated, and the attribution to routed input is perhaps erroneous). Certainly, it is worth discussing these manuscripts in the context of the present manuscript.

      We thank the reviewer for this discussion.  The main purpose of our current study is to investigate the mechanism of theta sequence development during learning, which may or may not depend on theta phase precession of single place cells, as this remains controversial in the field.  Also, a limitation of this study is that all gamma components were recorded from stratum pyramidale; thus we cannot draw any conclusion about the origin of the gamma that modulates sequence development.

      II. Results

      (1) Figure 2-

      a. There is a bit of a puzzle here that should be discussed. If slow and fast frequencies modulate 25% of neurons, how can these rhythms serve as mechanisms of communication/support psychological functions? For instance, if fast gamma is engaged in rapid encoding (line 72) and slow gamma is related to the integration processing of learned information (line 84), and these are functions of the hippocampus, then why do these rhythms modulate so few cells? Is this to say 75% of CA1 neurons do not listen to CA3 or MEC input?

      The proportion of ~25% refers to the fraction of place cells phase-locked to either slow or fast gamma.  However, one of the main findings of this study was that most cells were modulated by slow gamma, in that they fired at precessing slow gamma phases within a theta cycle (Figs 6-8), which would promote information compression for theta sequence development.  Therefore, we did not mean that only a small proportion of cells were modulated by gamma rhythms and contributed to this process.

      b. Figure 2. It is hard to know if the mean vector lengths presented are large or small. Moreover, one can expect to find significance due to chance. For instance, it is challenging to find a frequency in which modulation strength is zero (please see Figure 4 of PMID: 30428340 or Figure 7 of PMID: 31324673).

      i. Please construct the histograms of Mean Vector Length as in the above papers, using 1 Hz filter steps from 1-120Hz and include it as part of Figure 2 (i.e., calculate the mean vector length for the filtered LFP in steps of 1-2 Hz, 2-3 Hz, 3-4 Hz,... etc). This should help the authors portray the amount of modulation these neurons have relative to the theta rhythm and other frequencies. If the theta mean vector length is higher, should it be considered the primary modulatory influence of these neurons (with slow and fast gammas as a minor influence)?

      We thank the reviewer for this suggestion.  We measured the mean vector length at 5Hz step (equivalent to 1Hz step), and we found that the FG-cells were phase-locked to fast gamma rhythms even more strongly than to theta (Author response image 2B, mean MVL of theta=0.126±0.007, mean MVL of fast gamma=0.175±0.006, paired t-test, t(112)=-5.9, p=0.01, Cohen's d=0.7).  In addition, in previous studies reporting significant fast gamma phase locking, the MVL values were around 0.15 when a broad gamma band was used (Kitanishi et al., 2015 Neuron, Lasztóczi et al., 2016 Neuron, Tomar et al., 2021 Front Behav Neurosci, and Asiminas et al., 2022 Molecular Autism), which is consistent with the values in this study.  Therefore, we do not believe that fast gamma was only a minor influence on these neurons.
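
For readers unfamiliar with the measure, the mean vector length (MVL) discussed above can be sketched as follows. This is a minimal NumPy illustration, not the authors' Matlab pipeline: the function name `mean_vector_length` and the simulated von Mises phases are assumptions made for the example. A concentration of kappa = 0.3 is chosen because its theoretical MVL is roughly 0.15, the order of magnitude quoted in the response.

```python
import numpy as np

def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians).

    0 means phases are spread evenly around the cycle, 1 means every
    spike occurs at exactly the same phase.
    """
    phases = np.asarray(phases, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

# Hypothetical spike phases: weakly locked (von Mises) versus unlocked (uniform).
rng = np.random.default_rng(0)
locked_phases = rng.vonmises(mu=0.0, kappa=0.3, size=5000)
uniform_phases = rng.uniform(-np.pi, np.pi, size=5000)

mvl_locked = mean_vector_length(locked_phases)
mvl_uniform = mean_vector_length(uniform_phases)
print(f"weakly locked: {mvl_locked:.3f}, uniform: {mvl_uniform:.3f}")
```

In practice the phases would come from the band-pass filtered LFP at each spike time, one band at a time, which is what makes the per-frequency comparison requested by the reviewer possible.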

      ii. It is possible to infer a neuron's degree of oscillatory modulation without using the LFP. For instance, one can create an ISI histogram as done in Figure 1 here (https://www.biorxiv.org/content/10.1101/2021.09.20.461152v3.full.pdf+html; "Distinct ground state and activated state modes of firing in forebrain neurons"). The reciprocal of the ISI values would be "instantaneous spike frequency". In favor of the Douchamps et al. (2024) results, the figure of the BioRXiV paper implies that there is a single gamma frequency modulate as there is only a single bump in the ISIs in the 10^-1.5 to 10^-2 range. Therefore, to vet the slow gamma results and the premise of two gammas offered in the introduction, it would be worth including this analysis as part of Figure 2.

      Using the suggested method, we calculated the ISI distribution on a log scale for FG-cells and NFG-cells during behavior (Author response image 5).  We observed that the ISI distribution of FG-cells had a bump in the 10<sup>-1.5</sup> to 10<sup>-2</sup> s range (black bar), in particular in the fast gamma range (10<sup>-2</sup> to 10<sup>-1.8</sup> s).
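
The correspondence between log-scaled ISIs and instantaneous spike frequency can be illustrated with a short sketch. It assumes ISIs are expressed in seconds; the function name `log_isi_histogram`, the bin layout, and the perfectly regular 75 Hz spike train are hypothetical choices for the example and are not taken from the manuscript.

```python
import numpy as np

def log_isi_histogram(spike_times_s, n_bins=40, lo=-3.0, hi=1.0):
    """Histogram of log10 inter-spike intervals (spike times in seconds)."""
    isi = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    isi = isi[isi > 0]
    edges = np.linspace(lo, hi, n_bins + 1)  # bins of 0.1 log10 units by default
    counts, edges = np.histogram(np.log10(isi), bins=edges)
    return counts, edges

# The reciprocal of an ISI is the instantaneous spike frequency.
for exponent in (-1.5, -1.8, -2.0):
    print(f"ISI = 10^{exponent} s  ->  {1.0 / 10 ** exponent:.1f} Hz")

# Hypothetical regular 75 Hz train: all ISIs fall in the -1.9 to -1.8 bin.
spikes = np.arange(0.0, 10.0, 1.0 / 75.0)
counts, edges = log_isi_histogram(spikes)
peak = int(np.argmax(counts))
print(f"peak bin: [{edges[peak]:.1f}, {edges[peak + 1]:.1f}] log10(s)")
```

Under this convention the 10^-1.5 to 10^-2 s range spans roughly 32 to 100 Hz, and 10^-1.8 to 10^-2 s spans roughly 63 to 100 Hz.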

      Author response image 5.

      c. There are some things generally concerning about Figure 2.

      i. First, the raw trace does not seem to have clear theta epochs (it is challenging to ascertain the start and end of a theta cycle). Certainly, it would be worth highlighting the relationship between theta and the gammas and picking a nice theta epoch.

      We thank the reviewer for this suggestion.  We have updated this figure with a clearer theta epoch in the revised manuscript.

      ii. Also, in panel A, there looks to be a declining amplitude relationship between the raw, fast, and slow gamma traces, assuming that the scale bars represent 100uV in all three traces. The raw trace is significantly larger than the fast gamma. However, this relationship does not seem to be the case in panel B (in which both the raw and unfiltered examples of slow and fast gamma appear to be equal; the right panels of B suggest that fast gamma is larger than slow, appearing to contradict the A= 1/f organization of the power spectral density). Please explain as to why this occurs. Including the power spectral density (see below) should resolve some of this.

      We thank the reviewer for pointing this out.  The y-axis scales of the LFP traces in Fig. 2B were not consistent, which made the comparison of amplitude between slow and fast gamma misleading.  We have unified the y-axis scales across gamma types in the revised manuscript.  Moreover, we have also replaced these examples with more typical ones (see also the response below).

      iii. Within the example of spiking to phase in the left side of Panel B (fast gamma example)- the neuron appears to fire near the trough twice, near the peak twice, and somewhere in between once. A similar relationship is observed for the slow gamma epoch. One would conclude from these plots that the interaction of the neuron with the two rhythms is the same. However, the mean vector lengths and histograms below these plots suggest a different story in which the neuron is modulated by FG but not SG. Please reconcile this.

      We thank the reviewer for pointing this out.  We found that fast gamma phase locking was robust across FG-cells, with the fast gamma peak as the preferred phase.  Therefore, we have replaced these examples with more typical ones, so that the examples are consistent with the group effect.

      iv. For calculating the MVL, it seems that the number of spikes that the neuron fires would play a significant role. Working towards our next point, there may be a bias of finding a relationship if there are too few spikes (spurious clustering due to sparse data) and/or higher coupling values for higher firing rate cells (cells with higher firing rates will clearly show a relationship), forming a sort of inverse Yerkes-Dodson curve. Also, without understanding the magnitude of the MVL relative to other frequencies, it may be that these values are indeed larger than zero, but not biologically significant.

      - Please provide a scatter plot of Neuron MVL versus the Neuron's Firing Rate for 1) theta (7-9 Hz), 2) slow gamma, and 3) fast gamma, along with their line of best fit.

      - Please run a shuffle control where the LFP trace is shifted by random values between 125-1000ms and recalculate the MVL for theta, slow, and fast gamma. Often, these shuffle controls are done between 100-1000 times (see cross-correlation analyses of Fujisawa, Buzsaki et al.).

      - To establish that firing rate does not play a role in uncovering modulation, it would be worth conducting a spike number control, reducing the number of spikes per cell so that they are all equal before calculating the phase plots/MVL.

      We thank the reviewer for raising this point.  Besides the MVL value, we also calculated the pairwise phase consistency (PPC) as suggested by Reviewer 2, which is not sensitive to spike counts.  We found that the phase-locking strength to either rhythm (theta or gamma) was comparable between the MVL and PPC measurements (Author response image 2).  Moreover, we quantified the relationship between MVL and mean firing rate, as suggested.  We found that the MVL values for theta, slow gamma and fast gamma were negatively correlated with mean firing rate (Author response image 6, Pearson correlation, theta: R<sup>2</sup>= 0.06, Pearson’s r=-0.3, p=1.3×10<sup>-8</sup>; slow gamma: R<sup>2</sup>= 0.1, Pearson’s r=-0.4, p=2.4×10<sup>-17</sup>; fast gamma: R<sup>2</sup>= 0.03, Pearson’s r=-0.2, p=4.3×10<sup>-5</sup>).  These results help address the concern that spike counts affected the phase-modulation measurements.
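
The difference between MVL and PPC with respect to spike count can be sketched numerically. The example below uses the closed-form PPC estimator, which equals the average cosine of the phase difference over all spike pairs; the function names and the simulation of unlocked (uniform) phases are assumptions for illustration and do not reproduce the authors' analysis.

```python
import numpy as np

def mvl(phases):
    """Mean vector length of spike phases (radians)."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases, dtype=float)))))

def ppc(phases):
    """Pairwise phase consistency: mean cos(phase difference) over spike pairs."""
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    resultant_sq = np.abs(np.sum(np.exp(1j * phases))) ** 2
    return float((resultant_sq - n) / (n * (n - 1)))

# Simulated cells with NO phase locking, at three hypothetical spike counts.
rng = np.random.default_rng(1)
bias = {}
for n_spikes in (20, 200, 2000):
    sims = rng.uniform(-np.pi, np.pi, size=(500, n_spikes))
    bias[n_spikes] = (np.mean([mvl(s) for s in sims]),
                      np.mean([ppc(s) for s in sims]))
    print(n_spikes, "spikes: mean MVL = %.3f, mean PPC = %.4f" % bias[n_spikes])
```

For unlocked cells the average MVL grows as the spike count shrinks, whereas the average PPC stays near zero, which is why PPC is the preferred measure when spike counts differ between cells.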

      Author response image 6.

      (2) Something that I anticipated to see addressed in the manuscript was the study from Grosmark and Buzsaki (2016): "Cell assembly sequences during learning are "replayed" during hippocampal ripples and contribute to the consolidation of episodic memories. However, neuronal sequences may also reflect preexisting dynamics. We report that sequences of place-cell firing in a novel environment are formed from a combination of the contributions of a rigid, predominantly fast-firing subset of pyramidal neurons with low spatial specificity and limited change across sleep-experience-sleep and a slow-firing plastic subset. Slow-firing cells, rather than fast-firing cells, gained high place specificity during exploration, elevated their association with ripples, and showed increased bursting and temporal coactivation during postexperience sleep. Thus, slow- and fast-firing neurons, although forming a continuous distribution, have different coding and plastic properties."

      My concern is that much of the reported results in the present manuscript appear to recapitulate the observations of Grosmark and Buzsaki, but without accounting for differences in firing rate. A parsimonious alternative explanation for what is observed in the present manuscript is that high firing rate neurons, more integrated into the local network and orchestrating local gamma activity (PING), exhibit more coupling to theta and gamma. In this alternative perspective, it's not something special about how the neurons are entrained to the routed fast gamma, but that the higher firing rate neurons are better able to engage and entrain their local interneurons and, thus modulate local gamma. However, this interpretation challenges the discussion around the importance of fast gamma routed from the MEC.

      a. Please integrate the Grosmark & Buzsaki paper into the discussion.

      b. Also, please provide data that refutes or supports the alternative hypothesis in which the high firing rate cells are just more gamma modulated as they orchestrate local gamma activity through monosynaptic connections with local interneurons (e.g., Marshall et al., 2002, Hippocampal pyramidal cell-interneuron spike transmission is frequency dependent and responsible for place modulation of interneuron discharge). Otherwise, the attribution to a MEC routed fast gamma routing seems tenuous.

      c. It is mentioned that fast-spiking interneurons were removed from the analysis. It would be worth including these cells, calculating the MVL in 1 Hz increments as well as the reciprocal of their ISIs (described above).

      We thank the reviewer for this suggestion.  Because we found that the mean firing rate of FG-cells was higher than that of NFG-cells, it is possible that the FG-cells largely overlap with the fast-firing cells (rigid cells) in Grosmark et al., 2016 Science.  However, in this study we aimed to investigate how fast and slow gamma rhythms dynamically modulate neurons during learning, rather than to define new cell types.  Thus, we do not think this work is merely a replication of the previous publication.  We have added this description to the Discussion (Lines 439-441).  In addition, we do not have enough interneurons to support an analysis of interactions between interneurons and place cells.  Therefore, we cannot make any statement in this study about where the fast gamma originated (locally in CA1 or routed from MEC).

      (3) Methods - Spectral decomposition and Theta Harmonics.

      a. It is challenging to interpret the exact parameters that the authors used for their multi-taper analysis in the methods (lines 516-526). Tallon-Baudry et al., (1997; Oscillatory γ-Band (30-70 Hz) Activity Induced by a Visual Search Task in Humans) discuss a time-frequency trade-off where frequency resolution changes with different temporal windows of analysis. This trade-off between time and frequency resolution is well known as the uncertainty principle of signal analysis, transcending all decomposition methods. It is not only a function of wavelet or FFT, and multi-tapers do not directly address this. (The multitaper method, by using multiple specially designed tapers -like the Slepian sequences- smooths the spectrum. This smoothing doesn't eliminate leakage but distributes its impact across multiple estimates). Given the brevity of methods and the issues of theta harmonics as offered above, it is worth including some benchmark trace testing for the multi-taper as part of the supplemental figures.

      i. Please spectrally decompose an asymmetric 8 Hz sawtooth wave showing the trace and the related power spectral density using the multiple taper method discussed in the methods.

      ii. Please also do the same for an elliptical oscillation (perfectly symmetrical waves, but also capable of casting harmonics). Matlab code on how to generate this time series is provided below:

      A = 1; % Amplitude

      T = 1/8; % Period corresponding to 8 Hz frequency

      omega = 2*pi/T; % Angular frequency

      C = 1; % Wave speed

      m = 0.9; % Modulus for the elliptic function (0<m<1 for cnoidal waves)

      x = linspace(0, 2*pi, 1000); % temporal domain

      t = 0; % Time instant

      % Calculate B based on frequency and speed

      B = sqrt(omega/C);

      % Cnoidal wave equation using the Jacobi elliptic function

      u = A .* ellipj(B.*(x - C*t), m).^2;

      % Plotting the cnoidal wave

      figure;

      plot(x./max(x), u);

      title('8 Hz Cnoidal Wave');

      xlabel('time (x)');

      ylabel('Wave amplitude (u)');

      grid on;

      The Symbolic Math Toolbox needs to be installed and accessible in your MATLAB environment to use ellipj. Otherwise, I trust that, rather than plotting a periodic orbit around a circle (sin wave) the authors can trace the movement around an ellipse with significant eccentricity (the distance between the two foci should be twice the distance between the co-vertices).

      We thank the reviewer for this suggestion.  In the main text of the manuscript, we only applied Morlet's wavelet method to calculate the time-varying power of rhythms.  Multitaper method was used for the estimation of power spectra across running speeds, which was shown in the manuscript.  Therefore, we removed the description of the multitaper method and updated the description of the Morlet wavelet power spectral analysis in the Methods (Lines 541-544).

      As suggested, we estimated the power spectral densities of the 8 Hz sawtooth and elliptical oscillations using these methods, and compared them with the results from the FFT.  We found that both the multitaper and Morlet wavelet methods captured the 8 Hz oscillatory component well (Author response image 7).  However, harmonic components were observable in the FFT spectrum.
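
The FFT part of this benchmark can be sketched in a few lines. The following is an illustrative NumPy reconstruction under stated assumptions, not the authors' code: the 80% rise fraction of the sawtooth, the 1000 Hz sampling rate, the 4 s duration, the Hann window, and both function names are choices made for the example.

```python
import numpy as np

def asymmetric_sawtooth(freq_hz, fs_hz, duration_s, rise_fraction=0.8):
    """Zero-mean asymmetric triangle wave: slow rise, fast fall."""
    t = np.arange(int(round(fs_hz * duration_s))) / fs_hz
    phase = (freq_hz * t) % 1.0
    wave = np.where(phase < rise_fraction,
                    phase / rise_fraction,
                    (1.0 - phase) / (1.0 - rise_fraction))
    return t, wave - wave.mean()

def fft_power_spectrum(signal, fs_hz):
    """Single-window (Hann) periodogram."""
    window = np.hanning(signal.size)
    spectrum = np.fft.rfft(signal * window)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs_hz)
    power = np.abs(spectrum) ** 2 / np.sum(window ** 2)
    return freqs, power

t, wave = asymmetric_sawtooth(freq_hz=8.0, fs_hz=1000.0, duration_s=4.0)
freqs, power = fft_power_spectrum(wave, fs_hz=1000.0)

def power_at(f_hz):
    return power[np.argmin(np.abs(freqs - f_hz))]

# Harmonics of 8 Hz, plus 30 Hz as a non-harmonic reference frequency.
for f_hz in (8, 16, 24, 32, 30):
    print(f"{f_hz:>2} Hz: {power_at(f_hz):.3e}")
```

With a 4 s window the frequency resolution is 0.25 Hz, so the 24 Hz and 32 Hz harmonics appear as distinct peaks well above the neighbouring non-harmonic frequencies, even though the waveform contains no independent gamma oscillation.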

      Author response image 7.

      iii. Line 522: "The power spectra across running speeds and absolute power spectrum (both results were not shown).". Given the potential complications of multi-taper discussed above, and as each convolution further removes one from the raw data, it would be the most transparent, simple, and straightforward to provide power spectra using the simple fft.m code in Matlab (We imagine that the authors will agree that the results should be robust against different spectral decomposition methods. Otherwise, it is concerning that the results depend on the algorithm implemented and should be discussed. If gamma transience is a concern, the authors should trigger to 2-second epochs in which slow/fast gamma exceeds 3-7 std. dev. above the mean, comparing those resulting power spectra to 2-second epochs with ripples - also a transient event). The time series should be at least 2 seconds in length (to avoid spectral leakage issues and the issues discussed in Talon-Baudry et al., 1997 above).

      Please show the unmolested power spectra (Y-axis units in mV2/Hz, X-axis units as Hz) as a function of running speed (increments of 5 cm/s) for each animal. I imagine three of these PSDs for 3 of the animals will appear in supplemental methods while one will serve as a nice manuscript figure. With this plot, please highlight the regions that the authors are describing as theta, slow, and fast gamma. Also, any issues should be addressed should there be notable differences in power across animals or tetrodes (issues with locations along proximal-distal CA1 in terms of MEC/LEC input and using a local reference electrode are discussed below).

      As suggested, we first estimated the power spectra as a function of running speed in each running lap, shown separately for each rat, using multitaper spectral analysis (Author response image 8).  In addition, to obtain unmolested power spectra, the short-time Fourier transform (STFT) was used for this analysis at the same frequency resolution (Author response image 9).  The power spectra were consistent between the two methods.  Notably, there appears to be no significant theta harmonic component in the slow gamma band.

      The multitaper spectral analysis was performed as follows.  The power spectra were measured across different running speeds as described previously (Ahmed et al., 2012 J Neurosci; Zheng et al., 2015 Hippocampus; Zheng et al., 2016 eNeuro).  Briefly, the absolute power spectrum was calculated with a 0.5s moving window and a 0.2s step size on the LFP recordings of each lap, using the multitaper spectral analysis in the Chronux toolbox (Mitra and Bokil, 2008, http://chronux.org/) and the STFT spectral analysis in the Matlab script stft.m.  In the multitaper method, the time-bandwidth product parameter (TW) was set at 3, and the number of tapers (K) was set at 5.  In the STFT method, the FFT length was set at 2048, which was equivalent to the parameters used in the multitaper method.  Running speed was calculated (see “Estimation of running speed and head direction” section in the manuscript) and averaged within each 0.5s time window corresponding to the LFP segments.  Then, the absolute power at each frequency was smoothed with a Gaussian kernel centered on the given speed bin.  The power spectra as a function of running speed and frequency were plotted on a log scale.  The colormap was also on a log scale, allowing for comparisons across different frequencies that would otherwise be difficult due to the 1/f decay of power in physiological signals.
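
As a reading aid, the spectral smoothing implied by these multitaper settings follows from the standard relations W = TW / T (half-bandwidth) and K ≤ 2TW − 1 (number of well-concentrated Slepian tapers). The short sketch below only evaluates these relations for the stated window and TW; the function name is hypothetical.

```python
def multitaper_resolution(window_s, time_bandwidth):
    """Half-bandwidth (Hz) and maximum number of well-concentrated tapers."""
    half_bandwidth_hz = time_bandwidth / window_s
    max_tapers = int(2 * time_bandwidth - 1)
    return half_bandwidth_hz, max_tapers

half_bw, k_max = multitaper_resolution(window_s=0.5, time_bandwidth=3)
print(f"half-bandwidth: +/-{half_bw:.1f} Hz, tapers: {k_max}")
```

With a 0.5 s window and TW = 3, each spectral estimate is smoothed over about ±6 Hz and K = 5 is the usual maximum, so peaks separated by less than this bandwidth would not be expected to resolve cleanly in these spectra.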

      Author response image 8.

      Author response image 9.

      iv. Schomberg and colleagues (2014) suggested that the modulation of neurons in the slow gamma range could be related to theta harmonics (see above). Harmonics can often extend in a near infinite as they regress into the 1/f background (contributing to power, but without a peak above the power spectral density slope), making arbitrary frequency limits inappropriate. Therefore, in order to support the analyses and assertions regarding slow gamma, it seems necessary to calculate a "theta harmonic/slow gamma ratio". Aru et al. (2015; Untangling cross-frequency coupling in neuroscience) offer that: " The presence of harmonics in the signal should be tested by a bicoherence analysis and its contribution to CFC should be discussed." Please test both the synthetic signals above and the raw LFP, using temporal windows of greater than 4 seconds (again, the large window optimizes for frequency resolution in the time-frequency trade-off) to calculate the bicoherence. As harmonics are integers of theta coupled to itself and slow gamma is also coupled to theta, a nice illustration and contribution to the field would be a method that uses the bispectrum to isolate and create a "slow gamma/harmonic" ratio.

      We thank the reviewer for suggesting this method for assessing theta harmonics. We first measured theta harmonics in the synthesized signal using the bicoherence method, and clearly observed nonlinear coupling between the theta rhythm and its harmonics (Author response image 10).

      Author response image 10.

      In addition, we measured the bicoherence of raw traces during slow gamma episodes. We did not see nonlinear coupling between the slow gamma and theta bands in the real data (mean bicoherence=0.1±0.0002), in contrast to the synthesized signal (mean bicoherence=0.7 for elliptical waves and 0.5 for sawtooth waves), suggesting that the slow gamma detected in this study was not purely a theta harmonic (Author response image 11C, F, I, in red boxes). Therefore, we believe that the contribution of theta harmonics to slow gamma is not significant.
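
      To make the bicoherence comparison concrete, here is a minimal Python sketch that estimates bicoherence at a single frequency pair from non-overlapping segments. The estimator used for Author response images 10 and 11 is not specified in this response, so the normalization chosen here (which bounds the value between 0 and 1), the Hann window, the segment length, and the function name bicoherence_at are all assumptions.

```python
import numpy as np


def bicoherence_at(x, fs, f1, f2, seg_len):
    """Bicoherence between f1, f2 and f1 + f2, averaged over segments.

    Values near 1 indicate phase coupling between the three components
    (as expected for a rhythm and its harmonic); values near 0 indicate
    independent phases.
    """
    n_seg = len(x) // seg_len
    segs = np.reshape(x[:n_seg * seg_len], (n_seg, seg_len))
    segs = segs - segs.mean(axis=1, keepdims=True)
    spectra = np.fft.rfft(segs * np.hanning(seg_len), axis=1)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    i1 = int(np.argmin(np.abs(freqs - f1)))
    i2 = int(np.argmin(np.abs(freqs - f2)))
    i3 = i1 + i2                              # bin of the sum frequency
    pair = spectra[:, i1] * spectra[:, i2]
    numerator = np.abs(np.sum(pair * np.conj(spectra[:, i3])))
    denominator = np.sqrt(np.sum(np.abs(pair) ** 2)
                          * np.sum(np.abs(spectra[:, i3]) ** 2))
    return numerator / denominator


def demo_signals(fs=1000.0, seg_len=1000, n_seg=64, seed=0):
    """Synthetic 8 Hz rhythm with a 16 Hz component, coupled or uncoupled."""
    rng = np.random.default_rng(seed)
    t = np.arange(seg_len * n_seg) / fs
    # Coupled: the 16 Hz component is a true harmonic of the 8 Hz rhythm.
    coupled = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 16 * t)
    coupled = coupled + 0.1 * rng.standard_normal(t.size)
    # Uncoupled: both components get fresh random phases in every segment.
    t_seg = np.arange(seg_len) / fs
    pieces = []
    for _ in range(n_seg):
        p1, p2 = rng.uniform(0, 2 * np.pi, size=2)
        pieces.append(np.sin(2 * np.pi * 8 * t_seg + p1)
                      + 0.5 * np.sin(2 * np.pi * 16 * t_seg + p2))
    return coupled, np.concatenate(pieces)
```

      In this toy case, bicoherence_at(coupled, 1000.0, 8, 8, 1000) should be close to 1, while the uncoupled signal should give a value close to the chance level of roughly 1/sqrt(n_seg). The same contrast underlies the comparison between the synthesized waves and the raw LFP described above.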

      Author response image 11.

      (4) I appreciate the inclusion of the histology for the 4 animals. Knerim and colleagues describe a difference in MEC projection along the proximal-distal axis of the CA1 region (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866456/)- "There are also differences in their direct projections along the transverse axis of CA1, as the LEC innervates the region of CA1 closer to the subiculum (distal CA1), whereas the MEC innervates the region of CA1 closer to CA2 and CA3 (proximal CA1)" From the histology, it looks like some of the electrodes are in the part of CA1 that would be dominated by LEC input while a few are closer to where the MEC would project.

      a. How do the authors control for these differences in projections? Wouldn't this change whether or not fast gamma is observed in CA1?

      b. I am only aware of one manuscript that describes slow gamma in the LEC which appeared in contrast to fast gamma from the MEC (https://www.science.org/doi/10.1126/science.abf3119). One would surmise that the authors in the present manuscript would have varying levels of fast gamma in their CA1 recordings depending on the location of the electrodes in the Proximal-distal axis, to the extent that some of the more medial tetrodes may need to be excluded (as they should not have fast gamma, rather they should be exclusively dominated by slow gamma). Alternatively, the authors may find that there is equal fast gamma power across the entire proximal-distal axis. However, this would pose a significant challenge to the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz et al. and require reconciliation/discussion.

      c. Is there a difference in neuron modulation to these frequencies based on electrode location in CA1?

      We thank the reviewer for raising this concern, which was also raised by Reviewer 2. We aligned the physical locations of the LFP channels along the proximal-distal axis based on histology. In our dataset, only 2 rats were recorded from both distal and proximal hippocampus, so we calculated the gamma power from both sites in these rats. We found that slow gamma power was higher at proximal tetrodes than at distal tetrodes (Author response image 12, repeated measures ANOVA, F(1,7)=10.2, p=0.02, partial η <sup>2</sup>=0.8). However, fast gamma power was similar between recording sites (F(1,7)=0.008, p=0.9, partial η <sup>2</sup>=0.001). These results are partially consistent with the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz’s work. The main reason is probably that all LFPs were recorded from tetrodes in stratum pyramidale, the deep layer in particular (Author response image 4E), so it was hard to precisely identify their distance to distal/proximal apical dendrites.

      Author response image 12.

      In terms of the anatomical location of FG and NFG cells, we identified tetrode tracks in slices for each cell. We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramer V=0.05). The distribution of FG-cells and NFG-cells along the transverse axis was also similar between cell types (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramer V=0.02).

      (5) Given a comment in the discussion (see below), it will be worth exploring changes in theta, theta harmonic, slow gamma, and fast gamma power with running speed as no changes were observed with theta sequences or lap number versus. Notably, Czurko et al., report an increase in theta and harmonic power with running speed (1999) while Ahmed and Mehta (2012) report a similar effect for gamma.

      a. Please determine if the oscillations change in power and frequency of the rhythms discussed above change with running speed using the same parameters applied in the present manuscript. The specific concern is that how the authors calculate running speed is not sensitive enough to evaluate changes.

      We thank the reviewer for this suggestion. The description of running speed quantification has been updated in the Methods (see “Estimation of running speed and head direction” section, Lines 501-511). Overall, the sampling frequency of running speed was 25 Hz, which should be sensitive enough to evaluate the behavioral changes.

      By measuring rhythmic power as a function of running speed (Author response image 8 and Author response image 9), we observed that theta power increased with running speed. Consistent with the results in (Ahmed and Mehta, 2012) and our previous study (Zheng et al., 2015), fast gamma power increased and slow gamma power decreased as running speed increased.

      In addition, we estimated rhythm frequency as a function of running speed in the slow and fast gamma episodes, respectively. We found that fast gamma frequency increased with running speed (Author response image 13, linear regression, R<sup>2</sup>=0.4, corr=0.6, p=9.9×10<sup>-15</sup>), whereas slow gamma frequency decreased with running speed (R<sup>2</sup>=0.2, corr=-0.4, p=8.8×10<sup>-6</sup>). Although gamma frequency was significantly correlated with running speed, consistent with previous studies, the frequency change (~70-75Hz for fast gamma and ~30-28Hz for slow gamma) was not large enough to affect the sequence findings in this study. In addition, theta frequency was stable across running speeds in both slow episodes (R<sup>2</sup>=0.02, corr=-0.1, p=0.1) and fast episodes (R<sup>2</sup>=0.004, corr=0.06, p=0.5), consistent with results in Fig.1G of Kropff et al., 2021 Neuron.
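
      As a reading aid for the regression statistics above, the sketch below fits frequency against running speed with SciPy. The function name frequency_speed_fit and the synthetic data in the usage note are hypothetical and are not the recorded data. For a regression with a single predictor, R<sup>2</sup> equals the squared correlation coefficient, so the reported pairs (R<sup>2</sup>=0.4 with corr=0.6, and R<sup>2</sup>=0.2 with corr=-0.4) agree with each other after rounding.

```python
import numpy as np
from scipy.stats import linregress


def frequency_speed_fit(speed, freq):
    """Simple linear regression of rhythm frequency on running speed."""
    result = linregress(speed, freq)
    return {
        "slope": result.slope,        # Hz per unit of speed
        "corr": result.rvalue,        # Pearson correlation coefficient
        "r2": result.rvalue ** 2,     # equals R^2 for a single predictor
        "p": result.pvalue,           # two-sided p-value for zero slope
    }


def synthetic_example(seed=0):
    """Hypothetical data: frequency drifts upward with speed, plus noise."""
    rng = np.random.default_rng(seed)
    speed = np.linspace(5.0, 50.0, 100)
    freq = 70.0 + 0.1 * speed + rng.standard_normal(speed.size)
    return speed, freq
```

      Calling frequency_speed_fit(*synthetic_example()) returns a positive slope with a small p-value. A significant slope of this kind can still correspond to a shift of only a few Hz across the whole speed range, which is the point made in the response.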

      Author response image 13.

      b. It is astounding that animals ran as fast as they did in what appears to be the first lap (Figure 3F), especially as rats' natural proclivity is thigmotaxis and inquisitive exploration in novel environments. Can the authors expand on why they believe their rats ran so quickly on the first lap in a novel environment and how to replicate this? Also, please include the individual values for each animal on the same plot.

      We thank the reviewer for pointing this out. The task was not brand new to the rats in this dataset, because only days with recording quality good enough for sequence decoding were included in this paper, which were about day 2 to day 10 for each rat. However, we still observed the process of sequence formation because of the rats’ interest in exploration during early laps. Thus, in terms of exploration behavior, the rats ran at relatively high speeds across laps (Author response image 14, each gray line represents the running speed within an individual session).

      Author response image 14.

      c. Can the authors explain how the statistics on line 169 (F(4,44)) work? Specifically, it is challenging to determine how the degrees of freedom were calculated in this case and throughout if there were only 4 animals (reported in methods) over 5 laps (depicted in Figure 3F. Given line 439, it looks like trials and laps are used synonymously). Four animals over 5 laps should have a DOF of 16.

      This statistical test was performed with each session/day as a sample (n=12 sessions/days). The statistics were generated by repeated measures ANOVA on 5 trials in 12 sessions, giving an error DOF of (5-1)×(12-1)=44.
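
      The degrees of freedom of a one-way repeated measures ANOVA follow directly from the number of samples and the number of within-sample levels. The short Python sketch below works through this arithmetic; the function name rm_anova_dof is hypothetical.

```python
def rm_anova_dof(n_samples, n_levels):
    """Degrees of freedom for a one-way repeated measures ANOVA.

    n_samples: number of independent units (here sessions/days)
    n_levels:  number of repeated conditions (here laps/trials)
    """
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_samples - 1)
    return df_effect, df_error


# 12 sessions measured over 5 laps gives F(4, 44), as reported.
print(rm_anova_dof(12, 5))
# With the 4 animals as samples instead, the same design would give F(4, 12).
print(rm_anova_dof(4, 5))
```

      The difference between the two calls shows that the choice of sampling unit (session versus animal) determines the error degrees of freedom.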

      (6) Throughout the manuscript, I am concerned about an inflation of statistical power. For example on line 162, F(2,4844). The large degrees of freedom indicate that the sample size was theta sequences or a number of cells. Since multiple observations were obtained from the same animal, the statistical assumption of independence is violated. Therefore, the stats need to be conducted using a nested model as described in Aarts et al. (2014; https://pubmed.ncbi.nlm.nih.gov/24671065/). A statistical consult may be warranted.

      We thank the reviewer for this suggestion. We have replaced this statistical test with a generalized linear mixed model with ratID as a covariate. These results have been updated in the revised manuscript (Lines 164-167).

      (7) It is stated that one tetrode served as a quiet recording reference. The "quiet" part is an assumption when often, theta and gamma can be volume conducted to the cortex (e.g., Sirota et al., 2008; This is often why laboratories that study hippocampal rhythms use the cerebellum for the differential recording electrode and not an electrode in the corpus callosum). Generally, high frequencies propagate as well as low frequencies in the extracellular milieu (https://www.eneuro.org/content/4/1/ENEURO.0291-16.2016). For transparency, the authors should include a limitation paragraph in their discussion that describes how their local tetrode reference may be inadvertently diminishing and/or distorting the signal that they are trying to isolate. Otherwise, it would be worth hearing an explanation as to how the author's approach avoids this issue.

      In terms of the locations of the references, we had 2 screws in the skull above the cerebellum connected to the recording drive ground, and 1 tetrode in a quiet area of the cortex serving as the recording reference. We agree that theta and gamma can be volume conducted to the cortex, which may affect the power of these rhythms in the stratum pyramidale. However, we did not intend to measure or compare absolute theta or gamma power in this study, as we were only concerned with the gamma phase modulation of place cells. Therefore, we believe the location of the recording reference would not have a significant effect on our conclusions.

      Apologetically, this review is already getting long. Moreover, I have substantial concerns that should be resolved prior to delving into the remainder of the analyses. e.g., the analyses related to Figure 3-5 assert that FG cells are important for sequences. However, the relationship to gamma may be secondary to either their relationship to theta or, based on the Grosmark and Buzsaki paper, it may just be a phenomenon coupled to the fast-firing cells (fast-firing cells showing higher gamma modulation due to a local PING dynamic). Moreover, the observation of slow gamma is being challenged as theta harmonics, even by the major proponents of the slow/fast gamma theory. Therefore, the report of slow gamma precession would come as an unsurprising extension should they be revealed to be theta harmonics (however, no control for harmonics was implemented; suggestions were made above). Following these amendments, I would be grateful for the opportunity to provide further feedback.

      III. Discussion.

      a. Line 330- it was offered that fast gamma encodes information while slow gamma integrates in the introduction. However, in a task such as circular track running (from the methods, it appears that there is no new information to be acquired within a trial), one would guess that after the first few laps, slow gamma would be the dominant rhythm. Therefore, one must wonder why there are so few neurons modulated by slow gamma (~3.7%).

      The proportion of ~3.7% refers to the place cells that were phase-locked to slow gamma. However, our aim was to show that slow gamma phase precession of place cells promotes theta sequence development. We would not expect cells to be phase-locked to slow gamma if phase precession occurred.

      b. Line 375: The authors contend that: "...slow gamma, related to information compression, was also required to modulate fast gamma phase-locked cells during sequence development. We replicated the results of slow gamma phase precession at the ensemble level (Zheng et al., 2016), and furthermore observed it at late development, but not early development, of theta sequences." In relation to the idea that slow gamma may be coupled to - if not a distorted representation of - theta harmonics, it has been observed that there are changes in theta relative to novelty.

      i. A. Jeewajee, C. Lever, S. Burton, J. O'Keefe, and N. Burgess (2008) report a decrease in theta frequency in novel circumstances that disappears with increasing familiarity.

      ii. One could surmise that this change in frequency is associated with alterations in theta harmonics (observed here as slow gamma), challenging the author's interpretation.

      iii. Therefore, the authors have a compelling opportunity to replicate the results of Jeewajee et al., characterizing changes of theta along with the development of slow gamma precession, as the environment becomes familiar. It will become important to demonstrate, using bicoherence as offered by Aru et al., how slow gamma can be disambiguated from theta harmonics. Specifically, we anticipate that the authors will be able to quantify A) theta harmonics (the number, and their respective frequencies and amplitudes), B) the frequency and amplitude of slow gamma, and C) how they can be quantitatively decoupled. Through this, their discussion of oscillatory changes with novelty-familiarity will garner a significant impact.

      We think we have demonstrated that the slow gamma observed in this study was not purely theta harmonics.  We didn’t focus on the frequency change of slow gamma or theta rhythms in this study.  Further investigation will be carried out on this topic in the future.

      c. Broadly, it is interesting that the authors emphasize the gamma frequency throughout the discussion. Given that the power spectral density of the Local Field Potential (LFP) exhibits a log-log relationship between amplitude and frequency, as described by Buzsáki (2005) in "Rhythms of the Brain," and considering that the LFP is primarily generated through synaptic transmembrane currents (Buzsáki et al., 2012), it seems parsimonious to consider that the bulk of synaptic activity occurs at lower frequencies (e.g., theta). Since synaptic transmission represents the most direct form of inter-regional communication, one might wonder why gamma (characterized by lower amplitude rhythms) is esteemed so highly compared to the higher amplitude theta rhythm. Why isn't the theta rhythm, instead, regarded as the primary mode of communication across brain regions? A discussion exploring this question would be beneficial.

      We thank the reviewer for this thoughtful comment. In stating our conclusions about gamma rhythms, we did not mean to downplay the role of the theta rhythm. On the contrary, the fast and slow gamma episodes were detected riding on theta rhythms, and we believe that information compression should occur at a finer time scale within a theta cycle. More investigation will be carried out on this topic in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is helpful to clearly define "FG-cell sequences" before the relevant results are described in the Results section. More importantly, the seemingly conflicting results between Figure 3 and Figure 8 may need to be clarified.

      The “exFG-sequences and exNFG sequences”, “FG-cell sequences and NFG-cell sequences” have been defined clearly in the revised manuscript.  Moreover, the seemingly conflicting results between Figure 3 and Figure 8 have been interpreted properly.

      (2) It is helpful to clearly state the N and what defines a sample whenever a result is described.

      For each statistical result, the N and what defines a sample have been clarified in the revised manuscript.

      (3) Addressing the questions regarding the methods (#5) would clarify some of the results.

      The questions regarding the Methods have been addressed in the revised manuscript.

      (4) Line #244: "successful" should be "successive"?

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      - The writing of the manuscript can be substantially improved.

      The manuscript has been substantially revised and updated.

      - I noticed that the last author of the manuscript is not the lead or corresponding and has only provided a limited contribution to this work (according to the detailed author contributions). The second to last author seems to be the main senior intellectual contributor and supervisor, together with the third to last author. This speaks of potential bad academic practices where a senior person whose intellectual contribution to the study is relatively minor takes the last author position, against the standard conventions on authorship worldwide. I strongly suggest that this is corrected.

      We thank the reviewer for raising this issue. The last author, Dr. Ming, was also a senior author who supervised this project and made a substantial contribution. We have listed him as a co-corresponding author in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Qin et al. set out to investigate the role of mechanosensory feedback during swallowing and identify neural circuits that generate ingestion rhythms. They use Drosophila melanogaster swallowing as a model system, focusing their study on the neural mechanisms that control cibarium filling and emptying in vivo. They find that pump frequency is decreased in mutants of three mechanotransduction genes (nompC, piezo, and Tmc), and conclude that mechanosensation mainly contributes to the emptying phase of swallowing. Furthermore, they find that double mutants of nompC and Tmc have more pronounced cibarium pumping defects than either single mutants or Tmc/piezo double mutants. They discover that the expression patterns of nompC and Tmc overlap in two classes of neurons, md-C and md-L neurons. The dendrites of md-C neurons warp the cibarium and project their axons to the subesophageal zone of the brain. Silencing neurons that express both nompC and Tmc leads to severe ingestion defects, with decreased cibarium emptying. Optogenetic activation of the same population of neurons inhibited filling of the cibarium and accelerated cibarium emptying. In the brain, the axons of nompC∩Tmc cell types respond during ingestion of sugar but do not respond when the entire fly head is passively exposed to sucrose. Finally, the authors show that nompC∩Tmc cell types arborize close to the dendrites of motor neurons that are required for swallowing, and that swallowing motor neurons respond to the activation of the entire Tmc-GAL4 pattern.

      Strengths:

      • The authors rigorously quantify ingestion behavior to convincingly demonstrate the importance of mechanosensory genes in the control of swallowing rhythms and cibarium filling and emptying

      • The authors demonstrate that a small population of neurons that express both nompC and Tmc oppositely regulate cibarium emptying and filling when inhibited or activated, respectively

      • They provide evidence that the action of multiple mechanotransduction genes may converge in common cell types

      Thank you for your insightful and detailed assessment of our work. Your constructive feedback will help to improve our manuscript.

      Weaknesses:

      • A major weakness of the paper is that the authors use reagents that are expressed in both md-C and md-L but describe the results as though only md-C is manipulated. Severing the labellum will not prevent optogenetic activation of md-L from triggering neural responses downstream of md-L. Optogenetic activation is strong enough to trigger action potentials in the remaining axons. Therefore, Qin et al. do not present convincing evidence that the defects they see in pumping can be specifically attributed to md-C.

      Thank you for your comments. This is an important point that we did not adequately address in the original preprint. We have obtained imaging and behavioral results that strongly suggest that md-C, rather than md-L, is essential for swallowing behavior.

      36 hours after ablation of the labellum, the signals of md-L were hardly observable when GFP expression was driven by the intersection between Tmc-GAL4 & nompC-QF (see Figure 3—figure supplement 1A). This observation indicates that the axons of md-L had likely degenerated after 36 hours and were unlikely to influence swallowing. Moreover, the projection pattern of Tmc-GAL4 & nompC-QF>>GFP exhibited no significant changes in the brain after labellum ablation.

      Furthermore, even 36 hours after labellum ablation, flies responded to light stimulation (see Figure 3—figure supplement 1B-C, Video 5) when ReaChR was expressed in md-C. We thus reasoned that md-C, but not md-L, plays a crucial role in the swallowing process.

      • GRASP is known to be non-specific and prone to false positives when neurons are in close proximity but not synaptically connected. A positive GRASP signal supports but does not confirm direct synaptic connectivity between md-C/md-L axons and MN11/MN12.

      In this study, we employed the nSyb-GRASP, wherein the GRASP is expressed at the presynaptic terminals by fusion with the synaptic marker nSyb. This method demonstrates an enhanced specificity compared to the original GRASP approach.

      Additionally, we used +/ UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; + / MN-LexA fruit flies as a negative control to rule out potential false signals originating from the tool itself (Author response image 1, scale bar = 50μm). Besides the genotype Tmc-Gal4, Tub(FRT. Gal80) / UAS-nSyb-spGFP1-10, lexAop-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA discussed in this manuscript, we also included flies of genotype Tmc-Gal4, Tub(FRT. Gal80) / lexAop-nSyb-spGFP1-10, UAS-CD4-spGFP11 ; nompC-QF, QUAS-FLP / MN-LexA as a reverse control (Author response image 2). Unexpectedly, similar positive signals were observed, indicating that positive signals may emerge from close proximity between neurons even with nSyb-GRASP.

      Author response image 1.

      It should be noted that the existence of synaptic projections from motor neurons (MN) to md-C cannot be definitively confirmed at this point. At present, we can only propose the potential for synaptic connections between md-C and motor neurons. A more definitive conclusion may be attainable with comprehensive whole-brain connectome data in future studies.

      Author response image 2.

      • As seen in Figure 2—figure supplement 1, the expression pattern of Tmc-GAL4 is broader than md-C alone. Therefore, the functional connectivity the authors observe between Tmc expressing neurons and MN11 and 12 cannot be traced to md-C alone

      It is true that the expression pattern of Tmc-GAL4 is broader than that of md-C alone. Our experiments, including those flies expressing TNT in Tmc+ neurons, demonstrated difficulties in emptying (Figure 2A, 2D). Notably, we encountered challenges in finding fly stocks bearing UAS>FRT-STOP-P2X2. Consequently, we opted to utilize Tmc-GAL4 to drive UAS-P2X2 instead. We believe that the results further support our hypothesis on the role of md-C in the observed behavioral change in emptying.

      Overall, this work convincingly shows that swallowing and swallowing rhythms are dependent on several mechanosensory genes. Qin et al. also characterize a candidate neuron, md-C, that is likely to provide mechanosensory feedback to pumping motor neurons, but the results they present here are not sufficient to assign this function to md-C alone. This work will have a positive impact on the field by demonstrating the importance of mechanosensory feedback to swallowing rhythms and providing a potential entry point for future investigation of the identity and mechanisms of swallowing central pattern generators.

      Reviewer #2 (Public Review):

      In this manuscript, the authors describe the role of cibarial mechanosensory neurons in fly ingestion. They demonstrate that pumping of the cibarium is subtly disrupted in mutants for piezo, TMC, and nomp-C. Evidence is presented that these three genes are co-expressed in a set of cibarial mechanosensory neurons named md-C. Silencing of md-C neurons results in disrupted cibarial emptying, while activation promotes faster pumping and/or difficulty filling. GRASP and chemogenetic activation of the md-C neurons is used to argue that they may be directly connected to motor neurons that control cibarial emptying.

      The manuscript makes several convincing and useful contributions. First, identifying the md-C neurons and demonstrating their essential role for cibarium emptying provides reagents for further studying this circuit and also demonstrates the importance of mechanosensation in driving pumping rhythms in the pharynx. Second, the suggestion that these mechanosensory neurons are directly connected to motor neurons controlling pumping stands in contrast to other sensory circuits identified in fly feeding and is an interesting idea that can be more rigorously tested in the future.

      At the same time, there are several shortcomings that limit the scope of the paper and the confidence in some claims. These include:

      a) the MN-LexA lines used for GRASP experiments are not characterized in any other way to demonstrate specificity. These were generated for this study using Phack methods, and their expression should be shown to be specific for MN11 and MN12 in order to interpret the GRASP experiments.

      Thanks for the suggestion. We have checked the expression pattern of MN-LexA, which is similar to that of the MN-GAL4 used in previous work (Manzo et al., PNAS, 2012, PMID:22474379). Here is the expression pattern:

      Author response image 3.

      b) There is also insufficient detail for the P2X2 experiment to evaluate its results. Is this an in vivo or ex vivo prep? Is ATP added to the brain, or ingested? If it is ingested, how is ATP coming into contact with md-C neuron if it is not a chemosensory neuron and therefore not exposed to the contents of the cibarium?

      The P2X2 experimental preparation was done ex vivo. We immersed the fly in the imaging buffer, as described in the Methods section under Functional Imaging. Following dissection and identification of the subesophageal zone (SEZ) area under fluorescent microscopy, we introduced ATP slowly into the buffer, at a distance from the brain.

      c) In Figure 3C, the authors claim that ablating the labellum will remove the optogenetic stimulation of the md-L neuron (mechanosensory neuron of the labellum), but this manipulation would presumably leave an intact md-L axon that would still be capable of being optogenetically activated by Chrimson.

      Please refer to the corresponding answers for reviewer 1 and Figure 3—figure supplement 1.

      d) Average GCaMP traces are not shown for md-C during ingestion, and therefore it is impossible to gauge the dynamics of md-C neuron activation during swallowing. Seeing activation with a similar frequency to pumping would support the suggested role for these neurons, although GCaMP6s may be too slow for these purposes.

      Profiling the dynamics of md-C neuron activation during swallowing is crucial for unraveling the operational model of md-C and validating our proposed hypothesis. Unfortunately, our assay faces challenges in detecting probable 6Hz fluorescent changes with GCaMP6s.

      In general, we observed an increase in fluorescent signals during swallowing, but movement of the live flies during swallowing disturbed the imaging, so we could not obtain a clean calcium trace for md-C neurons. To strengthen our findings, patch-clamp recording from md-C neurons would be a more convincing approach. However, as illustrated in Figure 2, the somata of md-C neurons are situated in the cibarium rather than the brain, and patching md-C somata in flies during ingestion is difficult.

      e) The negative result in Figure 4K that is meant to rule out taste stimulation of md-C is not useful without a positive control for pharyngeal taste neuron activation in this same preparation.

      We followed the methods used in previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which we believe can confirm that md-C neurons do not respond to sugars.

      In addition to the experimental limitations described above, the manuscript could be organized in a way that is easier to read (for example, not jumping back and forth in figure order).

      Thanks for your suggestion and the manuscript has been reorganized.

      Reviewer #3 (Public Review):

      Swallowing is an essential daily activity for survival, and pharyngo-laryngeal sensory function is critical for safe swallowing. In Drosophila, it has been reported that the mechanical property of food (e.g. Viscosity) can modulate swallowing. However, how mechanical expansion of the pharynx or fluid content sense and control swallowing was elusive. Qin et al. showed that a group of pharyngeal mechanosensory neurons, as well as mechanosensory channels (nompC, Tmc, and Piezo), respond to these mechanical forces for regulation of swallowing in Drosophila melanogaster.

      Strengths:

      There are many reports on the effect of chemical properties of foods on feeding in fruit flies, but only limited studies reported how physical properties of food affect feeding especially pharyngeal mechanosensory neurons. First, they found that mechanosensory mutants, including nompC, Tmc, and Piezo, showed impaired swallowing, mainly the emptying process. Next, they identified cibarium multidendritic mechanosensory neurons (md-C) are responsible for controlling swallowing by regulating motor neuron (MN) 12 and 11, which control filling and emptying, respectively.

      Weaknesses:

      While the involvement of md-C and mechanosensory channels in controlling swallowing is convincing, it is not yet clear which stimuli activate md-C. Can it be an expansion of cibarium or food viscosity, or both? In addition, if rhythmic and coordinated contraction of muscles 11 and 12 is essential for swallowing, how can simultaneous activation of MN 11 and 12 by md-C achieve this? Finally, previous reports showed that food viscosity mainly affects the filling rather than the emptying process, which seems different from their finding.

      We have confirmed that swallowing sucrose water solution activated md-C neurons, while sucrose water solution alone could not (Figure 4J-K). We hypothesized that the viscosity of the food might influence this expansion process.

      While we were unable to delineate the activation dynamics of md-C neurons, our proposal posits that these neurons could be activated in a single pump cycle, sequentially stimulating MN12 and MN11. Another possibility is that the activation of md-C neurons acts as a switch, altering the oscillation pattern of the swallowing central pattern generator (CPG) from a resting state to a working state.

      In the experiments with w1118 flies fed with MC (methylcellulose) water, we observed that viscosity predominantly affects the filling process rather than the emptying process, consistent with previous findings. This raises an intriguing question. Our investigation into the mutation of mechanosensitive ion channels revealed a significant impact on the emptying process. We believe this is due to the loss of mechanosensation affecting the vibration of swallowing circuits, thereby influencing both the emptying and filling processes. In contrast, viscosity appears to make it more challenging for the fly to fill the cibarium with food, primarily attributable to the inherent properties of the food itself.

      Reviewer #4 (Public Review):

      A combination of optogenetic behavioral experiments and functional imaging is employed to identify the role of mechanosensory neurons in food swallowing in adult Drosophila. While some of the findings are intriguing and the overall goal of mapping a sensory-to-motor circuit for this rhythmic movement is admirable, the data presented could be improved.

      The circuit proposed (and supported by GRASP contact data) shows these multi-dendritic neurons connecting to pharyngeal motor neurons. This is pretty direct - there is no evidence that they affect the hypothetical central pattern generator - just the execution of its rhythm. The optogenetic activation and inhibition experiments are constitutive, not patterned light, and they seem to disrupt the timing of pumping, not impose a new one. A slight slowing of the rhythm is not consistent with the proposed function.

      Motor neurons implicated in patterned motions can be considered effectors of Central Pattern Generators (CPGs)(Marder et al., Curr Biol., 2001, PMID: 11728329; Hurkey et al., Nature., 2023, PMID:37225999). Given our observation of the connection between md-C neurons and motor neurons, it is reasonable to speculate that md-C neurons influence CPGs. Compared to the patterned light (0.1s light on and 0.1s light off) used in our optogenetic experiments, we noted no significant changes in their responses to continuous light stimulation. We think that optogenetic methods may lead to overstimulation of md-C neurons, failing to accurately mimic the expansion of the cibarium during feeding.

      Dysfunction in mechanosensitive ion channels or mechanosensory neurons not only disrupts the timing of pumping but also results in decreased intake efficiency (Figure 1E). The water-swallowing rhythm is generally stable in flies, and swallowing is a vital process that may involve redundant ion channels to ensure its stability.

      The mechanosensory channel mutants nompC, piezo, and TMC have a range of defects. The role of these channels in swallowing may not be sufficiently specific to support the interpretation presented. Their other defects are not described here and their overall locomotor function is not measured. If the flies have trouble consuming sufficient food throughout their development, how healthy are they at the time of assay? The level of starvation or water deprivation can affect different properties of feeding - meal size and frequency. There is no description of how starvation state was standardized or measured in these experiments.

      Defects in mechanosensory channel mutants nompC, piezo, and TMC, have been extensively investigated (Hehlert et al., Trends Neurosci., 2021, PMID:332570000). Mutations in these channels exhibit multifaceted effects, as illustrated in our RNAi experiments (see Figure 2E). Deprivation of water and food was performed in empty fly vials. It's important to note that the duration of starvation determines the fly's willingness to feed but not the pump frequency (Manzo et al., PNAS., 2012, PMID:22474379).

      In most cases, female flies were deprived of water and food in empty vials for 24 hours, because after that most flies would be willing to drink water. The deprivation time was 12 hours for flies with nompC or Tmc mutated, or with Kir2.1 expressed in md-C neurons, as some of these flies cannot survive 24 h of deprivation.

      The brain is likely to move considerably during swallow, so the GCaMP signal change may be a motion artifact. Sometimes this can be calculated by comparing GCaMP signal to that of a co-expressed fluorescent protein, but there is no mention that this is done here. Therefore, the GCaMP data cannot be interpreted.

      We did not co-express a fluorescent protein with GCaMP for md-C. The head of the fly was mounted onto a glass slide, and we did not observe significant signal changes before feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract: I disagree that swallow is the first step of ingestion. The first paragraph also mentions the final checkpoint before food ingestion. Perhaps sufficient to say that swallow is a critical step of ingestion.

      Indeed, it is not rigorous enough to say “first step”. This has been replaced by “early step”.

      Introduction:

      Line 59: "Silence" should be "Silencing"

      This has been replaced.

      Results:

      Lines 91-92: I am not clear about what this means. 20% of nompC and 20% of wild-type flies exhibit incomplete filling? So nompC is not different from wild-type?

      Sorry for the mistake. Viscous foods led to incomplete emptying (not incomplete filling), as displayed in Video 4. The swallowing behavior differs between nompC mutants and wild-type flies, as illustrated in Figure 1C, Figure 1—figure supplement 1A-C and video 1&5.

      When fed with 1% MC water solution (Figure 1—figure supplement 1E-H), Tmc or piezo mutants displayed incomplete emptying, which could take up a large proportion of the swallowing time, whereas only 20% of nompC flies and 20% of wild-type flies sporadically exhibited incomplete emptying, which is significantly different. Although the percentage of flies displaying incomplete pumps is similar between nompC mutant and wild-type flies, their swallowing behavior is quite different, as can be seen in Videos 1 and 5.

      Line 94: Should read: “while for foods with certain viscosity, the pump of Tmc or piezo mutants might"

      What evidence is there for weakened muscle motion? The phenotypes of all three mutants is quite similar, so concluding that they have roles in initiation versus swallowing strength is not well supported -this would be better moved to the discussion since it is speculative.

      Muscles are responsible for pumping the bolus from the mouth to the crop. In the case of Tmc or piezo mutants, as evidenced by incomplete filling for viscous foods (see Video 4), we speculate that the loss of sensory stimuli leads to inadequate muscle contraction. The phenotypes observed in Tmc and piezo mutants are similar yet distinct from those of the wild-type or nompC mutant, as shown in Video 1 and 4. The phrase "due to weakened muscle motion" has been removed for clarity.

      Line 146: If md-L neurons are also labeled by this intersection, then you are not able to know whether the axons seen in the brain are from md-L or md-C neurons. Line 148: cutting the labellum is not sufficient to ablate md-L neurons. The projections will still enter the brain and can be activated with optogenetics, even after severing the processes that reside in the labellum.

      Please refer to the responses for reviewer #1 (Public Review): “A major weakness of the paper…” and Figure 4.

      Line 162: If the fly head alone is in saline, do you know that the sucrose enters the esophagus? The more relevant question here is whether the md-C neurons respond to mechanical force. If you could artificially inflate the cibarium with air and see the md-C neurons respond that would be a more convincing result. So far you only know that these are activated during ingestion, but have not shown that they are activated specifically by filling or emptying. In addition, you are not only imaging md-C (md-L is also labeled). This caveat should be mentioned.

      We followed the methods outlined in the previous work (Chen et al., Cell Rep., 2019, PMID:31644916), which suggested that md-C neurons do not respond to sugars. While we aimed to mechanically stimulate md-C neurons, detecting signal changes during different steps of swallowing is challenging. This aspect could be further investigated in subsequent research with the application of adequate patch recording or two-photon microscopy (TPM).

      Figure 3: It is not clear what the pie charts in Figure 3 A refer to. What are the three different rows, and what does blue versus red indicate?

      Figure 3A illustrates three distinct states driven by CsChrimson light stimulation of md-C neurons, with the proportions of flies exhibiting each state. During light activation, flies may display difficulty in filling, incomplete filling, or a normal range of pumping. The blue and red bars represent the proportions of flies showing the corresponding state, as indicated by the black line.

      Figure 4: Where are the example traces for J? The comparison in K should be average dF/F before ingestion compared with average dF/F during ingestion. Comparing the in vitro response to sucrose to the in vivo response during ingestion is not a useful comparison.

      Please refer to the answers for reviewer #2 question d).

      Reviewer #2 (Recommendations For The Authors):

      Suggested experiments that would address some of my concerns listed in the public review include:

      a) high resolution SEZ images of MN-LexA lines crossed to LexAop-GFP to demonstrate their specificity

      b) more detail on the P2X2 experiment. It is hard to make suggestions beyond that without first seeing the details.

      c) presenting average GCaMP traces for all calcium imaging results

      d) to rule out taste stimulation of md-C (Figure 4K) I would suggest performing more extensive calcium imaging experiments with different stimuli. For example, sugar, water, and increasing concentrations of a neutral osmolyte (e.g. PEG) to suppress the water response. I think that this is more feasible than trying to get an in vitro taste prep to be convincing.

      Please refer to the responses for public review of reviewer #2.

      Reviewer #3 (Recommendations For The Authors):

      Below I list my suggestions as well as criticisms.

      (1) It would be excellent if the authors could demonstrate whether varying levels of food viscosity affect md-C activation.

      That is a good point, and could be studied in future work.

      (2) It is not clear whether an intersectional approach using TMC-GAL4 and nompC-QF abolishes labelling of the labellar multidendritic neurons. If this is the case, please show labellar multidendritic neurons in TMC-GAL4 only flies and flies using the intersectional approach. Along with this question, I am concerned that labellum-removed flies could be used for feeding assay.

      Intersectional labelling using TMC-GAL4 and nompC-QF did not abolish labelling of the labellar multidendritic neurons (Author response image 4). Labellum-removed flies can be used for feeding assays (Figure 3—figure supplement 1B-C, video 5), but swallowing behavior is affected once the LSO or cibarium is damaged, so the labellum must be removed very carefully.

      Author response image 4.

      (3) Please provide the detailed methods for GRASP and include proper control.

      Please refer to the responses for public review of reviewer #1.

      (4) The authors hypothesized that md-C sequentially activates MN11 and 12. Is the time gap between applying ATP on md-C and activation of MN11 or MN12 different?

      Please refer to the responses for public review of reviewer #3. The time gap between applying ATP on md-C and activation of MN11 or MN12 didn’t show significant differences, and we think the reason is that the ex vivo conditions could not completely mimic the in vivo process.

      I found the manuscript includes many errors, which need to be corrected.

      (1) The reference formatting needs to be rechecked, for example, lines 37, 42, and 43.

      (2) Line 44-46: There is some misunderstanding. The role of pharyngeal mechanosensory neurons is not known compared with chemosensory neurons.

      (3) Line 49: Please specify which type of quality of food. Chemical or physical?

      (4) Line 80 and Figure 1B-D Authors need to put filling and emptying time data in the main figure rather than in the supplementary figure. Otherwise, please cite the relevant figures in the text(S1A-C).

      (5) Line 84-85; Is "the mutant animals" indicating only nompC? Please specify it.

      (6) Figure 1a: It is hard to determine the difference between the series of images. And also label filling and emptying under the time.

      (7) S1E-H: It is unclear what "Time proportion of incomplete pump" means. Please define it.

      (8) Please reorganize the figures to follow the order of the text, for example, figures 2 and 4

      (9) Figure 4A. There is mislabelling in Figure 4A. It is supposed to be phalloidin not nc82.

      (10) Figure 4K: It does not match the figure legend and main text.

      (11) Figure 4D and G: Please indicate ATP application time point.

      Thank you for the corrections; all the points mentioned have been revised.

      Reviewer #4 (Recommendations For The Authors):

      The figures need improvement. 1A has tiny circles showing pharynx and any differences are unclear.

      The expression pattern of some of these drivers (Supplement) seems quite broad. The tmc nompC intersection image in Figure 1F is nice but the cibarium images are hard to interpret: does this one show muscle expression? What are "brain" motor neurons? Where are the labellar multi-dendritic neurons?

      The Tmc nompC intersection image shows no expression in muscles. The somata of motor neurons 12 and 11 are situated in the SEZ area of the brain, while the somata of md-C neurons are in the cibarium. An image of md-L neurons is provided in the response to Reviewer #3 (Recommendations For The Authors).

      Why do the assays alternate between swallowing food and swallowing water?

      Thank you for your suggestion; Figure 1A has been enlarged. The Tmc nompC intersection image in Figure 2F displays the position of md-C neurons from a ventral perspective, and muscles were not labelled. We stained muscles in the cibarium with phalloidin, and the image is shown in Figure 4A; we did not find overlap between md-C neurons and muscles. Images of md-L neurons are provided as Author response image 4.

      In the majority of our experiments, we employed water to test swallowing behavior, while we used methylcellulose water solution to test swallowing behavior of mechanoreceptor mutants, and sucrose solution for flies with md-C neurons expressing GCaMP since they hardly drank water when their head capsules were open.

      How starved or water-deprived were the flies?

      One day prior to the behavioral assays, flies were transferred to empty vials (without water or food) for 24 hours of deprivation. Flies that could not survive 24 h of deprivation were deprived for 12 h.

      How exactly was the pumping frequency (shown in Fig 1B) measured? There is no description in the methods at all. If the pump frequency is scored by changes in blue food intensity (arbitrary units?), this seems very subjective and maybe image angle dependent. What was camera frame rate? Can it capture this pumping speed adequately? Given the wealth of more quantitative methods for measuring food intake (eg. CAFE, flyPAD), it seems that better data could be obtained.

      How was the total volume of the cibarium measured? What do the pie charts in Figure 3A represent?

      The pump frequency was computed as the number of pumps divided by the elapsed time, following the methodology outlined in Manzo et al., 2012. Swallowing curves were plotted using the inverse of the blue food intensity in the cibarium. In this representation, ascending lines signify filling, while descending lines indicate emptying (see Figure 2D, 3B). We maintain objectivity in our approach since, during the recording of swallowing behavior, the fly was fixed, and we exclusively used data for analysis when the Region of Interest (ROI) was in the cibarium. This ensures that the intensity values accurately reflect the filling and emptying processes. Furthermore, we conducted manual frame-by-frame checks of pump frequency, and the results align with those generated by the Time Series Analyzer V3 plugin of ImageJ.

      To assess the total volume of ingestion, we followed the CAFE method, using a graduated glass capillary. We then calculated the ingestion rate (nL/s) by dividing the total volume of ingestion by the feeding time.
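
      The two quantities described above (pump frequency from the inverted intensity trace, and ingestion rate from capillary volume) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hysteresis thresholds `low` and `high`, the function names, and the synthetic trace are hypothetical and are not taken from the manuscript, where pumps were counted with ImageJ and checked manually frame by frame.

```python
import math


def count_pumps(blue_intensity, low=0.3, high=0.7):
    """Count complete fill-then-empty cycles in a cibarium ROI trace.

    The trace is inverted and rescaled to [0, 1] so that rising values
    mean filling and falling values mean emptying. The thresholds are
    hypothetical choices for this sketch.
    """
    lo_val, hi_val = min(blue_intensity), max(blue_intensity)
    if hi_val == lo_val:
        return 0  # flat trace: no pumping detected
    filled = [(hi_val - v) / (hi_val - lo_val) for v in blue_intensity]

    pumps = 0
    is_full = False
    for f in filled:
        if not is_full and f >= high:
            is_full = True          # cibarium has filled
        elif is_full and f <= low:
            is_full = False         # cibarium has emptied again
            pumps += 1              # one complete pump
    return pumps


def pump_frequency(blue_intensity, frame_rate_hz, low=0.3, high=0.7):
    """Pump frequency (Hz) = number of pumps / elapsed time."""
    duration_s = len(blue_intensity) / frame_rate_hz
    return count_pumps(blue_intensity, low, high) / duration_s


def ingestion_rate_nl_per_s(volume_nl, feeding_time_s):
    """Ingestion rate (nL/s) = total ingested volume / feeding time."""
    return volume_nl / feeding_time_s


# Synthetic example: a 5 Hz rhythm recorded for 2 s at 100 frames/s.
trace = [math.cos(2 * math.pi * 5 * t / 100) for t in range(200)]
print(pump_frequency(trace, frame_rate_hz=100))
print(ingestion_rate_nl_per_s(volume_nl=60.0, feeding_time_s=12.0))
```

      Using two thresholds instead of a single one keeps small frame-to-frame fluctuations from being counted as extra pumps; an incomplete emptying that never falls below `low` would simply not be counted as a finished cycle.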

      The changes seem small, in spite of the claim of statistical significance.

      Because pump frequency is highly stable within a given genotype, even seemingly small changes are statistically significant. We speculate that this stability in swallowing frequency reflects a redundant mechanism that ensures the robustness of the process. Disruption of one channel might be partially compensated for by others, highlighting the vital nature of the swallowing mechanism.

      How is this change in pump frequency consistent with defects in one aspect of the cycle - either ingestion (activation) or expulsion (inhibition)?

      Please refer to Figures 2 and 3. Both the filling and emptying processes were affected, while inhibition mainly influences emptying time (Figure 1—figure supplement 1).

      for the authors:

      Line 48: extensively

      Line 62 - undiscovered.

      Line 107, 463: multi

      Line 124: What is "dysphagia?" This is an unusual word and should be defined.

      Line 446: severe

      Line 466: in the cibarium or not?

      Thank you for the corrections; all the places mentioned have been revised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for organizing the reviews for our manuscript, “Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties,” and for the positive eLife assessment. We also thank the reviewers for their constructive comments. We have addressed every comment, which has helped to improve the transparency and readability of the manuscript. The main changes to the manuscript are summarized as follows:

      1. Surrogate distributions were created for each participant and session to estimate the effect of tACS-phase lag on behavioral entrainment to the sound that could have occurred by chance or because of our analysis method (R1). The actual tACS-amplitude effects were normalized relative to the surrogate distribution, and statistical analysis was performed on the normalized (z-score) values. This analysis did not change our main outcome: that tACS modulates behavioral entrainment to the sound depending on the phase lag between the auditory and the electrical signals. This analysis has now been incorporated into the Results section and in Fig. 3c-d.

      2. Two additional supplemental figures were created to include the single-participant data related to Fig. 3b and 3e (R2).

      3. Additional editing of the manuscript has been performed to improve the readability.

      Below, you will find a point-by-point response to the reviewers’ comments.

      Reviewer #1 (Public Review):

      We are grateful for the reviewer’s positive assessment of the potential impact of our study. The reviewer’s primary concerns were 1) the tACS lag effects reported in the manuscript might be noise because of the realignment procedure, and 2) no multiple comparisons correction was conducted in the model comparison procedure.

      In response to point 1), we have reanalyzed the data in exactly the manner prescribed by the reviewer. Our effects remain, and the new control analysis strengthens the manuscript. Regarding point 2), the model selection procedure was not based on evaluating the statistical significance of any model or predictor. Instead, the single model that best fit the data was selected as the model with the lowest Akaike’s information criterion (AIC), and its superiority relative to the second-best model was corroborated using the likelihood ratio test. Only the best model was evaluated for significance and analyzed in terms of its predictors and interactions. This model is an omnibus test and does not require multiple comparison correction unless there are post hoc decompositions. For similar approaches, see Kasten et al. (2019).
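
      The selection logic described above can be illustrated with a minimal Python sketch. The model names, log-likelihoods, and parameter counts are hypothetical placeholders rather than values from the manuscript, and the likelihood ratio test shown here assumes the two models are nested and differ by exactly one parameter (so the statistic is referred to a chi-square distribution with one degree of freedom).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def likelihood_ratio_test_df1(loglik_simple, loglik_complex):
    """LRT for nested models differing by one parameter.

    Returns the test statistic and its p-value. For one degree of
    freedom the chi-square survival function is erfc(sqrt(x / 2)).
    """
    stat = 2 * (loglik_complex - loglik_simple)
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2))
    return stat, p_value


# Hypothetical fits: name -> (log-likelihood, number of parameters)
fits = {
    "model_a": (-120.0, 3),
    "model_b": (-112.0, 4),
}

# Step 1: pick the single model with the lowest AIC.
best = min(fits, key=lambda name: aic(*fits[name]))

# Step 2: corroborate it against the runner-up with a likelihood ratio test.
stat, p_value = likelihood_ratio_test_df1(fits["model_a"][0], fits["model_b"][0])
print(best, stat, p_value)
```

      Only the winning model is then inspected for significant predictors, which is why no correction across candidate models is applied at the selection step.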

      Below, we have responded to each comment specifically or referred to this general comment.

      Summary of what the authors were trying to achieve.

      This paper studies the possible effects of tACS on the detection of silence gaps in an FM-modulated noise stimulus. Both FM modulation of the sound and the tACS are at 2Hz, and the phase of the two is varied to determine possible interactions between the auditory and electric stimulation. Additionally, two different electrode montages are used to determine if variation in electric field distribution across the brain may be related to the effects of tACS on behavioral performance in individual subjects.

      Major strengths and weaknesses of the methods and results.

      The study appears to be well-powered to detect modulation of behavioral performance with N=42 subjects. There is a clear and reproducible modulation of behavioral effects with the phase of the FM sound modulation. The study was also well designed, combining fMRI, current flow modeling, montage optimization targeting, and behavioral analysis. A particular merit of this study is to have repeated the sessions for most subjects in order to test repeat-reliability, which is so often missing in human experiments. The results and methods are generally well-described and well-conceived. The portion of the analysis related to behavior alone is excellent. The analysis of the tACS results is also generally well described, candidly highlighting how variable results are across subjects and sessions. The figures are all of high quality and clear. One weakness of the experimental design is that no effort was made to control for sensation effects. tACS at 2Hz causes prominent skin sensations which could have interacted with auditory perception and thus, detection performance.

      The reviewer is right that we did not control for the sensation effects in our paradigm. We asked the participants to rate the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. Nevertheless, we did not consider controlling for skin sensations necessary given the within-participant nature of our design (all participants experienced all six tACS–audio phase lag conditions, which were identical in their potential to cause physical sensations; the only difference between conditions was related to the timing of the auditory stimulus). That is, while the reviewer is right that 2-Hz tACS can indeed induce skin sensation under the electrodes, in this study, we report the effects that depend on the tACS-phase lag relative to the FM-stimulus. Note that the starting phase of the FM-stimulus was randomized across trials within each block (all six tACS audio lags were presented in each block of stimulation). We have no reason to expect the skin sensation to change with the tACS-audio lag from trial to trial, and therefore do not consider this to be a confound in our design. We have added some sentences with this information to the Discussion section:

      Pages 16-17, lines 497-504: “Note that we did not control for the skin sensation induced by 2-Hz tACS in this experiment. Participants rated the strength of the perceived stimulation after each run. However, this information was used only to assess the safety and tolerability of the stimulation protocol. It is in principle possible that skin sensation would depend on tACS phase itself. However, in this study, we report effects that depend on the relationship between tACS-phase and FM-stimulus phase, which changed from trial to trial as the starting phase of the FM-stimulus was randomized across trials. We have no reason to expect the skin sensation to change with the tACS-audio lag and therefore do not consider this to be a confound in our data.”

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      Unfortunately, the main effects described for tACS are encumbered by a lack of clarity in the analysis. It does appear that the tACS effects reported here could be an artifact of the analysis approach. Without further clarification, the main findings on the tACS effects may not be supported by the data.

      Likely impact of the work on the field, and the utility of the methods and data to the community.

      The central claim is that tACS modulates behavioral detection performance across the 0.5s cycle of stimulation. However, neither the phase nor the strength of this effect reproduces across subjects or sessions. Some of these individual variations may be explainable by individual current distribution. If these results hold, they could be of interest to investigators in the tACS field.

      The additional context you think would help readers interpret or understand the significance of the work.

      The following are more detailed comments on specific sections of the paper, including details on the concerns with the statistical analysis of the tACS effects.

      The introduction is well-balanced, discussing the promise and limitations of previous results with tACS. The objectives are well-defined.

      The analysis surrounding behavioral performance and its dependence on the phase of the FM modulation (Figure 3) is masterfully executed and explained. It appears that it reproduces previous studies and points to a very robust behavioral task that may be of use in other studies.

      Again, we would like to thank the reviewer for the positive assessment of the potential impact of our work and for the thoughtful comments regarding the methodology. For readability in our responses, we have numbered the comments below.

      1. There is a definition of tACS(+) vs tACS(-) based on the relative phase of tACS that may be problematic for the subsequent analysis of Figures 4 and 5. It seems that phase 0 is adjusted to each subject/session. For argument's sake, let's assume the curves in Fig. 3E are random fluctuations. Then aligning them to best-fitting cosine will trivially generate a FM-amplitude fluctuation with cosine shape as shown in Fig. 4a. Selecting the positive and negative phase of that will trivially be larger and smaller than a sham, respectively, as shown in Fig 4b. If this is correct, and the authors would like to keep this way of showing results, then one would need to demonstrate that this difference is larger than expected by chance. Perhaps one could randomize the 6 phase bins in each subject/session and execute the same process (fit a cosine to curves 3e, realign as in 4a, and summarize as in 4b). That will give a distribution under the Null, which may be used to determine if the contrast currently shown in 4b is indeed statistically significant.

      We agree with the reviewer’s concerns regarding the possible bias induced by the realignment procedure used to estimate tACS effects. Certainly, when adjusting phase 0 to each participant/session’s best tACS phase (peak in the fitting cosine), selecting the positive phase of the realigned data will be trivially larger than sham (Fig. 4a). This is why the realigned zero-phase and opposite phase (trough) bins were excluded from the analysis in Fig. 4b. Therefore, tACS(+) vs. tACS(-) do not represent behavioral entrainment at the peak positive and negative tACS lags, as both bins were already removed from the analysis. tACS(+) and tACS(-) are the averages of two adjacent bins from the positive and negative tACS lags, respectively (Zoefel et al., 2019). Such an analysis relies on the idea that if the effect of tACS is sinusoidal, presenting the auditory stimulus at the positive half cycle should be different than when the auditory stimulus lags the electrical signal by the other half. If the effect of tACS was just random noise fluctuations, there is no reason to assume that such fluctuations would be sinusoidal; therefore, any bias in estimating the effect of tACS should be removed when excluding the peak to which the individual data were realigned. Similar analytical procedures have been used previously in the literature (Riecke et al., 2015; Riecke et al., 2018). We have modified the colors in Fig. 4a and 4c (former 4b) and added a new panel to the figure (new 4b) to make the realignment procedure, including the exclusion of the realigned peak and trough data, more visually obvious.

      Moreover, we very much like the reviewer’s suggestion to normalize the magnitude of the tACS effect using a permutation strategy. We performed additional analyses to normalize our tACS effect in Fig. 4c by the probability of obtaining the effect by chance. For each subject and session, tACS-phase lags were randomized across trials for a total of 1000 iterations. For each iteration, the gaps were binned by the FM-stimulus phase and tACS-lag. For each tACS-lag, the amplitude of behavioral entrainment to the FM-stimulus was estimated (FM-amplitude), as shown in Fig. 3. Similar to the original data, a second cosine fit was estimated for the FM-amplitude by tACS-lag. Optimal tACS-phase was estimated from the cosine fit and FM-amplitude values were realigned. Again, the realigned phase 0 and trough were removed from the analysis, and their adjacent bins were averaged to obtain the FM-amplitude at tACS(+) and tACS(-), as shown in Fig. 4c. We then computed the difference between 1) tACS(+) and sham, 2) tACS(-) and sham, and 3) tACS(+) and tACS(-), for the original data and the permuted datasets. This procedure was performed for each participant and session to estimate the size of the tACS effect for the original and surrogate data. The original tACS effects were transformed to z-scores using surrogate distributions, providing us with an estimate of the size of the real effect relative to chance. We then computed one-sample t-tests to test whether the effects of tACS were statistically significant. In fact, this analysis showed that the tACS effects were still statistically significant. This analysis has been added to the Results and Methods sections and is included in Figure 4d.
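
      The surrogate procedure described above can be sketched as follows. This is a simplified illustration, not the authors' analysis code: it assumes one scalar score per trial and uses the per-lag mean as a stand-in for the FM-amplitude (which in the study comes from a cosine fit across FM-stimulus phase bins), it covers only the tACS(+) versus tACS(-) contrast, and all function names are hypothetical.

```python
import math
import random
import statistics

N_BINS = 6  # six tACS-audio phase lags


def per_lag_means(lags, scores):
    """Mean score per tACS-lag bin (stand-in for FM-amplitude per lag)."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for lag, score in zip(lags, scores):
        sums[lag] += score
        counts[lag] += 1
    return [total / n for total, n in zip(sums, counts)]


def realigned_effect(values):
    """tACS(+) minus tACS(-) after realigning to the best-fitting cosine.

    The peak bin of a one-cycle cosine fit becomes bin 0; that bin and the
    trough (bin 3) are discarded, and their neighbours are averaged.
    """
    angles = [2 * math.pi * k / N_BINS for k in range(N_BINS)]
    c = sum(v * math.cos(a) for v, a in zip(values, angles))
    s = sum(v * math.sin(a) for v, a in zip(values, angles))
    peak = round(math.atan2(s, c) / (2 * math.pi / N_BINS)) % N_BINS
    r = [values[(peak + i) % N_BINS] for i in range(N_BINS)]
    tacs_plus = (r[1] + r[5]) / 2   # bins adjacent to the excluded peak
    tacs_minus = (r[2] + r[4]) / 2  # bins adjacent to the excluded trough
    return tacs_plus - tacs_minus


def surrogate_z(lags, scores, n_perm=1000, seed=0):
    """z-score of the observed effect against a lag-shuffled null."""
    rng = random.Random(seed)
    observed = realigned_effect(per_lag_means(lags, scores))
    shuffled = list(lags)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute lag labels across trials
        null.append(realigned_effect(per_lag_means(shuffled, scores)))
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

      Because each surrogate dataset passes through the same realignment step, any positive bias introduced by aligning to the best-fitting cosine is present in the null distribution as well, so the z-score reflects only the part of the effect that exceeds that bias.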

      Page 10, lines 282-297: “In order to further investigate whether the observed tACS effect was significantly larger than chance and not an artifact of our analysis procedure (33), we created 1000 surrogate datasets per participant and session by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data. This yielded a surrogate distribution of tACS(+) and tACS(-) values for each participant and session. These values were averaged across sessions since the original analysis did not show a main effect of session. We then computed the difference between tACS(+) and sham, tACS(-) and sham, and tACS(+) and tACS(-), separately for the original and surrogate datasets. The obtained differences for the original data were then z-scored using the mean and standard deviation of the surrogate distribution. Note that in this case we used data of all 42 participants who had at least one valid session (37 participants with both sessions). Three one-sample t-tests were conducted to investigate whether the size of the tACS effect obtained in the original data was significantly larger than that obtained by chance (Fig. 4d). This analysis showed that all z-scores were significantly higher than zero (all t(41) > 2.36, p < 0.05, all p-values corrected for multiple comparisons using the Holm-Bonferroni method).”
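
      For reference, the Holm-Bonferroni step-down correction named in the passage above can be written compactly. The sketch below is a generic Python implementation of the standard procedure, not the authors' code; the function name `holm_bonferroni` is illustrative.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction.

    Returns (adjusted_p_values, reject_flags) in the original order.
    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    adjusted values are forced to be non-decreasing and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject
```

      With three tests, as in the three one-sample t-tests above, the smallest raw p-value is multiplied by 3, the middle one by 2, and the largest is left unscaled before monotonicity is enforced.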

      Page 31, lines 962-972: “To further control that the observed tACS effects were not an artifact of the analysis procedure, the differences between the tACS conditions (sham, tACS(+), and tACS(-)) were normalized using a permutation approach. For each participant and session, 1000 surrogate datasets were created by permuting the tACS lag designation across trials. The same binning procedure, realignment, and cosine fits were applied to each surrogate dataset as for the original data (see above). FM-amplitude at sham, tACS(+) and tACS(-) were averaged across sessions since the original analysis did not show a main effect of session. Differences between tACS conditions were estimated for the original and surrogate datasets and the resulting values from the original data were z-scored using the mean and standard deviation from the surrogate distributions. One-sample t-tests were conducted to test the statistical significance of the z-scores. P-values were corrected for multiple comparisons using the Holm-Bonferroni method.”
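
      The surrogate normalization described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' pipeline: it assumes six tACS lag bins and nine FM-phase bins (neither number is given in this response), uses binary hit/miss trials, and computes only the tACS(+) minus tACS(-) contrast (the comparisons against sham are omitted). All names are hypothetical.

```python
import math
import random
import statistics

N_LAGS = 6     # assumed number of tACS lag bins
N_FM_BINS = 9  # assumed number of FM-stimulus phase bins


def cosine_amp_phase(values):
    """Amplitude and peak phase of a one-cycle cosine fit to equally spaced bins."""
    n = len(values)
    a = 2 / n * sum(v * math.cos(2 * math.pi * k / n) for k, v in enumerate(values))
    b = 2 / n * sum(v * math.sin(2 * math.pi * k / n) for k, v in enumerate(values))
    return math.hypot(a, b), math.atan2(b, a)


def tacs_effect(lags, fm_bins, hits):
    """tACS(+) minus tACS(-) after realignment, with peak and trough excluded."""
    hit_sum = [[0] * N_FM_BINS for _ in range(N_LAGS)]
    count = [[0] * N_FM_BINS for _ in range(N_LAGS)]
    for lag, fm_bin, hit in zip(lags, fm_bins, hits):
        hit_sum[lag][fm_bin] += hit
        count[lag][fm_bin] += 1
    # First-level fit: FM-amplitude (hit-rate modulation by FM phase) per lag.
    fm_amp = []
    for lag in range(N_LAGS):
        rates = [s / c if c else 0.0 for s, c in zip(hit_sum[lag], count[lag])]
        fm_amp.append(cosine_amp_phase(rates)[0])
    # Second-level fit: realign FM-amplitude to the best tACS lag.
    _, peak = cosine_amp_phase(fm_amp)
    peak_bin = round(peak / (2 * math.pi / N_LAGS)) % N_LAGS
    r = [fm_amp[(peak_bin + k) % N_LAGS] for k in range(N_LAGS)]
    half = N_LAGS // 2
    return (r[1] + r[-1]) / 2 - (r[half - 1] + r[half + 1]) / 2


def surrogate_z(lags, fm_bins, hits, n_perm=1000, seed=0):
    """z-score of the observed effect against lag-permuted surrogate datasets."""
    rng = random.Random(seed)
    observed = tacs_effect(lags, fm_bins, hits)
    shuffled = list(lags)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the tACS lag designation across trials
        null.append(tacs_effect(shuffled, fm_bins, hits))
    return (observed - statistics.mean(null)) / statistics.stdev(null)
```

      Because each surrogate passes through the same realignment as the original data, any bias introduced by realignment is shared by the null distribution, so the z-score expresses the effect relative to that bias rather than relative to zero.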

      2. Results of Fig 5a and 5b seem consistent with the concern raised above about the results of Fig. 4. It appears we are looking at an artifact of the realignment procedure, on otherwise random noise. In fact, the drop in "tACS-amplitude" in Fig. 5c is entirely consistent with a random noise effect.

      Please see our response to the comment above.

      3. "To better understand what factors might be influencing inter-session variability in tACS effects, we estimated multiple linear models ..." This post hoc analysis does not seem to have been corrected for multiple comparisons of these "multiple linear models". It is not clear how many different things were tried. The fact that one of them has a p-value of 0.007 for some factors with amplitude-difference, but these factors did not play a role in the amplitude-phase, suggests again that we are not looking at a lawful behavior in these data.

      We suspect that the reviewer did not have access to the supplemental materials where all tables (relevant here is Table S3) are provided. This post hoc analysis was performed as an exploratory analysis to better understand the factors that could influence the inter-session variability of tACS effects. In Table S3, we provide the formula for each of the seven models tested, including their Akaike information criteria corrected for small samples (AICc), R2, F, and p-values. As described in the methods section, the winning model was selected as the model with the smallest AICc. A similar procedure has been previously used in the literature (Kasten et al., 2019). Moreover, to ensure that our winning model was better at explaining the data than the second-best unrestricted model, we used the likelihood ratio test. After choosing the winning model and before reporting the significance of the predictors, we examined the significance of the model in and of itself, taking into account its R2 as well as F- and p-values relative to a constant model. Thus, only one model is being evaluated in terms of statistical significance. Therefore, to our understanding, there are no multiple comparisons to correct for. We added the information regarding the selection procedure, hoping this will make the analysis clearer.

      See page 12, lines 354-360: “This model was selected because it had the smallest Akaike’s information criterion (corrected for small samples), AICc. Moreover, the likelihood ratio test showed no evidence for choosing the more complex unrestricted model (stat = 2.411, p = 0.121). Following the same selection criteria, the winning model predicting inter-session variability in tACS-phase, included only the factor gender (Table S4). However, this model was not significant in and of itself when compared to a constant model (F-statistic vs. constant model: 3.05, p = 0.09, R2 = 0.082).”
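
      The AICc-based selection step can be illustrated with a short Python sketch. It uses the common least-squares form of the criterion, AICc = n ln(RSS/n) + 2k + 2k(k + 1)/(n - k - 1), which differs from likelihood-based implementations only by a constant for a fixed dataset. The convention for counting parameters k, the function names, and any numbers passed in are assumptions for illustration, not values from Table S3.

```python
import math


def aicc(rss, n_obs, n_params):
    """Small-sample corrected AIC for a least-squares model.

    rss: residual sum of squares; n_obs: number of observations;
    n_params: number of estimated parameters (counting convention is an
    assumption here). Requires n_obs > n_params + 1.
    """
    aic = n_obs * math.log(rss / n_obs) + 2 * n_params
    correction = 2 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction


def select_model(candidates, n_obs):
    """Pick the candidate with the smallest AICc.

    candidates: dict mapping a model name to (rss, n_params).
    Returns (best_name, scores).
    """
    scores = {name: aicc(rss, n_obs, k) for name, (rss, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

      The correction term grows quickly as k approaches n, so with a few dozen observations a richer model has to reduce the residual error substantially before it is preferred.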

      4. "So far, our results demonstrate that FM-stimulus driven behavioral modulation of gap detection (FM-amplitude) was significantly affected by the phase lag between the FM-stimulus and the tACS signal (Audio-tACS lag) ..." There appears to be nothing in the preceding section (Figures 4 and 5) to show that the modulation seen in 3e is not just noise. Maybe something can be said about 3b on an individual subject/session basis that makes these results statistically significant on their own. Maybe these modulations are strong and statistically significant, but just not reproducible across subjects and sessions?

      Please see our response to the first comment regarding the validity of our analysis for demonstrating the significant effect of tACS lag on behavioral entrainment to the FM-stimulus (FM-amplitude), and the new control analysis. After performing the permutation tests to make sure the reported effects are not noise, our statistical analysis still shows that tACS-lag significantly modulates behavioral entrainment to the sound (FM-amplitude). Thus, the reviewer is right to say “these modulations are strong and statistically significant, just not reproducible across subjects and sessions”. In this regard, we consider our evaluation of the session-to-session reliability of tACS effects to be highly relevant for the field, as this is often overlooked in the literature.

      5. "Inter-individual variability in the simulated E-field predicts tACS effects" Authors here are attempting to predict a property of the subjects that was just shown to not be a reliable property of the subject. Authors are picking 9 possible features for this, testing 33 possible models with N=34 data points. With these circumstances, it is not hard to find something that correlates by chance. And some of the models tested had interaction terms, possibly further increasing the number of comparisons. The results reported in this section do not seem to be robust, unless all this was corrected for multiple comparisons, and it was not made clear?

      We thank the reviewer very much for this comment. While the reviewer is right that in these models we are trying to predict an individual property (tACS-amplitude) that was not test–retest reliable across sessions, we still consider this to be a valid analysis. Here, we take the tACS-amplitude averaged across sessions and try to predict the probability that a participant is significantly modulated by tACS in general, regardless of day-to-day variability. Regarding the number of multiple regression models, how we chose the winning model, and the appropriateness/need of multiple-comparisons correction in this case, please see our explanation under “Reviewer 1 (Public review)” and our response to comment 3.

      6. "Can we reduce inter-individual variability in tACS effects ..." This section seems even more speculative and with mixed results.

      We agree with the reviewer that this section is a bit speculative. We are trying to plant some seeds for future research that can help move the field forward in the quest for better stimulation protocols. We have added a sentence at the end of the section to explicitly say that more evidence is needed in this regard.

      Page 14, lines 428-429: “At this stage, more evidence is needed to prove the superiority of individually optimized tACS montages for reducing inter-individual variability in tACS effects.”

      Given the concerns with the statistical analysis above, there are concerns about the following statements in the summary of the Discussion:

      1. "2) does modulate the amplitude of the FM-stimulus induced behavioral modulation (FM-amplitude)"

      This seems to be based on Figure 4, which leaves one with significant concerns.

      Please see response to comment 1. We hope the reviewer is satisfied with our additional analysis to make sure the effect of tACS here reported is not noise.

      1. "4) individual variability in tACS effect size was partially explained by two interactions: between the normal component of the E-field and the field focality, and between the normal component of the E-field and the distance between the peak of the electric field and the functional target ROIs."

      The complexity of this statement alone may be a good indication that this could be the result of false discovery due to multiple comparisons.

      We respectfully disagree with the reviewer’s opinion that this is a complex statement. We think that these interaction effects are very intuitive as we explain in the results and discussion sections. These significant interactions show that for tACS to be effective, it matters that current gets to the right place and not to irrelevant brain regions. We believe this finding is of great importance for the field, since most studies on the topic still focus mostly on predicting tACS effects from the absolute field strength and neglect other properties of the electric field.

      For the same reasons as stated above, the following statements in the Abstract do not appear to have adequate support in the data:

      "We observed that tACS modulated the strength of behavioral entrainment to the FM sound in a phase-lag specific manner. ... Inter-individual variability of tACS effects was best explained by the strength of the inward electric field, depending on the field focality and proximity to the target brain region. Spatially optimizing the electrode montage reduced inter-individual variability compared to a standard montage group."

      Please see our responses to all previous comments.

      In particular, the evidence in support of the last sentence is unclear. The only finding that seems related is that "the variance test was significant only for tACS(-) in session 2". This is a very narrow result to be able to make such a general statement in the Abstract. But perhaps this can be made clearer.

      We changed this sentence in the abstract to:

      Page 2, lines 41-43: “Although additional evidence is necessary, our results also provided suggestive insights that spatially optimizing the electrode montage could be a promising tool to reduce inter-individual variability of tACS effects.”

      Reviewer #3 (Public Review):

      In "Behavioral entrainment to rhythmic auditory stimulation can be modulated by tACS depending on the electrical stimulation field properties" Cabral-Calderin and collaborators aimed to document 1) the possible advantages of personalized tACS montage over standard montage on modulating behavior; 2) the inter-individual and inter-session reliability of tACS effects on behavioral entrainment and, 3) the importance of the induced electric field properties on the inter-individual variability of tACS.

      To do so, in two different sessions, they investigated how the detection of silent gaps occurring at random phases of a 2Hz- amplitude modulated sound could be enhanced with 2Hz tACS, delivered at different phase lags. In addition, they evaluated the advantage of using spatially optimized tACS montages (information-based procedure - using anatomy and functional MRI to define the target ROI and simulation to compare to a standard montage applied to all participants) on behavioral entrainment. They first show that the optimized and the standard montages have similar spatial overlap to the target ROI. While the optimized montage induced a more focal field compared to the standard montage, the latter induced the strongest electric field. Second, they show that tACS does not modify the optimal phase for gap detection (phase of the frequency-modulated sound) but modulates the strength of behavioral entrainment to the frequency-modulated sound in a phase-lag specific manner. However, and surprisingly, they report that the optimal tACS lag, and the magnitude of the phasic tACS effect were highly variable across sessions. Finally, they report that the inter-individual variability of tACS effects can be explained by the strength of the inward electric field as a function of the field focality and on how well it reached the target ROI.

      The article is interesting and well-written, and the methods and approaches are state-of-the-art.

      Strengths:

      • The information-based approach used by the authors is very strong, notably with the definition of subject-specific targets using a fMRI localizer and the simulation of electric field strength using 3 different tACS montages (only 2 montages used for the behavioral experiment).

      • The inter-session and inter-individual variability are well documented and discussed. This article will probably guide future studies in the field.

      Weaknesses:

      • The addition of simultaneous EEG recording would have been beneficial to understand the relationship between tACS entrainment and the entrainment to rhythmic auditory stimulation.

      We are grateful for the Reviewer’s positive assessment of our work and for the reviewer’s recommendations. We agree with the reviewer that adding simultaneous EEG or MEG to our design would have been beneficial to understand tACS effects. However, as the reviewer may be aware, such a combination also poses additional challenges due to the strong artifacts induced by tACS in the EEG signals, which are at the frequency of interest and several orders of magnitude larger than the signal of interest. Unfortunately, an adequate setup for simultaneous tACS-EEG was not available at the time of the study. Nevertheless, since we are using a paradigm that we have repeatedly studied in the past and have shown to entrain neural activity and modulate behavior rhythmically, we are confident our results are of interest on their own. For readability of our answers, we have numbered the comments below.

      1. It would have been interesting to develop the fact that tACS did not "overwrite" neural entrainment to the auditory stimulus. The authors try to explain this effect by mentioning that "tACS is most effective at modulating oscillatory activity at the intended frequency when its power is not too high" or "tACS imposes its own rhythm on spiking activity when tACS strength is stronger than the endogenous oscillations but it decreases rhythmic spiking when tACS strength is weaker than the endogenous oscillations". However, it is relevant to note that the oscillations in their study are by definition "not endogenous" and one can interpret their results as a clear superiority of sensory entrainment over tACS entrainment. This potential superiority should be discussed, documented, and developed.

      We thank the reviewer very much for this remark. We completely agree that our results could be interpreted as a clear superiority of sensory entrainment over tACS entrainment. We have now incorporated this possibility in the discussion.

      Page 16, line 472-478: “Alternatively, our results could simply be interpreted as a clear superiority of the auditory stimulus for entrainment. In other words, sensory entrainment might just be stronger than tACS entrainment in this case where the stimulus rhythm was strong and salient. It would be interesting to further test whether this superiority of sensory entrainment applies to all sensory modalities or if there is a particular advantage for auditory stimuli when they compete with electrical stimulation. However, answering this question was beyond the scope of our study and needs further investigations with more appropriate paradigms.”

      2. The authors propose that "by applying tACS at the right lag relative to auditory rhythms, we can aid how the brain synchronizes to the sounds and in turn modulate behavior." This should be developed as the authors showed that the tACS lags are highly variable across sessions. According to their results, the optimal lag will vary for each tACS session and subtle changes in the montage could affect the effects.

      We thank the reviewer for this remark. We believe that the right procedure in this case would be to use closed-loop protocols where the optimal tACS-lag is estimated online, as we discuss in the summary and future directions sub-section. We tried to make this clearer in the same sentence that the reviewer mentioned.

      Page 17, line 506-508: “Since optimal tACS phase was variable across participants and sessions, this approach would require closed-loop protocols where the optimal tACS lag is estimated online (see next section).”

      3. In a related vein, it would be very useful to show the data presented in Figure 3 (panels b,d,e) for all participants to allow the reader to evaluate the quality of the data (this can be added as a supplementary figure).

      Thank you very much for the suggestion. We have added two new supplemental figures (Fig S1 and S2) to show individual data for Fig. 3b and 3e. Note that Fig. 3d already shows the individual data as each circle represents optimal FM-phase for a single participant.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      "was optimized in SimNIBS to focus the electric field as precisely as possible at the target ROI" It appears that some form of constrained optimization was used. It would be good to clarify which method was used, including a reference.

      Indeed, SimNIBS implements a constrained optimization approach based on pre-calculated lead fields. We have added the corresponding reference. All parameters used for the optimization are reported in the methods (see sub-section Electric field simulations and montage optimization). Regarding further specifics, the readers are invited to check the MATLAB code that was used for the optimization which is made available at: https://osf.io/3yutb

      "Thus, each montage has its pros and cons, and the choice of montage will depend on which of these dependent measures is prioritized." Well put. It would be interesting to know if authors considered optimizing for intensity on target. That would give the strongest predicted intensity on target, which seems like an important desideratum. Individualizing for something focal, as expected, did not give the strongest intensity. In fact, the method struggled to achieve the desired intensity of 0.1V/m in some subjects. It would be interesting to have a discussion about why this particular optimization method was selected.

      The specific optimization method used in this study was somewhat arbitrary, as there is no standard in the field. It was validated in prior studies, where it was also demonstrated that it performs favorably compared to alternative methods (Saturnino et al., 2019; Saturnino et al., 2021). The underlying physics of the head volume conductor generally limits the maximally achievable focality, and requires a tradeoff between focality and the desired intensity in the target. This tradeoff depends on the maximal amount of current that can be injected into the electrodes due to safety limits (4 mA in total in our case). Further constraints of the optimization in our application were the simultaneous targeting of two areas, and achieving field directions in the targets roughly parallel to those of auditory dipoles. Given the combination of these constraints, as the reviewer noticed, we could not even achieve the desired intensity of 0.1 V/m in some subjects. As we wanted to stimulate both auditory cortices equally, our priority was to have the E-fields as similar as possible between hemispheres. Future studies targeting only one area could more easily optimize for target intensity (assuming the same maximal total current injection). Alternatively, relaxing the constraint on direction and optimizing only for field intensity would help to increase the field intensities in the targets, but would lead to differing field directions in the two targets. As an example, see Rev. Fig. 1 below. We extensively discuss some of these points in the discussion section: “Are individually optimized tACS montage better?” (Pages 21-22).

      Additionally, we added a few sentences in the Results and Methods giving more details about the optimization approach.

      Page 5, lines 115-116: “Using individual finite element method (FEM) head models (see Methods) and the lead field-based constrained optimization approach implemented in SimNIBS (31)”

      Page 27, lines 819-822: “The optimization pipeline employed the approach described in (31) and was performed in two steps. First, a lead field matrix was created per individual using the 10-10 EEG virtual cap provided in SimNIBS and performing electric field simulations based on the default tissue conductivities listed below.”

      Author response image 1.

      E-field distributions for one example participant. Brain maps show the results from the same optimization procedure described in the main manuscript but with no constraint for the current direction (top) or constraining the current direction (bottom). Note that the desired intensity of .1 V/m can be achieved when the current direction is not constrained.

      The terminology of "high-definition HD" used here is unconventional and may confuse some readers. The paper cited for ring electrodes (18) does not refer to it as HD. A quick search for high-definition HD yields mostly papers using many small electrodes, not ring electrodes. They look more like what was called "individualized". More conventional would be to call the first configuration a "ring-electrode", and the "individualized" configuration might be called "individualized HD".

      We thank the reviewer for this remark. We changed the label of the high-definition montage to ring-electrode. Regarding the individualized configuration, we prefer not to use individualized HD as it has the same number of electrodes as the standard montage.

      "So far, we have evaluated whether tACS at different phase lags interferes with stimulus-brain synchrony and modulates behavioral signatures of entrainment" The paper does not present any data on stimulus-brain synchrony. There is only an analysis of behavior and stimulus/tACS phase.

      We agree with the reviewer. To be more careful with this statement, we have modified the sentence to read:

      Page 10, lines 303-304: “So far, we have evaluated whether tACS at different phase lags modulates behavioral signatures of entrainment: FM-amplitude and FM-phase.”

      "However, the strength of the tACS effect was variable across participants." and across sessions, and the phase also was variable across subjects and sessions.

      "tACS-amplitude estimates were averaged across sessions since the session did not significantly affect FM-amplitude (Fig. 5a)." More importantly, the authors show that "tACS-amplitude" was not reproducible across sessions.

      Unfortunately, we did not understand what the reviewer is suggesting here, and would have to ask the reviewer in this case to provide us with more information.

      References

      Kasten FH, Duecker K, Maack MC, Meiser A, Herrmann CS (2019) Integrating electric field modeling and neuroimaging to explain inter-individual variability of tACS effects. Nat Commun 10:5427.

      Riecke L, Sack AT, Schroeder CE (2015) Endogenous Delta/Theta Sound-Brain Phase Entrainment Accelerates the Buildup of Auditory Streaming. Curr Biol 25:3196-3201.

      Riecke L, Formisano E, Sorger B, Baskent D, Gaudrain E (2018) Neural Entrainment to Speech Modulates Speech Intelligibility. Curr Biol 28:161-169 e165.

      Saturnino GB, Madsen KH, Thielscher A (2021) Optimizing the electric field strength in multiple targets for multichannel transcranial electric stimulation. J Neural Eng 18.

      Saturnino GB, Siebner HR, Thielscher A, Madsen KH (2019) Accessibility of cortical regions to focal TES: Dependence on spatial position, safety, and practical constraints. Neuroimage 203:116183.

      Zoefel B, Davis MH, Valente G, Riecke L (2019) How to test for phasic modulation of neural and behavioural responses. Neuroimage 202:116175.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for your consideration and insightful comments on our article.

      We have gone through all the reviewers' comments and addressed all their questions and concerns point by point.

      As per their recommendation, we have amended our manuscript by providing more information about the experimental procedure and statistical analysis followed, and removed some analyses with a reduced number of imaging sessions. In addition, as a Resource and Tools article, the claim of our paper has been adjusted to a proof-of-concept paper showing robust and reliable preliminary results. In the meantime, we have provided 3 new Supplementary Figures, including one showing data from all individual animals.

      Reviewer #1 (Public Review):

      The authors apply a new approach to monitor brain-wide changes in sensory-evoked hemodynamic activity after focal stroke in fully conscious rats. Using functional ultrasound (fUS), they report immediate and lasting (up to 5 days) depression of sensory-evoked responses in somatosensory thalamic and cortical regions.

      Strengths: This is a technically challenging proof-of-concept study that employs new methods to study brain-wide changes in sensory-evoked neural activity, inferred from changes in cerebral blood flow. Despite the minor typos/grammatical errors and small sample size, the authors provide compelling images and rigorous analysis to support their conclusions. Overall, this was a very technically difficult study that was well executed. I believe that it will pave the way for more extensive studies using this methodological approach. Therefore I support this study and my recommendations to improve it are relatively minor in nature and should be simple for the authors to address.

      Weaknesses: The primary weakness of this paper is the small sample sizes. Drawing conclusions based on the small sham control group (n=2) or 5-day stroke recovery group (n=2), is rather tenuous. One way to alleviate some uncertainty with regard to the conclusions would be to state in the discussion that the findings (ie. loss of thalamocortical function after stroke) are perfectly consistent with previous studies that examined thalamocortical function after stroke. The authors missed some of these supporting studies in their reference list (see PMID: 28643802, 1400649). A second issue that can easily be resolved is their analysis of the 69 brain regions. This seems like a very important part of the study and one of the primary advantages of employing efUS. As presented, I had difficulty seeing the data. I think it would be worthwhile to expand Fig 3 (especially 3C) into a full-page figure with an accompanying table in the Supplementary info section describing the % change in CBF for each brain region.

      Other Recommendations for the authors:

      • Since there is variability in spreading depolarizations, was there any trend in the relationship between # SD's and ischemic volume? I know there are few data points but a scatterplot might be of interest.

      • For statistical comparisons of 'response curves' in Fig 3 and 4, what exactly was the primary dependent measure: changes in peak amplitude (%) or area under the curve?

      • There are several typos and minor grammatical errors in the manuscript. Some editing is recommended.

      We thank the reviewer for the comments and suggestions. We have adapted our message to that of a proof-of-concept paper showing robust and reliable preliminary results. We also thank the reviewer for pointing out important references that support our observation and have added them to our article. We have provided a supplementary full-page version of the current Figure 3C (see Supplementary Figure 3).

      Regarding the recommendations, we strongly agree that it would be of interest to link SDs and ischaemia, but unfortunately this can't be done because our experimental design, i.e. narrow cranial window and single static plane, does not allow brain-wide quantification of ischemic volume. This would be possible either by scanning the brain or by using a matrix array (also discussed in the manuscript).

      For statistical analysis of the hemodynamic response curves, we have adapted them to compare the area under the curve (AUC). In addition, we have provided a new Supplementary Figure 4 showing the associated values and statistics.
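
      As an illustration of the AUC measure, the area under a sampled response curve can be computed with the trapezoidal rule. The Python sketch below is generic and makes no claim about the authors' implementation; the function name, the optional baseline argument, and the example sampling are assumptions.

```python
def area_under_curve(times, values, baseline=0.0):
    """Trapezoidal area between a sampled response curve and a baseline.

    times: sample times in seconds, strictly increasing.
    values: response at each time (for example, percent signal change).
    baseline: level subtracted from every sample before integration.
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    area = 0.0
    for i in range(len(times) - 1):
        mean_height = (values[i] + values[i + 1]) / 2 - baseline
        area += mean_height * (times[i + 1] - times[i])
    return area
```

      Unlike peak amplitude, this summary is sensitive to both the height and the duration of the response, so a lower but more prolonged response can yield the same AUC as a brief, high one.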

      We have edited typos and errors.

      Reviewer #2 (Public Review):

      Brunner et al. present a new and promising application of functional ultrasound (fUS) imaging to follow the evolution of perfusion and haemodynamics upon thrombotic stroke in awake rats. The authors leveraged a chemically induced occlusion of the rat Middle Cerebral Artery (MCA) with ferric chloride in awake rats, while imaging cerebral perfusion with fUS at high spatial and temporal resolution (100µm x 110µm x 300µm x 0.8s). The authors also measured evoked haemodynamic response at different timepoints following whisker stimulation.

      As the fUS setup of the authors is limited to 2D imaging, Brunner and colleagues focused on a single coronal slice where they identified the primary Somatosensory Barrel Field of the Cortex (S1BF), directly perfused by the MCA and relay nuclei of the Thalamus: the Posterior (Po) and the Ventroposterior Medial (VPM) nuclei of the Thalamus. All these regions are involved in the sensory processing of whisker stimulation. By investigating these regions the authors present the hyper-acute effect of the stroke with these main results:

      • MCA occlusion results in a fast and important loss of perfusion in the ipsilesional cortex.

      • Thrombolysis is followed by Spreading Depolarisation measured in the Retrosplenial cortex.

      • Stroke-induced hypo-perfusion is associated with a significant drop in ipsilesional cortical response to whisker stimulation, and a milder one in ipsilesional subcortical relays.

      • Contralesional hemisphere is almost not affected by stroke with the exception of the cortex which presents a mildly reduced response to the stimulation.

      In addition, the authors demonstrate that their protocol allows to follow up stroke evolution up to five days post-induction. They further show that fUS can estimate the size of the infarcted volume with brilliance mode (B-mode), confirming the presence of the identified lesional tissue with post-mortem cresyl violet staining.

      Upon measuring functional response to whisker stimulation 5 days after stroke induction, the authors report that:

      • The ipsilesional cortex presents no response to the stimulation

      • The ipsilesional thalamic relays are less activated than hyper acutely

      • The contralesional cortex and subcortical regions are also less activated 5d after the stroke.

      These observations mainly validate the new method as a way to chronically image the longitudinal sequelae of stroke in awake animals. However, the potentially more intriguing results the authors describe in terms of functional reorganization following stroke appear to be preliminary and underpowered (N = 5 animals were imaged to describe the hyper-acute session, and N = 2 in the five-day follow-up). While highly preliminary, the research model proposed by the authors (where the loss of the infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive) is interesting. This hypothesis would require a greatly expanded, sufficiently powered study to be validated (or disproven).

      We thank the reviewer for the careful and accurate description of our work. We have addressed all the comments, recommendations and concerns raised by providing details of the experimental procedure and statistical analysis followed, and by removing some analyses associated with a reduced number of imaging sessions (at d5, n=2).

      Reviewer #3 (Public Review):

      The authors set out to demonstrate the utility of functional ultrasound for evaluating changes in brain hemodynamics elicited acutely and subacutely by the middle cerebral artery occlusion model of ischemic stroke in awake rats.

      Functional ultrasound affords a distinct set of tradeoffs relative to competing imaging modalities. Acclimatization of rats for awake imaging has proven difficult with most, and the high quality of presented data in awake rats is a major achievement. The major weakness of the approach is in its being restricted to single-slice acquisitions, which also complicates the registration of acquisition across multiple imaging sessions within the same animal. Establishing that awake imaging represents an advancement in relation to studies under anesthesia hinges upon the establishment of the level of stress experienced by the animals in the course of imaging, i.e., requires providing data on the assessment of stress over the course of these long imaging sessions. This is particularly significant given how significant a stressor physical restraint has been established to be in rodent models of stress. Furthermore, assessment of the robustness of these measurements is of particular significance for supporting the wide applicability of this approach to preclinical studies of brain injury: the individual animal data (effect sizes, activation areas, kinetics) should thus be displayed and the statistical analysis expanded. Both within-subject, within/across sessions, and across-subjects variability should be evaluated. Thoughtful comments on the relationship between power doppler signal and cerebral blood volume are important to include and facilitate comparisons to studies recording other blood volume-weighted signals. Finally, the contextualization of the observations with respect to other studies examining acute and subacute changes in brain hemodynamics post focal ischemic stroke in rats is needed. It is also quite helpful, for establishing the robustness of the approach, when the statistical parametric maps are shown in full (i.e. unmasked).

      We would like to thank the reviewer for the comments, recommendations and concerns he/she/they raised. We have addressed all the points to clarify our article and make it more relevant and informative for readers.

      Reviewer #2 (Recommendations For The Authors):

      The work described by Brunner et al is primarily a methodological paper, with potentially interesting, yet not robust enough, novel biological insight into the mechanisms of stroke. Nonetheless, the method employed is interesting and potentially well-validated.

      General comments/suggestions

      1- One potential concern I have is related to the relatively low sample size used, with n=5 for the main results and only n=2 for the follow-up after 5d. I am not sure much can be generalized using only two animals in any research study and this N = 2 dataset should probably be removed entirely from the study. Moreover, I found the statistical methods used were only superficially described, which prevented me from assessing whether the results reported by the authors are biologically relevant or not (including some significant differences in rCBV well below 1% estimated over two individuals).

      We fully agree with the reviewer’s comment and balanced our claim by considering this work as a proof-of-concept on brain imaging of multiple aspects of stroke hemodynamics (ischemia, spreading depolarization-like events, cortico-thalamic functions) in awake head-fixed rats. Therefore, we attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 356, 441, 455). We also removed statistics from the analysis at d5 post-stroke; see Figure 4 and the associated paragraph from Line 356.

      2- Based on their investigations, the authors propose a model where the loss of infarcted cortex induces reduced activity in connected regions, whether by cortico-thalamic or cortico-cortical loss of excitatory drive. This is an intriguing framework but this hypothesis would require a more complete, well-powered study to be substantiated.

      I think a clear recognition of the fact that these findings are just preliminary and not validated should be more explicitly reported. I also marginally note here that these results are in contrast with previous reports from the same team where occlusion of the MCA induced increased response to whisker stimulation in anaesthetised rats. These contradictory findings are not discussed in this manuscript.

      As mentioned above, we now state more explicitly that this work is a proof-of-concept and that the findings described are preliminary. We attenuated our message throughout the manuscript to prevent misunderstanding and overstatement (e.g., Lines 348, 433, 447). We also removed statistics from the analysis at d5 post-stroke; see Figure 4 and the associated paragraph from Line 348.

      We thank the reviewer for pointing out the missing link with our previous work performed under anesthesia. We have therefore added a discussion point on this contradictory finding (Line 441).

      3- In a previous study from the same group perfusion was imaged in 3D either by means of a motorized probe or by using a 2D matrix arrays. It would be interesting to discuss why a 2D approach was chosen in this study over those previous methods.

      Indeed, brain-wide coverage would be of great interest in such an experimental context. As mentioned by the reviewer, two strategies can be used:

      • One can scan the brain using a motorized probe, as performed for different purposes by Sieu et al., Nature Methods, 2015; Hingot, Brodin et al., Theranostics 2020; Macé et al., Neuron 2019 and also by our group in Sans-Dublanc, Chrzanowska et al., Neuron, 2022; Brunner et al. Frontiers in Neuroscience 2022 and Brunner et al., JCBFM 2023. (This list of publications is not exhaustive.)

      • A second approach aims at using a 2D matrix array to capture functions at brain-wide scale. So far, this strategy has been employed in a couple of studies (Rabut et al., Nature Methods, 2019 and Brunner, Grillet et al., Neuron, 2020).

      The scanning strategy (manual or motorized) strongly limits the investigation of brain functions, as accurately covering the functional regions requires extensive and time-consuming scanning: brain functions must be probed several times to capture a reliable and robust signal for every brain section scanned (see Brunner et al., 2022). Unfortunately, this strategy prevents us from accurately capturing other brain hemodynamics such as the dynamics of the ischemia or the spreading depolarization event.

      On the other hand, volumetric functional ultrasound imaging (vfUSI) would be suited for brain-wide coverage, capturing large-scale brain functions (see Brunner, Grillet et al. Neuron 2020) and hemodynamic events (see Rabut et al., Nature Methods, 2019), but at the cost of resolution and frame rate, and it requires a larger cranial window. Unfortunately, this technology was not available when this work was conducted.

      Such experimental opportunities have been suggested at the end of the manuscript: “To overcome such limitation, one can extend the size of the cranial window to allow for larger scale imaging either by sequentially scanning the brain27,28,31,32,59,69,71,72, or by using the recently developed volumetric fUS which provides whole-brain imaging capabilities in anesthetized73 and awake rats30.“

      4- Overall the registration scheme seems suboptimal which ultimately questions the specificity of the findings in thalamic regions. It would be interesting to validate this procedure, especially the probe repositioning five days after the stroke.

      Positioning was not a difficult part of this experiment. First, all head posts were implanted in the same position relative to the skull references bregma and lambda. Second, the head fixation ensures the same placement of the headpost for all animals. Finally, fine adjustments of the ultrasound probe position were made using a micromanipulator by finding key landmarks in the µDoppler image. In practice, minimal adjustments were needed to recover the same imaging plane. We provide additional information about the positioning in the Materials and Methods section.

      New text – Line 126: “Positioning.

      The mechanical fixation of the head-post ensures an easy and repeatable positioning of the ultrasound probe across imaging sessions. The ultrasound probe is indeed fixed to a micromanipulator enabling light adjustments. To find the plane of interest (containing both S1BF and thalamic relays: bregma - 3.4mm), we used brain landmarks (e.g., surface of the brain, hippocampus, superior sagittal sinus, large vessels). Note that as the headpost was carefully placed in the same position relative to the skull landmarks (bregma and lambda), the variation in position of the region of interest was minimal across animals.”

      Second, at d5 post-stroke, we positioned the ultrasound probe over the imaging window as described in the Materials and Methods section and used brain landmarks from the baseline/post-stroke image to best match the position of the brain image. We now describe the procedure in more detail.

      Original text: “First, we used the vascular markers and the shape of the hippocampus31,32 to find back the coronal cross-section imaged during the pre-stroke session. Five days after the MCA occlusion,….”

      New text – Line 360 :“Five days after the MCA occlusion, we first placed the ultrasound probe over the imaging window and adjusted its position (using micromanipulator) to find back the recording plane from Pre-Stroke session using Bmode (morphological mode) and µDoppler imaging using brain vascular landmarks (i.e., vascular patterns, brain surface and hippocampus34,35; see Figure 2B).”

      More detailed questions/comments/suggestions

      Methods

      ARRIVE methodology

      • Point 2b: sample size is not adequately explained, especially the use of n = 2 animals for 5d follow up

      We have made the sample size explicit by adding a short paragraph at the beginning of the Results section. We also made Supplementary Table 1 more accurate. New text – Line 239: “Animals

      Report on animal use, experimentation, exclusion criteria can be found in Supplementary Table 1. Rat#1 was excluded after the control session as the imaging window was too anterior to capture both cortical and thalamic responses. Rat#2 was excluded as hemodynamic responses were inconsistent during baseline (pre-stroke) period. Rat#3 showed early post-stroke reperfusion and was excluded from stroke analysis; the control session (pre-stroke) from Rat#3 was analyzed.”

      • Point 7: statistical methods: The quantification used to assess significant differences in stimulation traces is poorly described.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221: “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “
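
      As an illustration of the trial-wise quantification described above, the following is a minimal sketch and not the authors' pipeline (which used GraphPad Prism). The function names, the example traces and the trapezoidal integration rule are assumptions; the Kruskal-Wallis H statistic is computed without tie correction, and the Dunn post hoc step is not reproduced.

```python
def trapezoid_auc(values, dt):
    """Area under a uniformly sampled curve (trapezoidal rule)."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, ranks):
        rank_sums[gi] += r
    total = sum(rs ** 2 / len(g) for rs, g in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Hypothetical rCBV change (%) for a single trial, one sample per 0.8 s frame.
FRAME_PERIOD_S = 0.8
trial = [0.0, 2.0, 6.0, 8.0, 5.0, 1.0, 0.0]
auc = trapezoid_auc(trial, FRAME_PERIOD_S)

# Hypothetical per-trial AUC values for three recording periods.
h_stat = kruskal_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

      In practice the H statistic would be compared against a chi-squared distribution with (number of groups - 1) degrees of freedom, followed by pairwise post hoc comparisons.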

      Functional Ultrasound Imaging acquisition

      • References 26 and 28 imply 2.5Hz and 2Hz acquisition rates, respectively. Why does the same method result in a 1.25Hz acquisition rate here? Can you confirm the same spatial resolution in these conditions?

      The spatial resolution is independent of the temporal resolution (frame rate). The spatial resolution depends on the resolution of the compound image, and the temporal resolution is given by the number of compound images used to generate a single Doppler image (exposure time). Increasing the number of compound images decreases the frame rate while increasing the signal-to-noise ratio and sensitivity. In some work, a pause between two frames is used, mostly due to technical limitations in the software (processing time, or execution of a real-time display/processing by the user); however, this reduces the frame rate.

      Author response table 1.

      Comparing with the sequences used in references 26 and 28, we have the following timing parameters

      In this work, we decided to reduce the frame rate to have fewer images but with higher SNR. The 0.3s pause was added for technical reasons in this specific implementation.

      New text – Line 158:“ To obtain a single vascular image we acquired a set of 250 compound images in 0.5s, an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. “
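
      The timing figures quoted above can be checked with a short calculation. This is only a sketch of the arithmetic; the variable names are illustrative, and the numbers are those stated in the response (250 compound images in 0.5 s, plus a 0.3 s pause).

```python
N_COMPOUND = 250   # compound images per Doppler frame (from the text)
ACQ_TIME_S = 0.5   # time spent acquiring those compound images
PAUSE_S = 0.3      # pause for processing and real-time display

frame_period_s = ACQ_TIME_S + PAUSE_S          # time between two Doppler frames
frame_rate_hz = 1.0 / frame_period_s           # effective Doppler frame rate
compound_rate_hz = N_COMPOUND / ACQ_TIME_S     # compound-image rate during acquisition
frames_per_40s = round(40.0 / frame_period_s)  # frames in one 40-s analysis window

print(f"frame period: {frame_period_s:.1f} s, frame rate: {frame_rate_hz:.2f} Hz")
print(f"compound rate: {compound_rate_hz:.0f} Hz, frames per 40 s: {frames_per_40s}")
```

      The resulting 0.8 s frame period corresponds to the 1.25 Hz acquisition rate questioned by the reviewer and to the 50 frames per 40-s window used for the activity maps.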

      Activity Maps

      • How is the use of a 40s window motivated?

      The 40s window was chosen to better compare hemodynamic responses to either left or right whisker stimulation and to center the period of interest on the start of the stimulation. Original text: “Pre- and post-stroke recordings are reshaped in shorter 40-s sessions, i.e., 50 frames, …”

      New text – Line 206:“ Pre- and post-stroke recordings are reshaped in 40-s sessions, i.e., 50 frames, centered on the start of the stimulation (at 20s), …”
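
      The reshaping into stimulus-centered windows can be sketched as follows. This is an assumed implementation for a single ROI time-course; `extract_epochs`, the onset list and the rule of dropping windows that run past the recording are hypothetical choices, not taken from the manuscript.

```python
FRAME_PERIOD_S = 0.8                                   # one Doppler frame every 0.8 s
WINDOW_S = 40.0                                        # window length from the text
FRAMES_PER_WINDOW = round(WINDOW_S / FRAME_PERIOD_S)   # 50 frames
HALF_WINDOW = FRAMES_PER_WINDOW // 2                   # 25 frames, i.e. 20 s


def extract_epochs(signal, stim_onsets_s):
    """Cut a time-course into 50-frame windows centered on each stimulation onset."""
    epochs = []
    for onset_s in stim_onsets_s:
        center = round(onset_s / FRAME_PERIOD_S)
        start, stop = center - HALF_WINDOW, center + HALF_WINDOW
        # Skip windows that would extend beyond the recording.
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs


# Hypothetical 160-s recording (200 frames) with a stimulation every 40 s.
recording = list(range(200))
epochs = extract_epochs(recording, [20.0, 60.0, 100.0, 140.0])
```

      Within each extracted window, the stimulation starts at frame 25 (20 s), so the first half of the window serves as the pre-stimulation period.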

      • I think the manuscript would benefit from the use of an established, event-based GLM for activity mapping.

      We thank the reviewer for this suggestion. Here we used a z-score for activity mapping, an approach that is largely established in the neuroimaging field.

      • The statistical thresholds used should account for multiple comparisons.

      We have amended the Materials and Methods section, and figure captions about statistics and provided Supplementary Figure 4.

      Statistical analyses

      • Overall this section is only superficially described, and lacks detailed information.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      New text – Line 221 : “Activated brain regions were detected from hemodynamic response time-courses using GLM followed by t-test across animals as proposed in Brunner, Grillet et al.,34. The area under the curve (AUC) from hemodynamic response time-courses was computed for individual trials in S1BF, VPM and Po regions, for all the periods of the recording and for all rats included in this work. AUC were compared and analysed using a non-parametric Kruskal-Wallis test corrected for multiple comparison using a Dunn’s test. Tests were performed using GraphPad Prism 10.0.1. “

      • Are average rCBV changes referred to in the 40s window?

      The rCBV changes are referring to the pre-stimulation baseline. We have modified the text accordingly (Line 206).
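
      A baseline-referenced rCBV change can be computed along the following lines. This is a sketch under the assumption that the change is expressed as a percentage of the mean pre-stimulation signal; the function name and the example values are illustrative only.

```python
def rcbv_percent_change(epoch, n_baseline_frames):
    """Express each frame as a % change relative to the pre-stimulation mean."""
    baseline = sum(epoch[:n_baseline_frames]) / n_baseline_frames
    return [100.0 * (value - baseline) / baseline for value in epoch]


# Hypothetical 50-frame window: 25 baseline frames, then a 10% signal increase.
epoch = [100.0] * 25 + [110.0] * 25
change = rcbv_percent_change(epoch, n_baseline_frames=25)
```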

      • Were normality and variance equality requirements verified in the group with n=2?

      Based on the reviewers’ comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • There is no method for cresyl violet staining

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure – Line 228:

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope.”

      Results 1: Real time imaging of stroke induction in awake rats

      • Why is the window so narrow in the anteroposterior direction?

      The imaging window was defined based on the brain regions investigated in this work, namely the primary somatosensory cortex (S1BF) and the ventroposterior medial thalamic relay (VPM). According to the Paxinos atlas, the position of interest is located at Bregma -3.4mm. The cranial window was made accordingly and restricted to a couple of mm to avoid unnecessary surgery and brain exposure. We added a new sentence in the Materials & Methods section – Line 116: “This cranial window aims to cover bilateral thalamo-cortical circuits of the somatosensory whisker-to-barrel pathway.”

      • What validation was employed for the habituation protocol? Are animals stressed by the procedure? Do you have cortisol data to show? Are animal weights available throughout the procedure?

      The habituation protocol employed in this work follows recommendations from experts in the field and peers (Martin et al., Journal of Neuroscience Methods, 2002; Martin et al., Neuroimage 2006; Topchiy et al., Behav Brain Res 2009). We have amended the corresponding paragraph in the Materials & Methods section detailing the habituation procedure:

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      • The observation of contralateral oligemia is based only on RSG traces.

      We provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • The spatial and temporal distribution of Bmode measured hyperechogenicity is surprising and should be discussed. Reference 29 describes for instance non-overlap with an area of hypo-perfusion. Overlap between hypo-perfused and infarct volumes should be systematically investigated and coregistered with histology. Moreover, reference 40, while using a different model, presents hyperechogenicity at 5h.

      The B-mode images in Figure 2B are presented as an illustration of the potential morphological changes detected at different timepoints. However, our study focuses on functional responses and not on the evolution of the morphological changes. Indeed, these B-mode images remain difficult to interpret as they show a structural reorganization at the level of the ultrasound scatterers which has not been directly linked with tissue infarction, oedema, or other histological conditions.

      Regarding reference 40, the authors found a hyper-echogenicity at 5h, a time window that is not covered by our protocol. In reference 29, we indeed detailed a mismatch between the µDoppler images and histopathology. As suggested by the reviewer, looking for other potential mismatches/overlaps between B-mode/µDoppler and histopathology is an interesting field of investigation, but remains outside the scope of this work.

      Results 3: Delayed alteration of the somatosensory thalamocortical pathway

      • These results are underpowered and as such should probably be removed entirely from the paper (or substantiated with greater Ns of animals).

      Based on the reviewers’ comments on the limited number of recordings at 5d, we have decided to remove this statistical analysis. The manuscript, figure and caption were corrected accordingly.

      • If I am not mistaken, reference 28 describes a protocol for awake mouse imaging, and thereby does not introduce any hippocampal landmark allowing effective positioning of the probe.

      We thank the reviewer for this comment. While not used in the figure detailing image registration in reference 28, step 42 (page 17) of the protocol mentions the use of the hippocampal landmark to position the imaged brain relative to the atlas. The hippocampal landmark is also used in Brunner et al., JCBFM 2023; we have added this reference, which is more appropriate to this work (i.e., rat model, digitized Paxinos atlas, linear ultrasound transducer).

      • Significant difference in ipsilesional VPM with post-stroke period looks spurious.

      We have amended the Materials and Methods section about statistics and provided Supplementary Figure 4.

      Discussion:

      The sentence "might result from the direct loss of the excitatory corticothalamic feedback to the VPM" should be moderated in the absence of electrophysiology support. Such a decrease could be explained by reduced perfusion due to the challenge.

      The reviewer is right, and we believe the wording used in the sentence already balances the claim. However, we clarified how such a result could be better validated.

      Original text: “Further work will need to dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level, as fUS only reports on hemodynamics as a proxy of local neuronal activity27,28,60,66–68“

      New text – Line 445: “Therefore, further studies will be needed to accurately dissect the complex and long-lasting post-stroke alterations of the functional whisker-to-barrel pathway, including at the neuronal level by direct electrophysiology recordings and imaging, as fUS only reports on hemodynamics as a proxy of local neuronal activity30,31,63,74–76.“

      Figure 2

      • Panel B would be more informative if presented as an average.

      The aim of this figure is to show the raw data of a typical case. Averaging µDoppler images wouldn’t be illustrative as individual vessels will not be visible anymore. Because the vessels are in different positions from one animal to another, an average image would be blurred.

      • Panel C lacks contralateral S1BF trace.

      We have provided contralesional perfusion changes for all regions in Supplementary Figure 1.

      • Methods for detection of SDs refer to non-peer-reviewed reference 29, where SD is defined as 50% over baseline level. What is the actual threshold/method used to define a SD in this study?

      We have described this procedure in more detail in the Materials & Methods section - Line 195: “The detection of hemodynamic events associated with spreading depolarizations (SDs) was performed based on the temporal analysis of the rCBV signal in the retrosplenial granular (RSGc) and dysgranular (RSD) cortices of the left hemisphere (ipsi-lesional). SDs were defined as transient increase of rCBV signal (+25%) detected with a temporal delay of <10 frames (i.e., 8secs) between the two regions of interest, validating both the hyperemia and spreading features of hemodynamic events associated with spreading depolarizations.”
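
      The detection rule quoted above (a +25% rCBV increase seen in both regions within fewer than 10 frames of each other) can be sketched as follows. This is a hypothetical implementation: the function names, the use of upward threshold crossings as event onsets, and the example traces are assumptions, and the sketch does not verify that the increase is transient.

```python
THRESHOLD_PCT = 25.0     # rCBV increase defining a candidate event (from the text)
MAX_DELAY_FRAMES = 10    # onsets in the two ROIs must be < 10 frames apart


def onsets_above(trace, threshold):
    """Frame indices where the trace crosses the threshold upward."""
    return [i for i in range(1, len(trace))
            if trace[i] >= threshold and trace[i - 1] < threshold]


def detect_sd_like_events(roi_a, roi_b):
    """Pair threshold crossings in two ROIs that occur close together in time."""
    onsets_a = onsets_above(roi_a, THRESHOLD_PCT)
    onsets_b = onsets_above(roi_b, THRESHOLD_PCT)
    events = []
    for a in onsets_a:
        for b in onsets_b:
            if abs(b - a) < MAX_DELAY_FRAMES:
                events.append((a, b))
                break  # keep only the first matching onset in the second ROI
    return events


# Hypothetical rCBV traces (% change from baseline) for two ROIs.
# The first ROI shows two increases; only the first is followed in the second ROI.
roi_a = [0.0] * 20 + [30.0] * 5 + [0.0] * 35 + [30.0] * 5 + [0.0] * 35
roi_b = [0.0] * 24 + [30.0] * 5 + [0.0] * 71
events = detect_sd_like_events(roi_a, roi_b)
```

      Requiring the increase in both regions with a short delay is what distinguishes a spreading event from an isolated local hyperemia in this sketch.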

      • For panel F, a measure of variance would be more suited to show stereotypic profile across animals as the number of SDs varies between animals.

      Figure 2F indeed shows the average profile of hemodynamic events associated with spreading depolarizations (black line) with the variance (95% confidence interval error bands in gray). We have adjusted the corresponding figure caption to make this information clearer.

      Figure 3

      • The exact stimulation employed is not clear as the methods describe a 1.33 min delay between two whisker pad stimulations, but the figure reports 40s. The description is thereby ambiguous.

      We thank the reviewer for pointing out this potential confusion, which allowed us to correct a mistake:

      • The effective delay between two stimulations delivered to the whisker pads is 40 seconds

      • The effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start or 75 seconds from end to start.

      The text was amended accordingly in line 144: “Thus, the effective delay between two stimulations delivered to the same whisker pad is 80 seconds from start to start.“
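
      The corrected timing can be laid out as a simple schedule. This sketch assumes strict left/right alternation and a 5-s stimulation duration; the duration is not stated directly here but is inferred from the difference between the 80-s start-to-start and 75-s end-to-start delays. The names are illustrative.

```python
INTER_STIM_S = 40.0     # start-to-start delay between consecutive stimulations
STIM_DURATION_S = 5.0   # inferred: 80 s (start to start) minus 75 s (end to start)


def stimulation_schedule(n_stims, first_pad="left"):
    """Return (onset_s, pad) tuples, alternating between the two whisker pads."""
    pads = ("left", "right") if first_pad == "left" else ("right", "left")
    return [(i * INTER_STIM_S, pads[i % 2]) for i in range(n_stims)]


schedule = stimulation_schedule(6)
left_onsets = [onset for onset, pad in schedule if pad == "left"]

# Delays for two consecutive stimulations of the same pad.
same_pad_start_to_start = left_onsets[1] - left_onsets[0]
same_pad_end_to_start = same_pad_start_to_start - STIM_DURATION_S
```

      The 80-s start-to-start delay equals the 1.33 min mentioned in the original methods, which explains the apparent discrepancy with the 40 s reported in the figure.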

      • In panel B the choice of colormap and transparency for template overlay is not explained and is confusing given the employed threshold of 1.6. Which mask was used to overlay the activation map on the template? Why black color to represent a supposedly significant difference?

      We thank the reviewer for pointing out this potential confusion. We have adjusted the colormap in Figures 3 and 4.

      • The pre-stroke thalamic response is clearly localized in VPM for left stimulation, while it overlaps VPM and Po for the right stimulation. This questions the accuracy of the employed registration scheme and consequently the choice of these ROIs, which appear quite small as compared to the resolution and this positioning precision.

      We see the reviewer’s point; here the apparent difference arises because the brain is slightly tilted. By adjusting the angle for both activity maps (see Author response image 1), we confirm that both maps are very similar, including for the activated areas VPM and Po.

      Author response image 1.

      • It would be interesting to see the same activation maps for all animals in supplementary.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work.

      • Looking at panel C, more cortical regions seem to respond to the stimulation above S1BF.

      The reviewer is right and we have indeed mentioned this point several times in the original manuscript in:

      • the result section: “We also detected significant increase of activity in S2, AuD, Ect (*p<0.0001) and PRh (p<0.001) cortices and VPL nucleus (**p<0.01; the list of acronyms is provided in Supplementary Table 2), brain regions receiving direct efferent projections from the S1BF45,48,49, VPM or Po nuclei50–52.”

      • the caption of Figure 4: “S1BF, S2, AuD, VPM, VPL and Po regions are brain regions significantly activated (all pvalue<0.01; GLM followed by t-test).”

      • the conclusion section : “Functional responses to mechanical whisker stimulation were detected in several regions relaying the information from the whisker to the cortex, including the VPM and Po nuclei of the thalamus, and S1BF, the somatosensory barrel-field cortex. Responses were also observed in the S2 cortex involved in the multisensory integration of the information43,44,61, the auditory cortex as it receives direct efferent projection from S1BF45,61, and the VPL nuclei of the thalamus connected via corticothalamic projections45.“

      • It would be interesting to see bilateral traces as supplementary figures.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work.

      • In both panels C and D, n=5 is reported, but methods state the use of 7 animals. Please clarify how animals have been used in the different studies

      We have clarified the report on animal use and amended the Supplementary Table 1 accordingly.

      • In Panel D, the 95% CI intervals seem particularly narrow. Might this be the result of considering multiple trials as independent events? A GLM analysis would avoid this statistical fallacy.

      We have provided the Supplementary Figure 5 that contains both ipsilateral and contralateral responses to whiskers stimulation (from both left and right pads) for all trials and all rats included in this work. The statistical analysis has been adjusted (see Materials and Methods) and complemented with Supplementary Figure 4.

      Figure 4 - See comments above for Figure 3

      We have adjusted Figure 3 according to the reviewer’s suggestions.

      Reviewer #3 (Recommendations For The Authors):

      1) Introduction: Given the emphasis on the awake state, it would be helpful to note that a significant portion of strokes occur during sleep - as well as comment on its hemodynamic difference with respect to an awake state.

      We agree with the reviewer that some strokes occur during sleep. However, here the awake state, which has been poorly addressed in the literature, is contrasted with anesthesia, a condition largely used to investigate brain functions after stroke. We added a point and corresponding references about wake-up stroke, see Line 49.

      2) The effects of anesthetics on stroke are quite variable and the literature data on the topic is rather divergent: it would be helpful for the introduction to reflect the large level of discord in the literature and the wide-ranging mechanisms of action of different anesthetics.

      We thank the reviewer for this comment. We have expanded our original sentence in the introduction to better reflect the various effects of anesthetics on stroke, see Line 50.

      3) The reference list (14-17) to other studies of brain hemodynamic changes post ischemic stroke is egregiously short. Please expand. Similarly, the list of citations to other functional ultrasound rodent studies in the literature (23-24) is misleading: other groups have published similar work and ought to be cited.

      We thank the reviewer for this comment and have added complementary references. However, we believe that references 14-17 pointed out by the reviewer refer not only to brain hemodynamic changes but mostly to network and function, as stated in the manuscript. Regarding the references on fUS (23-24) mentioned by the reviewer, we did not limit our citations on functional ultrasound imaging to those 2 articles but cited 15+ from 4 different research groups.

      4) It would be helpful if the authors used "spreading depolarization" the way it has been utilized in the many decades of research on them in the literature, namely, as waves of hyper/hypoactivity in the electrophysiological signals. Please use a distinct term to refer to waves of changes in the hemodynamic state.

      We have amended the terminology used in the manuscript. “Spreading depolarization” has been replaced by “hemodynamic events associated with spreading depolarizations” or similar.

      5) Why is this investigation restricted to male rats?

      As this is a proof of concept, we did not perform experiments in female rats. We agree that further investigation would require including both sexes. We added a line in the discussion.

      New text – Line 455:” Finally, it is important to note that this proof-of-concept work did not specifically focus the impact of sex dimorphism on the stroke or early behavioral outcomes following the insult that would greatly enhance the translational value of such preclinical stroke study80.”

      6) Were the animals tested during their active phase? If not, why not, and what are the implications of testing their responses during the sleep phase?

      We think there is a misunderstanding here, as we investigated brain functions in awake head-fixed rats. Therefore, the sleep/active phases were neither investigated nor mentioned in the manuscript.

      7) How is the level of stress monitored/established?

      In this work, we followed an established procedure to reduce the stress and discomfort of the rats throughout the experiment. The procedure is now described in more detail in the Materials and Methods section. However, the level of stress was not monitored, which would be of interest to consider in future experiments.

      8) What are the sequelae of stress on brain hemodynamics, especially given 1-4 hour long sessions.

      This is a good remark. While we cannot state how stress impacts brain hemodynamics, the data show that hemodynamic response functions were stable and robust over hour-long recordings (see control and pre-stroke sessions in Supplementary Figure 5).

      9) How is the animal prepared for stroke induction? In general, the methodological steps surrounding animal handling and preparation are exceedingly terse.

      We provided more details about the handling and preparation of the rats in the Materials and Methods section.

      Original text: “Body restraint and head fixation.

      Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada), progressively increasing the restraining period from minutes to hours33,34. After the headpost implantation (see below), rats were habituated to be head-fixed while restrained in the sling. The period of fixation was progressively increased from minutes to hours. Water and food gel (DietGel, ClearH2O, USA) were provided along the habituation session. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      New text - Line 90:“ Body restraint and head fixation.

      The body restraint and head fixation procedures are adapted from published protocols and setup dedicated for brain imaging of awake rats39–41. Rats were habituated to the workbench and to be restrained in a sling suit (Lomir Biomedical inc, Canada) by progressively increasing restraining periods from minutes (5mins, 10mins, 30mins) to hours (1 and 3hrs) for one or two weeks. The habituation to head-fixation started by short (5 to 30s) and gentle head-fixation of the headpost between fingers. The headpost was then secured between clamps for fixation periods progressively increased following the same procedure as with the sling. For both body restraint and head fixation, the initial struggling and vocalization diminished over sessions. Water and food gel (DietGel, ClearH2O, USA) were provided for all body restraint and head-fixation habituation sessions. Once habituated, the cranial window for imaging was performed as described below (Figure 1A-C).”

      10) What is the reproducibility of the chemo-thrombotic model timeline? What are its limitations?

      We have provided more information on the chemo-thrombotic model and its limitations in the Discussion section.

      New text – Line 402:” However, to adequately and efficiently occlude the vessel of interest, removing a piece of skull remains required. As mentioned in the report on animal use, one rat was excluded from the analysis as the MCA spontaneously reperfuses, thus dropping the success rate of such model.”

      11) What is the motivation behind the 5-days post stroke timepoint selection?

      In addition to demonstrating the feasibility of imaging brain functions at different timepoints following the ischemia, the motivation to perform this delayed session was to capture functional diaschisis, which is known to occur a few days after the initial insult. More recurrent imaging sessions covering a longer post-stroke period would be of high interest to better capture the impact of ischemia on both the brain hemodynamics and functions.

      12) How predictive is hyperacute hemodynamics imaging of the long-term outcome?

      We thank the reviewer for this question, which remains of major interest in the stroke field. However, the prediction of long-term outcome would require capturing brain hemodynamics at a larger scale, as performed in Hingot et al., Theranostics 2020 and Brunner et al. JCBFM 2023, a coverage not accessible with the imaging window proposed in this work.

      13) It would be greatly reassuring if the authors presented the statistical parametric maps without masking regions of interest (eg Fig3B).

      We thank the reviewer for pointing out this potential confusion. In the first version of the figure, the colormap used for the activity maps was indeed not optimal. Therefore, we i) adjusted the colormap used in Fig 3 and 4 and ii) provided non-thresholded z-score maps for all rats in Supplementary Figure 5.

      14) Fig 3C is hard to make out.

      We provided a full-page version of Figure 3C in Supplementary Figure 3.

      15) Figs 3,4 should incorporate box and whisker plots of data across all rats scatter plots of individual animal data.

      We are not sure which kind of data the reviewer wants displayed here. However, we have provided Supplementary Figure 5, which contains both ipsilateral and contralateral responses to whisker stimulation (from both left and right pads) for all trials and for each individual animal included in this work.

      16) The final panels in Figures 3,4 would more tellingly include the plots of the linear models fitted.

      Based on all reviewers’ comments, we have adjusted and clarified the statistical analysis performed (see Materials and Methods) and complemented it with Supplementary Figure 4.

      17) The frame rate calculations are not adding up unless averaging and pauses are included so some more details should be stated. Are tilted plane waves averaged before compounding as in prior publications?

      The angles are averaged 6 times before compounding to increase the signal-to-noise ratio, and there is a pause of 0.3s between each Doppler image. See also the question “Functional Ultrasound Imaging acquisition” from reviewer 2. We also provided supplementary and key information about the sequence used in this work.

      We have provided complementary information in the manuscript:

      Original text:” The ultrasound sequence generated by the software is the same as in Macé et al.,26 and Brunner, Grillet et al., Briefly, the ultrafast scanner images the brain with 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°) at a 10-kHz frame rate. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. Each set of 250 compound images is filtered to extract the blood signal. Finally, the intensity of the filtered images is averaged to obtain a vascular image of the rat brain at a frame rate of 1.25Hz. Then, the acquired images are processed with a dedicated GPU architecture, displayed in real-time for data visualization, and stored for subsequent off-line analysis.”

      New text – Line 146:” The ultrasound sequence generated by the software is adapted from Macé et al.31 and Brunner, Grillet et al.34 Ultrafast images of the brain were generated using 5 tilted plane-waves (-6°, -3°, +0.5°, +3°, +6°). Each plane wave is repeated 6 times and the recorded echoes are averaged to increase the signal-to-noise ratio. The 5 plane-wave images are added to create compound images at a frame rate of 500Hz. To obtain a single vascular image we acquired a set of 250 compound images in 0.5s; an extra 0.3s pause is included between each image to have some processing time to display the images for real-time monitoring of the experiment. The set of 250 compound images has a mixed information of blood and tissue signal. To extract the blood signal we apply a low pass filter (cutoff 15Hz) and an SVD filter that eliminates 20 singular values. This filter aims to select all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image we compute the intensity of the blood signal i.e., Power Doppler image. This image is in first approximation proportional to the cerebral blood volume26,28. Overall, this process enables a continuous acquisition of power Doppler images at a frame rate of 1.25Hz during several hours.”
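
      The frame-rate arithmetic behind the revised description can be checked with a short sketch. All values are taken from the text above; the implied pulse repetition frequency is our own inference from the stated averaging and is not a figure given by the authors.

```python
# Timing arithmetic for the sequence described above (values taken from the text).
n_angles = 5             # tilted plane waves per compound image
n_repeats = 6            # each plane wave is repeated and the echoes averaged
compound_rate_hz = 500   # compound images per second
n_compound = 250         # compound images used for one power Doppler image
pause_s = 0.3            # processing pause between power Doppler images

# Pulse repetition frequency implied by the averaging
# (an inference for illustration, not stated in the text).
implied_prf_hz = n_angles * n_repeats * compound_rate_hz

acquisition_s = n_compound / compound_rate_hz  # time to acquire one block
period_s = acquisition_s + pause_s             # block plus pause
doppler_rate_hz = 1.0 / period_s               # power Doppler frame rate

print(implied_prf_hz, acquisition_s, period_s, doppler_rate_hz)
```

      With a 0.5 s acquisition block and a 0.3 s pause, one power Doppler image is produced every 0.8 s, which is consistent with the stated 1.25Hz rate.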

      18) Ultrasound data processing: The filtering process should have more description. It would be highly instructive to explain that the power Doppler signal is being used and comment clearly on its relationship to blood volume, commenting on stalled flow mircrovessels/RBC-devoid micrrovessels, and considerations of vessel orientation.

      The compound image contains a mix of blood and tissue signal. To extract the blood signal, we applied a low pass filter (cutoff 15Hz) and an SVD filter that eliminates 20 singular values. This filter selects all the signal from blood moving with an axial velocity higher than ~1mm/s. To obtain a vascular image, we compute the intensity of the blood signal (Power Doppler image). This power Doppler image is, to a first approximation, proportional to the cerebral blood volume.
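
      A minimal sketch of the SVD step and the power Doppler computation described above, using NumPy. It assumes that the 20 eliminated singular values are the largest ones (the usual choice for removing slowly varying tissue signal, but not stated explicitly in the response), and it omits the temporal filter. The function name and array layout are hypothetical, not the authors' implementation.

```python
import numpy as np


def power_doppler(compound, n_removed=20):
    """Sketch of an SVD clutter filter followed by a power Doppler estimate.

    compound: array of shape (nz, nx, nt) holding one block of compound
    images (250 frames per block in the text). Hypothetical layout.
    n_removed: number of leading singular values set to zero (assumption:
    the largest ones, which mostly carry tissue signal).
    """
    nz, nx, nt = compound.shape
    # Rearrange into a space x time matrix (one row per pixel).
    casorati = compound.reshape(nz * nx, nt)
    u, s, vh = np.linalg.svd(casorati, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[:n_removed] = 0.0          # drop the strongest components
    blood = (u * s_filtered) @ vh         # reconstruct the remaining signal
    # Power Doppler: mean intensity of the filtered signal over time.
    return np.mean(np.abs(blood) ** 2, axis=1).reshape(nz, nx)
```

      For example, `power_doppler(block)` with `block` of shape `(nz, nx, 250)` returns a single `(nz, nx)` image; a purely static input is removed entirely by the filter.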

      This information has been added to the Materials and Methods section of the manuscript.

      19) Does the SVD processing have the same cut off (20 singular values) as in prior publications as a standard value, or is that adjusted for each study? There are enough minor differences between sequences that these details are uncertain. Do the overall hemodynamics measurements (Fig 2) include all data acquired, or do they exclude the whisker stimulation events, and if so, how long of a window is excluded? The explanation of the activity maps should be rephrased e.g. "... recordings are segmented in shorter 40-s time windows encompassing the whisker stimulation trials..."

      We agree that these details are important; all this information has been added to the manuscript.

      • SVD processing: We eliminate 20 singular values as in cited studies.

      • Sequence: we have included more details about the sequence.

      • Processing: all data during the whisker stimulation is used.

      • We have rephrased the explanation about the activity maps.
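
      The segmentation into 40-s windows mentioned in the reviewer's suggested phrasing can be sketched as follows. The 1.25Hz frame rate and the 40-s window length come from the text; the function name, the `pre_s` offset before each stimulation onset, and the handling of incomplete windows are hypothetical choices for illustration.

```python
def segment_trials(signal, onsets_s, frame_rate_hz=1.25, window_s=40.0, pre_s=10.0):
    """Cut `signal` into fixed-length windows around each stimulation onset.

    signal: sequence of power Doppler samples for one pixel or region.
    onsets_s: stimulation onset times in seconds.
    pre_s: time kept before each onset (hypothetical value).
    Windows that would run past either end of the recording are skipped.
    """
    n_frames = round(window_s * frame_rate_hz)  # 50 frames for 40 s at 1.25 Hz
    windows = []
    for onset in onsets_s:
        start = round((onset - pre_s) * frame_rate_hz)
        stop = start + n_frames
        if start >= 0 and stop <= len(signal):
            windows.append(signal[start:stop])
    return windows
```

      Each returned window has the same number of frames, so the windows can be stacked and averaged across trials.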

      20) Discuss the methodology behind histological data shown in Fig. 1.

      We thank the reviewer for highlighting this omission. We have provided a paragraph in the Materials & Methods section detailing the histology procedure (Line 228):

      “Histopathology

      Rats were killed 24hrs after the occlusion for histological analysis of the infarcted tissue. Rats received a lethal injection of pentobarbital (100mg/kg i.p. Dolethal, Vetoquinol, France). Using a peristaltic pump, they were transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde (Sigma-Aldrich, USA). Brains were collected and post-fixed overnight. 50-μm thick coronal brain sections across the MCA territory were sliced on a vibratome (VT1000S, Leica Microsystems, Germany) and analyzed using the cresyl violet (Electron Microscopy Sciences, USA) staining procedure (see Open Lab Book for procedure). Slices were mounted with DPX mounting medium (Sigma-Aldrich, USA) and scanned using a bright-field microscope

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations

      Recommendation #1: Address potential confounds in the experimental design:

      (1a) Confounding factors between baseline to early learning. While the visual display of the curved line remains constant, there are at least three changes between these two phases: 1) the presence of reward feedback (the focus of the paper); 2) a perturbation introduced to draw a hidden, mirror-symmetric curved line; 3) instructions provided to use reward feedback to trace the line on the screen (intentionally deceitful). As such, it remains unclear which of these factors are driving the changes in both behavior and BOLD signals between the two phases. The absence of a veridical feedback phase in which participants received reward feedback associated with the shown trajectory seems like a major limitation.

      (1b) Confounding Factors Between Early and Late Learning. While the authors have focused on interpreting changes from early to late due to the explore-exploit trade-off, there are three additional factors possibly at play: 1) increasing fatigue, 2) withdrawal of attention, specifically related to individuals who have either successfully learned the perturbation within the first few trials or those who have simply given up, or 3) increasing awareness of the perturbation (not clear if subjective reports about perturbation awareness were measured.). I understand that fMRI research is resource-intensive; however, it is not clear how to rule out these alternatives with their existing data without additional control groups. [Another reviewer added the following: Why did the authors not acquire data during a control condition? How can we be confident that the neural dynamics observed are not due to the simple passage of time? Or if these effects are due to the task, what drives them? The reward component, the movement execution, increased automaticity?]

      We have opted to address both of these points above within a single reply, as together they suggest potential confounding factors across the three phases of the task. We would agree that, if the results of our pairwise comparisons (e.g., Early > Baseline or Late > Early) were considered in isolation from one another, then these critiques of the study would be problematic. However, when considering the pattern of effects across the three task phases, we believe most of these critiques can be dismissed. Below, we first describe our results in this context, and then discuss how they address the reviewers’ various critiques.

      Recall that from Baseline to Early learning, we observe an expansion of several cortical areas (e.g., core regions in the DMN) along the manifold (red areas in Fig. 4A, see manifold shifts in Fig. 4C) that subsequently exhibit contraction during Early to Late learning (blue areas in Fig. 4B, see manifold shifts in Fig. 4D). We show this overlap in brain areas in Author response image 1 below, panel A. Notably, several of these brain areas appear to contract back to their original, Baseline locations along the manifold during Late learning (compare Fig. 4C and D). This is evidenced by the fact that many of these same regions (e.g., DMN regions, in Author response image 1 panel A below) fail to show a significant difference between the Baseline and Late learning epochs (see Author response image 1 panel B below, which is taken from supplementary Fig 6). That is, the regions that show significant expansion and subsequent contraction (in Author response image 1 panel A below) tend not to overlap with the regions that significantly changed over the time course of the task (in Author response image 1 panel B below).

      Author response image 1.

      Note that this basic observation above is not only true of our regional manifold eccentricity data, but also in the underlying functional connectivity data associated with individual brain regions. To make this second point clearer, we have modified and annotated our Fig. 5 and included it below. Note the reversal in seed-based functional connectivity from Baseline to Early learning (leftmost brain plots) compared to Early to Late learning (rightmost brain plots). That is, it is generally the case that for each seed-region (A-C) the areas that increase in seed-connectivity with the seed region (in red; leftmost plot) are also the areas that decrease in seed-connectivity with the seed region (in blue; rightmost plot), and vice versa. [Also note that these connectivity reversals are conveyed through the eccentricity data — the horizontal red line in the rightmost plots denote the mean eccentricity of these brain regions during the Baseline phase, helping to highlight the fact that the eccentricity of the Late learning phase reverses back towards this Baseline level].

      Author response image 2.

      Critically, these reversals in brain connectivity noted above directly counter several of the critiques noted by the reviewers. For instance, this reversal pattern of effects argues against the idea that our results during Early Learning can be simply explained due to the (i) presence of reward feedback, (ii) presence of the perturbation or (iii) instructions to use reward feedback to trace the path on the screen. Indeed, all of these factors are also present during Late learning, and yet many of the patterns of brain activity during this time period revert back to the Baseline patterns of connectivity, where these factors are absent. Similarly, this reversal pattern strongly refutes the idea that the effects are simply due to the passage of time, increasing fatigue, or general awareness of the perturbation. Indeed, if any of these factors alone could explain the data, then we would have expected a gradual increase (or decrease) in eccentricity and connectivity from Baseline to Early to Late learning, which we do not observe. We believe these are all important points when interpreting the data, but which we failed to mention in our original manuscript when discussing our findings.

      We have now rectified this in the revised paper, where we now write in our Discussion:

      “Finally, it is important to note that the reversal pattern of effects noted above suggests that our findings during learning cannot be simply attributed to the introduction of reward feedback and/or the perturbation during Early learning, as both of these task-related features are also present during Late learning. In addition, these results cannot be simply explained due to the passage of time or increasing subject fatigue, as this would predict a consistent directional change in eccentricity across the Baseline, Early and Late learning epochs.”

      However, having said the above, we acknowledge that one potential factor that our findings cannot exclude is that they are (at least partially) attributable to changes in subjects’ state of attention throughout the task. Indeed, one can certainly argue that Baseline trials in our study don’t require a great deal of attention (after all, subjects are simply tracing a curved path presented on the screen). Likewise, for subjects that have learned the hidden shape, the Late learning trials are also likely to require limited attentional resources (indeed, many subjects at this point are simply producing the same shape trial after trial). Consequently, the large shift in brain connectivity that we observe from Baseline to Early Learning, and the subsequent reversion back to Baseline-levels of connectivity during Late learning, could actually reflect a heightened allocation of attention as subjects are attempting to learn the (hidden) rewarded shape. However, we do not believe that this would reflect a ‘confound’ of our study per se — indeed, any subject who has participated in a motor learning study would agree that the early learning phase of a task is far more cognitively demanding than Baseline trials and Late learning trials. As such, it is difficult to disentangle this ‘attention’ factor from the learning process itself (and in fact, it is likely central to it).

      Of course, one could have designed a ‘control’ task in which subjects must direct their attention to something other than the learning task itself (e.g., a divided-attention paradigm; Taylor & Thoroughman, 2007, 2008) and/or perform a secondary task concurrently (Codol et al., 2018; Holland et al., 2018), but we know that this type of manipulation impairs the learning process itself. Thus, in such a case, it wouldn’t be obvious to the experimenter what they are actually measuring in brain activity during such a task. And, to extend this argument even further, it is true that any sort of brain-based modulation can be argued to reflect some ‘attentional’ process, rather than modulations related to the specific task-based process under consideration (in our case, motor learning). In this regard, we are sympathetic to the views of Richard Andersen and colleagues who have eloquently stated that “The study of how attention interacts with other neural processing systems is a most important endeavor. However, we think that over-generalizing attention to encompass a large variety of different neural processes weakens the concept and undercuts the ability to develop a robust understanding of other cognitive functions.” (Andersen & Cui, 2007, Neuron). In short, it appears that different fields/researchers have alternate views on the usefulness of attention as an explanatory construct (see also articles from Hommel et al., 2019, “No one knows what attention is”, and Wu, 2023, “We know what attention is!”), and we personally don’t have a dog in this fight. We only highlight these issues to draw attention (no pun intended) to the fact that it is not trivial to separate these different neural processes during a motor learning study.

      Nevertheless, we do believe these are important points worth flagging for the reader in our paper, as they might have similar questions. To this end, we have now included in our Discussion section the following text:

      “It is also possible that some of these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      Finally, we should note that, at the end of testing, we did not assess participants' awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path). In hindsight, this would have been a good idea and provided some value to the current project. Nevertheless, it seems clear, based on several of the learning profiles observed (e.g., subjects who exhibited very rapid learning during the Early Learning phase, more on this below), that many individuals became aware of a shape approximating the rewarded path. Note that we have included new figures (see our responses below) that give a better example of what fast versus slower learning looks like. In addition, we now note in our Methods that we did not probe participants about their subjective awareness re: the perturbation:

      “Note that, at the end of testing, we did not assess participants’ awareness of the manipulation (i.e., that they were, in fact, being rewarded based on a mirror image path of the visible path).”

      Recommendation #2: Provide more behavioral quantification.

      (2a) The authors chose to only plot the average learning score in Figure 1D, without an indication of movement variability. I think this is quite important, to give the reader an impression of how variable the movements were at baseline, during early learning, and over the course of learning. There is evidence that baseline variability influences the 'detectability' of imposed rotations (in the case of adaptation learning), which could be relevant here. Shading the plots by movement variability would also be important to see if there was some refinement of the movement after participants performed at the ceiling (which seems to be the case ~ after trial 150). This is especially worrying given that in Fig 6A there is a clear indication that there is a large difference between subjects' solutions on the task. One subject exhibits almost a one-shot learning curve (reaching a score of 75 after one or two trials), whereas others don't seem to really learn until the near end. What does this between-subject variability mean for the authors' hypothesized neural processes?

      In line with these recommendations, we have now provided much better behavioral quantification of subject-level performance in both the main manuscript and supplementary material. For instance, in a new supplemental Figure 1 (shown below), we now include mean subject (+/- SE) reaction times (RTs), movement times (MTs) and movement path variability (our computation of these measures is now defined in our Methods section).

      As can be seen in the figure, all three of these variables tended to decrease over the course of the study, though we note there was a noticeable uptick in both RTs and MTs from the Baseline to Early learning phase, once subjects started receiving trial-by-trial reward feedback based on their movements. With respect to path variability, it is not obvious that there was a significant refinement of the paths created during late learning (panel D below), though there was certainly a general trend for path variability to decrease over learning.

      Author response image 3.

      Behavioral measures of learning across the task. (A-D) show average participant reward scores (A), reaction times (B), movement times (C) and path variability (D) over the course of the task. In each plot, the black line denotes the mean across participants and the gray banding denotes +/- 1 SEM. The three equal-length task epochs for subsequent neural analyses are indicated by the gray shaded boxes.

      In addition to these above results, we have also created a new Figure 6 in the main manuscript, which now solely focuses on individual differences in subject learning (see below). Hopefully, this figure clarifies key features of the task and its reward structure, and also depicts (in movement trajectory space) what fast versus slow learning looks like in the task. Specifically, we believe that this figure now clearly delineates for the reader the mapping between movement trajectory and the reward score feedback presented to participants, which appeared to be a source of confusion based on the reviewers’ comments below. As can be clearly observed in this figure, trajectories that approximated the ‘visible path’ (black line) resulted in fairly mediocre scores (see score color legend at right), whereas trajectories that approximated the ‘reward path’ (dashed black line, see trials 191-200 of the fast learner) resulted in fairly high scores. This figure also more clearly delineates how fPCA loadings derived from our functional data analysis were used to derive subject-level learning scores (panel C).

      Author response image 4.

      Individual differences in subject learning performance. (A) Examples of a good learner (bordered in green) and poor learner (bordered in red). (B) Individual subject learning curves for the task. Solid black line denotes the mean across all subjects whereas light gray lines denote individual participants. The green and red traces denote the learning curves for the example good and poor learners denoted in A. (C) Derivation of subject learning scores. We performed functional principal component analysis (fPCA) on subjects’ learning curves in order to identify the dominant patterns of variability during learning. The top component, which encodes overall learning, explained the majority of the observed variance (~75%). The green and red bands denote the effect of positive and negative component scores, respectively, relative to mean performance. Thus, subjects who learned more quickly than average have a higher loading (in green) on this ‘Learning score’ component than subjects who learned more slowly (in red) than average. The plot at right denotes the loading for each participant (open circles) onto this Learning score component.
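
      The derivation of a subject-level learning score from learning curves can be sketched as follows. This is a simplified stand-in that applies ordinary PCA (via SVD) to discretely sampled curves rather than a full functional PCA with basis smoothing, so it is an assumption-laden illustration and not the authors' pipeline. The function name and the sign convention are hypothetical.

```python
import numpy as np


def learning_scores(curves):
    """Score each subject on the dominant pattern of learning-curve variability.

    curves: array-like of shape (n_subjects, n_trials) with one reward score
    per trial. Returns (scores, component, explained_fraction).
    """
    curves = np.asarray(curves, dtype=float)
    centered = curves - curves.mean(axis=0)      # remove the mean learning curve
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    component = vt[0]                            # dominant pattern across trials
    scores = centered @ component                # one score per subject
    # The sign of a principal component is arbitrary; orient it so that
    # curves lying above the mean receive positive scores.
    if component.sum() < 0:
        component, scores = -component, -scores
    explained = s[0] ** 2 / np.sum(s ** 2)       # variance fraction of component 1
    return scores, component, explained
```

      Under this convention, a subject whose curve rises faster than the group mean receives a positive score and a slower subject a negative one, mirroring the green and red bands described in the caption.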

      The reviewers note that there are large individual differences in learning performance across the task. This was clearly our hope when designing the reward structure of this task, as it would allow us to further investigate the neural correlates of these individual differences (indeed, during pilot testing, we sought out a reward structure to the task that would allow for these intersubject differences). The subjects who learn early during the task end up having higher fPCA scores than the subjects who learn more gradually (or learn the task late). From our perspective, these differences are a feature, and not a bug, and they do not negate any of our original interpretations. That is, subjects who learn earlier on average tend to contract their DAN-A network during the early learning phase whereas subjects who learn more slowly on average (or learn late) instead tend to contract their DAN-A network during late learning (Fig. 7).

      (2b) In the methods, the authors stated that they scaled the score such that even a perfectly traced visible path would always result in an imperfect score of 40 points. What happens if a subject scores perfectly on the first try (which seemed to have happened for the green highlighted subject in Fig 6A), but is then permanently confronted with a score of 40 or below? Wouldn't this result in an error-clamp-like (error-based motor adaptation) design for this subject and all other high performers, which would vastly differ from the task demands for the other subjects? How did the authors factor in the wide between-subject variability?

      We think the reviewers may have misinterpreted the reward structure of the task, and we apologize for not being clearer in our descriptions. The reward score that subjects received after each trial was based on how well they traced the mirror-image of the visible path. However, all the participant can see on the screen is the visible path. We hope that our inclusion of the new Figure 6 (shown above) makes the reward structure of the task, and its relationship to movement trajectories, much clearer. We should also note that, even for the highest performing subject (denoted in Fig. 6), it still required approximately 20 trials for them to reach asymptote performance.

      (2c) The study would benefit from a more detailed description of participants' behavioral performance during the task. Specifically, it is crucial to understand how participants' motor skills evolve over time. Information on changes in movement speed, accuracy, and other relevant behavioral metrics would enhance the understanding of the relationship between behavior and brain activity during the learning process. Additionally, please clarify whether the display on the screen was presented continuously throughout the entire trial or only during active movement periods. Differences in display duration could potentially impact the observed differences in brain activity during learning.

      We hope that our inclusion of the new Supplementary Figure 1 (shown above) addresses the reviewers’ recommendation. Generally, we find that RTs, MTs and path variability all decrease over the course of the task. We think this relates to the early learning phase being more attentionally demanding, and requiring more conscious effort, than the later learning phases.

      Also, yes, the visible path was displayed on the screen continuously throughout the trial, and only disappeared at the 4.5 second mark of each trial (when the screen was blanked and the data was saved off for 1.5 seconds prior to commencement of the next trial; 6 seconds total per trial). Thus, there were no differences in display duration across trials and phases of the task. We have now clarified this in the Methods section, where we now write the following:

      “When the cursor reached the target distance, the target changed color from red to green to indicate that the trial was completed. Importantly, other than this color change in the distance marker, the visible curved path remained constant and participants never received any feedback about the position of their cursor.”

      (2d) It is unclear from plots 6A, 6B, and 1D how the scale of the behavioral data matches with the scaling of the scores. Are these the 'real' scores, meaning 100 on the y-axis would be equivalent to 40 in the task? Why then do all subjects reach an asymptote at 75? Or is 75 equivalent to 40 and the axis labels are wrong?

      As indicated above, we clearly did a poor job of describing the reward structure of our task in our original paper, and we now hope that our inclusion of Figure 6 makes things clear. A ‘40’ score on the y-axis would indicate that a subject has perfectly traced the visible path whereas a perfect ‘100’ score would indicate that a subject has perfectly traced the (hidden) mirror image path.

      The fact that several of the subjects reach asymptote around 75 is likely a byproduct of two factors. Firstly, the subjects performed their movements in the absence of any visual error feedback (they could not see the position of a cursor that represented their hand position), which had the effect of increasing motor variability in their actions from trial to trial. Secondly, there appears to be an underestimation among subjects regarding the curvature of the concealed, mirror-image path (i.e., that the rewarded path actually had an equal but opposite curvature to that of the visible path). This is particularly evident in the case of the top-performing subject (illustrated in Figure 6A) who, even during late learning, failed to produce a completely arched movement.

      (2e) Labeling of Contrasts: There is a consistent issue with the labeling of contrasts in the presented figures, causing confusion. While the text refers to the difference as "baseline to early learning," the label used in figures, such as Figure 4, reads "baseline > early." It is essential to clarify whether the presented contrast is indeed "baseline > early" or "early > baseline" to avoid any misinterpretation.

      We thank the reviewers for catching this error. Indeed, the intended label was Early > Baseline, and this has now been corrected throughout.

      Recommendation #3. Clarify which motor learning mechanism(s) are at play.

      (3a) Participants were performing at a relatively low level, achieving around 50-60 points by the end of learning. This outcome may not be that surprising, given that reward-based learning might have a substantial explicit component and may also heavily depend on reasoning processes, beyond reinforcement learning or contextual recall (Holland et al., 2018; Tsay et al., 2023). Even within our own data, where explicit processes are isolated, average performance is low and many individuals fail to learn (Brudner et al., 2016; Tsay et al., 2022). Given this, many participants in the current study may have simply given up. A potential indicator of giving up could be a subset of participants moving straight ahead in a rote manner (a heuristic to gain moderate points). Consequently, alterations in brain networks may not reflect exploration and exploitation strategies but instead indicate levels of engagement and disengagement. Could the authors plot the average trajectory and the average curvature changes throughout learning? Are individuals indeed defaulting to moving straight ahead in learning, corresponding to an average of 50-60 points? If so, the interpretation of brain activity may need to be tempered.

      We can do one better, and actually give you a sense of the learning trajectories for every subject over time. In the figure below, which we now include as Supplementary Figure 2 in our revision, we have plotted, for each subject, a subset of their movement trajectories across learning trials (every 10 trials). As can be seen in the diversity of these trajectories, the average trajectory and average curvature would do a fairly poor job of describing the pattern of learning-related changes across subjects. Moreover, it is not obvious from looking at these plots the extent to which poor learning subjects (i.e., subjects who never converge on the reward path) actually ‘give up’ in the task — rather, many of these subjects still show some modulation (albeit minor) of their movement trajectories in the later trials (see the purple and pink traces). As an aside, we are also not entirely convinced that straight ahead movements, which we don’t find many of in our dataset, can be taken as direct evidence that the subject has given up.

      Author response image 5

      Variability in learning across subjects. Plots show representative trajectory data from each subject (n=36) over the course of the 200 learning trials. Coloured traces show individual trials over time (each trace is separated by ten trials, e.g., trial 1, 10, 20, 30, etc.) to give a sense of the trajectory changes throughout the task (20 trials in total are shown for each subject).

      We should also note that we are not entirely opposed to the idea of describing aspects of our findings in terms of subject engagement versus disengagement over time, as such processes are related at some level to exploration (i.e., cognitive engagement in finding the best solution) and exploitation (i.e., cognitively disengaging and automating one’s behavior). As noted in our reply to Recommendation #1 above, we now give some consideration of these explanations in our Discussion section, where we now write:

      “It is also possible that these task-related shifts in connectivity relate to shifts in task-general processes, such as changes in the allocation of attentional resources (Bédard and Song, 2013; Rosenberg et al., 2016) or overall cognitive engagement (Aben et al., 2020), which themselves play critical roles in shaping learning (Codol et al., 2018; Holland et al., 2018; Song, 2019; Taylor and Thoroughman, 2008, 2007; for a review of these topics, see Tsay et al., 2023). Such processes are particularly important during the earlier phases of learning when sensorimotor contingencies need to be established. While these remain questions for future work, our data nevertheless suggest that this shift in connectivity may be enabled through the PMC.”

      (3b) The authors are mixing two commonly used paradigms, reward-based learning, and motor adaptation, but provide no discussion of the different learning processes at play here. Which processes were they attempting to probe? Making this explicit would help the reader understand which brain regions should be implicated based on previous literature. As it stands, the task is hard to interpret. Relatedly, there is a wealth of literature on explicit vs implicit learning mechanisms in adaptation tasks now. Given that the authors are specifically looking at brain structures in the cerebral cortex that are commonly associated with explicit and strategic learning rather than implicit adaptation, how do the authors relate their findings to this literature? Are the learning processes probed in the task more explicit, more implicit, or is there a change in strategy usage over time? Did the authors acquire data on strategies used by the participants to solve the task? How does the baseline variability come into play here?

      As noted in our paper, our task was directly inspired by the reward-based motor learning tasks developed by Dam et al., 2013 (Plos One) and Wu et al., 2014 (Nature Neuroscience). What drew us to these tasks is that they allowed us to study the neural bases of reward-based learning mechanisms in the absence of subjects also being able to exploit error-based mechanisms to achieve learning. Indeed, when first describing the task in the Results section of our paper we wrote the following:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014).”

      If the reviewers are referring to ‘motor adaptation’ in the context in which that terminology is commonly used — i.e., the use of sensory prediction errors to support error-based learning — then we would argue that motor adaptation is not a feature of the current study. It is true that in our study subjects learn to ‘adapt’ their movements across trials, but this shaping of the movement trajectories must be supported through reinforcement learning mechanisms (and, of course, supplemented by the use of cognitive strategies as discussed in the nice review by Tsay et al., 2023). We apologize for not being clearer in our paper about this key distinction and we have now included new text in the introduction to our Results to directly address this:

      “Importantly, because subjects received no visual feedback about their actual finger trajectory and could not see their own hand, they could only use the score feedback — and thus only reward-based learning mechanisms — to modify their movements from one trial to the next (Dam et al., 2013; Wu et al., 2014). That is, subjects could not use error-based learning mechanisms to achieve learning in our study, as this form of learning requires sensory errors that convey both the change in direction and magnitude needed to correct the movement.”

      With this issue aside, we are well aware of the established framework for thinking about sensorimotor adaptation as being composed of a combination of explicit and implicit components (indeed, this has been a central feature of several of our other recent neuroimaging studies that have explored visuomotor rotation learning, e.g., Gale et al., 2022 PNAS, Areshenkoff et al., 2022 eLife, Standage et al., 2023 Cerebral Cortex). However, there has been comparatively little work done on these parallel components within the domain of reinforcement learning tasks (though see Codol et al., 2018; Holland et al., 2018; van Mastrigt et al., 2023; see also the Tsay et al., 2023 review), and as far as we can tell, nothing has been done to date in the reward-based motor learning area using fMRI. By design, we avoided using descriptors of ‘explicit’ or ‘implicit’ in our study because our experimental paradigm did not allow a separate measurement of those two components of learning during the task. Nevertheless, it seems clear to us from examining the subjects’ learning curves (see Supplementary Figure 2 above) that individuals who learn very quickly are using strategic processes (such as action exploration to identify the best path) to enhance their learning. As we noted in an above response, we did not query subjects after the fact about their strategy use, which admittedly was a missed opportunity on our part.

      Author response image 6.

      With respect to the comment on baseline variability and its relationship to performance, this is an interesting idea and one that was explored in the Wu et al., 2014 Nature Neuroscience paper. Prompted by the reviewers, we have now explored this idea in the current data set by testing for a relationship between movement path variability during baseline trials (all 70 baseline trials, see Supplementary Figure 1D above for reference) and subjects’ fPCA score on our learning task. However, when we performed this analysis, we did not observe a significant positive relationship between baseline variability and subject performance. Rather, we actually found a trend towards a negative relationship (though this was non-significant; r=-0.2916, p=0.0844). Admittedly, we are not sure what conclusions can be drawn from this analysis, and in any case, we believe it to be tangential to our main results. We provide the results (at right) for the reviewers if they are interested. This may be an interesting avenue for exploration in future work.
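      As a quick consistency check on the correlation reported above, the two-sided p-value of a Pearson correlation can be recovered from r and the sample size alone. The sketch below assumes that all 36 subjects entered the analysis and that a standard two-sided test was used; neither detail is stated explicitly in this response, and the function name is hypothetical.

```python
import math
from scipy import stats

def pearson_p_value(r, n):
    """Two-sided p-value for a Pearson correlation r computed from n pairs.

    Uses the standard result that t = r * sqrt((n - 2) / (1 - r^2))
    follows a Student t distribution with n - 2 degrees of freedom
    under the null hypothesis of zero correlation.
    """
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r ** 2))
    return 2.0 * stats.t.sf(abs(t), df)

# Assumed inputs: r as reported above, n = 36 subjects (an assumption).
p = pearson_p_value(-0.2916, 36)
print(f"p = {p:.4f}")
```

      Under these assumptions the result should fall close to the reported p-value, which is consistent with the full sample having been used.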

      Recommendation #4: Provide stronger justification for brain imaging methods.

      (4a) Observing how brain activity varies across these different networks is remarkable, especially how sensorimotor regions separate and then contract with other, more cognitive areas. However, does the signal-to-noise ratio in each area/network influence manifold eccentricity and limit the possible changes in eccentricity during learning? Specifically, if a region has a low signal-to-noise ratio, it might exhibit minimal changes during learning (a phenomenon perhaps relevant to null manifold changes in the striatum due to low signal-to-noise); conversely, regions with higher signal-to-noise (e.g., motor cortex in this sensorimotor task) might exhibit changes more easily detected. As such, it is unclear how to interpret manifold changes without considering an area/network's signal-to-noise ratio.

      We appreciate where these concerns are coming from. First, we should note that the timeseries data used in our analysis were z-transformed (mean zero, 1 std) to allow normalization of the signal both over time and across regions (and thus mitigate the possibility that the changes observed could simply reflect mean overall signal changes across different regions). Nevertheless, differences in signal intensity across brain regions — particularly between cortex and striatum — are well-known, though it is not obvious how these differences may manifest in terms of a task-based modulation of MR signals.

      To examine this issue in the current data set, we extracted, for each subject and time epoch (Baseline, Early and Late learning), the raw scanner data (in MR arbitrary units, a.u.) for the cortical and striatal regions and computed (1) the mean signal intensity, (2) the standard deviation of the signal (Std) and (3) the temporal signal-to-noise ratio (tSNR; calculated as mean/Std). Note that in the fMRI connectivity literature tSNR is often the preferred SNR measure as it normalizes the mean signal based on the signal’s variability over time, thus providing a general measure of overall ‘signal quality’. The results of this analysis, averaged across subjects and regions, are shown below.
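      The three quantities described above can be sketched for a single epoch as follows. This is a minimal illustration, not the authors' code: the array layout (time points by regions), the use of the sample standard deviation (`ddof=1`), and the function name are all assumptions, and the synthetic values are placeholders chosen only to resemble raw scanner magnitudes.

```python
import numpy as np

def signal_quality(ts):
    """Per-region mean, temporal Std and tSNR for one epoch.

    ts : array of shape (n_timepoints, n_regions), raw scanner units.
    Returns three arrays of length n_regions.
    """
    mean = ts.mean(axis=0)
    std = ts.std(axis=0, ddof=1)  # assumption: sample standard deviation
    tsnr = mean / std             # temporal signal-to-noise ratio
    return mean, std, tsnr

# Synthetic example: column 0 mimics a "cortical" region (higher mean,
# higher variability), column 1 a "striatal" region (lower mean, lower
# variability). These numbers are illustrative, not the study's data.
rng = np.random.default_rng(0)
ts = rng.normal(loc=[625.0, 450.0], scale=[5.0, 2.5], size=(2000, 2))
mean, std, tsnr = signal_quality(ts)
print(mean.round(1), std.round(2), tsnr.round(1))
```

      The example makes the key point concrete: a region with a lower mean signal can still have a higher tSNR if its signal varies less over time.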

      Author response image 7.

      Note that, as expected, the overall signal intensity (left plot) of cortex is higher than in the striatum, reflecting the closer proximity of cortex to the receiver coils in the MR head coil. In fact, the signal intensity in cortex is approximately 38% higher than that in the striatum (i.e., (~625 - 450)/450). However, the signal variation in cortex is also greater than in the striatum (middle plot), in this case by approximately 100% (i.e., (~5 - 2.5)/2.5). The result of this is that the tSNR (mean/Std) for our data set and the ROI parcellations we used is actually greater in the striatum than in cortex (right plot). Thus, all else being equal, there seems to have been sufficient tSNR in the striatum for us to have detected motor-learning related effects. As such, we suspect the null effects for the striatum in our study actually stem from two sources.

      The first likely source is the relatively lower number of striatal regions (12) as compared to cortical regions (998) used in our analysis, coupled with our use of PCA on these data (which, by design, identifies the largest sources of variation in connectivity). In future studies, this imbalance could be rectified by using finer parcellations of the striatum (even down to the voxel level) while keeping the same parcellation of cortex (i.e., equating the number of ‘regions’ in each of striatum and cortex). The second likely source is our use of a striatal atlas (the Harvard-Oxford atlas) that divides brain regions based on their neuroanatomy rather than their function. In future work, we plan on addressing this latter concern by using finer, more functionally relevant parcellations of striatum (such as in Tian et al., 2020, Nature Neuroscience). Note that we sought to capture these interrelated possible explanations in our Discussion section, where we wrote the following:

      “While we identified several changes in the cortical manifold that are associated with reward-based motor learning, it is noteworthy that we did not observe any significant changes in manifold eccentricity within the striatum. While clearly the evidence indicates that this region plays a key role in reward-guided behavior (Averbeck and O’Doherty, 2022; O’Doherty et al., 2017), there are several possible reasons why our manifold approach did not identify this collection of brain areas. First, the relatively small size of the striatum may mean that our analysis approach was too coarse to identify changes in the connectivity of this region. Though we used a 3T scanner and employed a widely-used parcellation scheme that divided the striatum into its constituent anatomical regions (e.g., hippocampus, caudate, etc.), both of these approaches may have obscured important differences in connectivity that exist within each of these regions. For example, areas such as the hippocampus and caudate are not homogeneous areas but themselves exhibit gradients of connectivity (e.g., head versus tail) that can only be revealed at the voxel level (Tian et al., 2020; Vos de Wael et al., 2021). Second, while our dimension reduction approach, by design, aims to identify gradients of functional connectivity that account for the largest amounts of variance, the limited number of striatal regions (as compared to cortex) necessitates that their contribution to the total whole-brain variance is relatively small. Consistent with this perspective, we found that the low-dimensional manifold architecture in cortex did not strongly depend on whether or not striatal regions were included in the analysis (see Supplementary Fig. 6). As such, selective changes in the patterns of functional connectivity at the level of the striatum may be obscured using our cortex x striatum dimension reduction approach.
      Future work can help address some of these limitations by using both finer parcellations of striatal cortex (perhaps even down to the voxel level) (Tian et al., 2020) and by focusing specifically on changes in the interactions between the striatum and cortex during learning. The latter can be accomplished by selectively performing dimension reduction on the slice of the functional connectivity matrix that corresponds to functional coupling between striatum and cortex.”
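      One way to picture the proposal in the final sentence of the quoted passage is to extract the striatum-by-cortex block of a connectivity matrix and run PCA on that block alone. The sketch below is only an illustration of that idea under stated assumptions: the region indices, the column centering, the use of plain PCA via SVD, and the function name are all hypothetical and are not taken from the manuscript.

```python
import numpy as np

def striatum_cortex_components(fc, striatum_idx, cortex_idx, n_components=3):
    """PCA (via SVD) on the striatum-by-cortex block of a connectivity matrix.

    fc           : square (n_regions, n_regions) connectivity matrix
    striatum_idx : indices of striatal regions (rows of the block)
    cortex_idx   : indices of cortical regions (columns of the block)
    Returns the striatal-region scores and the fraction of block variance
    explained by each retained component.
    """
    block = fc[np.ix_(striatum_idx, cortex_idx)]
    block = block - block.mean(axis=0, keepdims=True)  # center each column
    u, s, _ = np.linalg.svd(block, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy example with 20 regions: the first 4 play the role of striatal regions.
rng = np.random.default_rng(1)
fc = np.corrcoef(rng.standard_normal((200, 20)), rowvar=False)
scores, explained = striatum_cortex_components(fc, np.arange(4), np.arange(4, 20))
print(scores.shape, explained.round(3))
```

      Because only the cross-system block enters the decomposition, the few striatal regions are no longer competing with the much larger cortex-by-cortex block for variance.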

      (4b) Could the authors clarify how activity in the dorsal attention network (DAN) changes throughout learning, and how these changes also relate to individual differences in learning performance? Specifically, on average, the DAN seems to expand early and contract late, relative to the baseline. This is interpreted to signify that the DAN exhibits lesser connectivity followed by greater connectivity with other brain regions. However, in terms of how these changes relate to behavior, participants who go against the average trend (DAN exhibits more contraction early in learning, and expansion from early to late) seem to exhibit better learning performance. This finding is quite puzzling. Does this mean that the average trend of expansion and contraction is not facilitative, but rather detrimental, to learning? [Another reviewer added: The authors do not state any explicit hypotheses, but only establish that DMN coordinates activity among several regions. What predictions can we derive from this? What are the authors looking for in the data? The work seems more descriptive than hypothesis-driven. This is fine but should be clarified in the introduction.]

      These are good questions, and we are glad the reviewers appreciated the subtlety here. The reviewers are indeed correct that the relationship of the DAN-A network to behavioral performance appears to go against the grain of the group-level results that we found for the entire DAN network (which we note is composed of both the DAN-A and DAN-B networks). That is, subjects who exhibited greater contraction from Baseline to Early learning and likewise, greater expansion from Early to Late learning, tended to perform better in the task (according to our fPCA scores). However, on this point it is worth noting that it was mainly the DAN-B network which exhibited group-level expansion from Baseline to Early Learning whereas the DAN-A network exhibited negligible expansion. This can be seen in Author response image 8 below, which shows the pattern of expansion and contraction (as in Fig. 4), but instead broken down into the 17-network parcellation. The red asterisk denotes the expansion from Baseline to Early learning for the DAN-B network, which is much greater than that observed for the DAN-A network (which is basically around the zero difference line).

      Author response image 8.

      Thus, it appears that the DAN-A and DAN-B networks are modulated to a different extent during the task, which likely contributes to the perceived discrepancy between the group-level effects (reported using the 7-network parcellation) and the individual differences effects (reported using the finer 17-network parcellation). Based on the reviewers’ comments, this seems like an important distinction to clarify in the manuscript, and we have now described this nuance in our Results section where we now write:

      “...Using this permutation testing approach, we found that it was only the change in eccentricity of the DAN-A network that correlated with Learning score (see Fig. 7C), such that the more the DAN-A network decreased in eccentricity from Baseline to Early learning (i.e., contracted along the manifold), the better subjects performed at the task (see Fig. 7C, scatterplot at right). Consistent with the notion that changes in the eccentricity of the DAN-A network are linked to learning performance, we also found the inverse pattern of effects during Late learning, whereby the more that this same network increased in eccentricity from Early to Late learning (i.e., expanded along the manifold), the better subjects performed at the task (Fig. 7D). We should note that this pattern of performance effects for the DAN-A — i.e., greater contraction during Early learning and greater expansion during Late learning being associated with better learning — appears at odds with the group-level effects described in Fig. 4A and B, where we generally find the opposite pattern for the entire DAN network (composed of the DAN-A and DAN-B subnetworks). However, this potential discrepancy can be explained when examining the changes in eccentricity using the 17-network parcellation (see Supplementary Figure 8). At this higher resolution level we find that these group-level effects for the entire DAN network are being largely driven by eccentricity changes in the DAN-B network (areas in anterior superior parietal cortex and premotor cortex), and not by mean changes in the DAN-A network. By contrast, our present results suggest that it is the contraction and expansion of areas of the DAN-A network (and not DAN-B network) that are selectively associated with differences in subject learning performance.”

      Finally, re: the reviewers’ comments that we do not state any explicit hypotheses etc., we acknowledge that, beyond our general hypothesis stated at the outset about the DMN being involved in reward-based motor learning, our study is quite descriptive and exploratory in nature. So little work has been done in this research area (i.e., using manifold learning approaches to study motor learning with fMRI) that it would be disingenuous to have any stronger hypotheses than those stated in our Introduction. Thus, to make the exploratory nature of our study clear to the reader, we have added the following text (in red) to our Introduction:

      “Here we applied this manifold approach to explore how brain activity across widely distributed cortical and striatal systems is coordinated during reward-based motor learning. We were particularly interested in characterizing how connectivity between regions within the DMN and the rest of the brain changes as participants shift from learning the relationship between motor commands and reward feedback, during early learning, to subsequently using this information, during late learning. We were also interested in exploring whether learning-dependent changes in manifold structure relate to variation in subject motor performance.”

      We hope these changes now make the intention of our study clear.

      (4c) The paper examines a type of motor adaptation task with a reward-based learning component. This, to me, strongly implicates the cerebellum, given that it has a long-established crucial role in adaptation and has recently been implicated in reward-based learning (see work by Wagner & Galea). Why is there no mention of the cerebellum and why it was left out of this study? Especially given that the authors state in the abstract they examine cortical and subcortical structures. It's evident from the methods that the authors did not acquire data from the cerebellum or had too small a FOV to fully cover it (34 slices at 4 mm thickness = 136 mm, which is likely a bit short to fully cover the cerebellum in many participants). What was the rationale behind this methodological choice? It would be good to clarify this for the reader. Related to this, the authors need to rephrase their statements on 'whole-brain' connectivity matrices or analyses - it is not whole-brain when it excludes the cerebellum.

      As we noted above, we do not believe this task to be a motor adaptation task, in the sense that subjects are not able to use sensory prediction errors (and thus error-based learning mechanisms) to improve their performance. Rather, by denying subjects this sensory error feedback they are only able to use reinforcement learning processes, along with cognitive strategies (nicely covered in Tsay et al., 2023), to improve performance. Nevertheless, we recognize that the cerebellum has been increasingly implicated in facets of reward-based learning, particularly within the rodent domain (e.g., Wagner et al., 2017; Heffley et al., 2018; Kostadinov et al., 2019, etc.). In our study, we did indeed collect data from the cerebellum but did not include it in our original analyses, as we wanted (1) the current paper to build on prior work in the human and macaque reward-learning domain (which focuses solely on striatum and cortex, and which rarely discusses cerebellum, see Averbeck & O’Doherty, 2022 & Klein-Flugge et al., 2022 for recent reviews), and (2) the cerebellum to be a more targeted focus of future work (specifically, we plan on focusing on striatal-cerebellar interactions during learning, which are hypothesized based on the neuroanatomical tract tracing work of Bostan and Strick, etc.). We hope the reviewers respect our decisions in this regard.

      Nevertheless, we acknowledge that our statements about ‘whole-brain’ connectivity, and our vagueness about what we mean by ‘subcortex,’ may be confusing for the reader. We have now removed and/or corrected such references throughout the paper (however, note that in some cases it is difficult to avoid reference to “whole-brain”, e.g., “whole-brain correlation map” or “whole-brain false discovery rate correction”, which is standard terminology in the field).

      In addition, we are now explicit in our Methods section that the cerebellum was not included in our analyses.

      “Each volume comprised 34 contiguous (no gap) oblique slices acquired at a ~30° caudal tilt with respect to the plane of the anterior and posterior commissure (AC-PC), providing whole-brain coverage of the cerebrum and cerebellum. Note that for the current study, we did not examine changes in cerebellar activity during learning.”

      (4d) The authors centered the matrices before further analyses to remove variance associated with the subject. Why not run a PCA on the connectivity matrices and remove the PC that is associated with subject variance? What is the advantage of first centering the connectivity matrices? Is this standard practice in the field?

      Centering in some form has become reasonably common in the functional connectivity literature, as there is considerable evidence that task-related (or cognitive) changes in whole-brain connectivity are dwarfed by static, subject-level differences (e.g., Gratton et al., 2018, Neuron). If covariance matrices were ordinary scalar values, then isolating task-related changes could be accomplished simply by subtracting a baseline scan or mean score; but because the space of covariance matrices is non-Euclidean, the actual computations involved in this subtraction are more complex (see our Methods). However, fundamentally (and conceptually) our procedure is simply ordinary mean-centering, but adapted to this non-Euclidean space. Despite the added complexity, there is considerable evidence that such computations — adapted directly to the geometry of the space of covariance matrices — outperform simpler methods, which treat covariance matrices as arrays of real numbers (e.g. naive subtraction, see Dodero et al. & Ng et al., references below). Moreover, our previous work has found that this procedure works quite well to isolate changes associated with different task conditions (Areshenkoff et al., 2021, Neuroimage; Areshenkoff et al., 2022, eLife).

      Although PCA can be adapted to work well with covariance matrix valued data, it would at best be a less direct solution than simply subtracting subjects' mean connectivity. This is because the top components from applying PCA would be dominated by both subject-specific effects (not of interest here), and by the large-scale connectivity structure typically observed in component based analyses of whole-brain connectivity (i.e. the principal gradient), whereas changes associated with task-condition (the thing of interest here) would be buried among the less reliable components. By contrast, our procedure directly isolates these task changes.
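      To give a sense of what mean-centering adapted to the space of covariance matrices can look like, the sketch below removes a subject's mean connectivity by whitening each of that subject's covariance matrices with the subject mean. It is a simplified stand-in and should not be read as the exact procedure in our Methods: it assumes a log-Euclidean mean rather than a full Riemannian (affine-invariant) mean, it re-centers at the identity matrix, and all function names are hypothetical.

```python
import numpy as np

def _apply_to_eigenvalues(mat, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fn(vals)) @ vecs.T

def log_euclidean_mean(covs):
    """Mean of SPD matrices computed in the matrix-log domain (assumption:
    used here as a simple substitute for a Riemannian mean)."""
    logs = [_apply_to_eigenvalues(c, np.log) for c in covs]
    return _apply_to_eigenvalues(np.mean(logs, axis=0), np.exp)

def center_subject(covs):
    """Remove a subject's mean connectivity from each epoch's covariance.

    Each matrix C is mapped to M^{-1/2} C M^{-1/2}, where M is the subject's
    mean. The result stays symmetric positive definite, which element-wise
    subtraction of M would not guarantee.
    """
    mean = log_euclidean_mean(covs)
    inv_sqrt = _apply_to_eigenvalues(mean, lambda v: 1.0 / np.sqrt(v))
    return [inv_sqrt @ c @ inv_sqrt for c in covs]

# Toy example: subject B has the same epoch-to-epoch changes as subject A,
# but an overall threefold larger covariance (a static subject-level offset).
rng = np.random.default_rng(0)
subject_a = []
for _ in range(3):  # three epochs, e.g. Baseline, Early, Late
    x = rng.standard_normal((100, 5))
    subject_a.append(x.T @ x / 100)
subject_b = [3.0 * c for c in subject_a]

centered_a = center_subject(subject_a)
centered_b = center_subject(subject_b)
print(np.allclose(centered_a, centered_b))
```

      In this toy case the subject-level scaling disappears after centering while the epoch-to-epoch structure is retained, which is the property that matters for isolating task-related changes.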

      References cited above:

      Dodero, L., Minh, H. Q., San Biagio, M., Murino, V., & Sona, D. (2015, April). Kernel-based classification for brain connectivity graphs on the Riemannian manifold of positive definite matrices. In 2015 IEEE 12th international symposium on biomedical imaging (ISBI) (pp. 42-45). IEEE.

      Ng, B., Dressler, M., Varoquaux, G., Poline, J. B., Greicius, M., & Thirion, B. (2014). Transport on Riemannian manifold for functional connectivity-based classification. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part II 17 (pp. 405-412). Springer International Publishing.

      (4e) Seems like a missed opportunity that the authors just use a single, PCA-derived measure to quantify learning, where multiple measures could have been of interest, especially given that the introduction established some interesting learning-related concepts related to exploration and exploitation, which could be conceptualized as movement variability and movement accuracy. It is unclear why the authors designed a task that was this novel and interesting, drawing on several psychological concepts, but then chose to ignore these concepts in the analysis.

      We were disappointed to hear that the reviewers did not appreciate our functional PCA-derived measure to quantify subject learning. This is a novel data-driven analysis approach that we have previously used with success in recent work (e.g., Areshenkoff et al., 2022, eLife) and, from our perspective, we thought it was quite elegant that we were able to describe the entire trajectory of learning across all participants along a single axis that explained the majority (~75%) of the variance in the patterns of behavioral learning data. Moreover, the creation of a single behavioral measure per participant (what we call a ‘Learning score’, see Fig. 6C) helped simplify our brain-behavior correlation analyses considerably, as it provided a single measure that accounts for the natural auto-correlation in subjects’ learning curves (i.e., that subjects who learn quickly also tend to be better overall learners by the end of the learning phase). It also avoids the difficulty (and sometimes arbitrariness) of having to select specific trial bins for behavioral analysis (e.g., choosing the first 5, 10, 20 or 25 trials as a measure of ‘early learning’, and so on). Of course, one of the major alternatives to our approach would have involved fitting an exponential to each subject’s learning curve and taking measures such as learning rate, but in our experience these types of models do not always fit well or yield robust, reliable parameters at the individual subject level. To strengthen the motivation for our approach, we have now included the following text in our Results:

      “To quantify this variation in subject performance in a manner that accounted for the auto-correlation in learning performance over time (i.e., subjects who learned more quickly tend to exhibit better performance by the end of learning), we opted for a pure data-driven approach and performed functional principal component analysis (fPCA; Shang, 2014) on subjects’ learning curves. This approach allowed us to isolate the dominant patterns of variability in subjects’ learning curves over time (see Methods for further details; see also Areshenkoff et al., 2022).”
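
      The idea of summarizing each learning curve with a single score can be sketched as follows. This is a simplified stand-in, not the fPCA implementation used in the manuscript: it applies ordinary PCA (via SVD) to discretized curves without any smoothing or basis expansion, and the synthetic exponential curves, noise level, and the name `learning_scores` are assumptions made purely for illustration.

```python
import numpy as np

def learning_scores(curves):
    """Score each subject on the first principal component of the curves.

    curves: array of shape (n_subjects, n_trials).
    Returns (scores, component, fraction_of_variance_explained).
    """
    centered = curves - curves.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, 0] * s[0]          # projection of each curve onto PC1
    return scores, vt[0], explained[0]

# Synthetic learning curves (illustrative data only): performance rises
# toward 1.0 at a subject-specific rate, plus a little measurement noise.
rng = np.random.default_rng(1)
trials = np.arange(40)
rates = np.linspace(0.02, 0.2, 12)
curves = 1.0 - np.exp(-np.outer(rates, trials))
curves = curves + 0.02 * rng.standard_normal(curves.shape)

scores, component, frac = learning_scores(curves)
```

      Because the sign of a principal component is arbitrary, the scores may need to be flipped so that larger values correspond to better learning before they are related to other measures.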

      In any case, the reviewers may be pleased to hear that in current work in the lab we are using more model-based approaches to attempt to derive sets of parameters (per participant) that relate to some of the variables of interest described by the reviewers, but that we relate to much more dynamical (shorter-term) changes in brain activity.

      (4f) Overall Changes in Activity: The manuscript should delve into the potential influence of overall changes in brain activity on the results. The choice of using Euclidean distance as a metric for quantifying changes in connectivity is sensitive to scaling in overall activity. Therefore, it is crucial to discuss whether activity in task-relevant areas increases from baseline to early learning and decreases from early to late learning, or if other patterns emerge. A comprehensive analysis of overall activity changes will provide a more complete understanding of the findings.

      These are good questions and we are happy to explore this in the data. However, as mentioned in our response to query 4a above, it is important to note that the timeseries data for each brain region was z-scored prior to analysis, with the aim of removing any mean changes in activity levels (note that this is a standard preprocessing step when performing functional connectivity analysis, given that mean signal changes are not the focus of interest in functional connectivity analyses).
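
      A minimal illustration of why z-scoring removes mean and amplitude differences is given below. It assumes that each region's timeseries is standardized over the timepoints being analyzed. The array shapes, the toy data, and the name `zscore` are illustrative and are not taken from the analysis code.

```python
import numpy as np

def zscore(ts):
    """Z-score each region's timeseries (rows = timepoints, cols = regions)."""
    return (ts - ts.mean(axis=0)) / ts.std(axis=0)

rng = np.random.default_rng(2)
ts = rng.standard_normal((200, 5))   # toy data: 200 timepoints, 5 regions
shifted = 3.0 * ts + 10.0            # same signal, larger amplitude and mean

# After z-scoring, the two versions are indistinguishable, and the
# region-by-region correlation matrix is unaffected by the shift and scaling.
z_original = zscore(ts)
z_shifted = zscore(shifted)
corr_original = np.corrcoef(ts.T)
corr_shifted = np.corrcoef(shifted.T)
```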

      To further emphasize these points, we have taken our z-scored timeseries data and calculated the mean signal for each region within each task epoch (Baseline, Early and Late learning; see panel A in the figure below). The point of showing these data (where each z-score map looks nearly identical across the top, middle and bottom plots) is to demonstrate just how minuscule the mean signal changes are in the z-scored timeseries data. This point can also be observed when plotting the mean z-score signal across regions for each epoch (see panel B in the figure below). Here we find that Baseline and Early learning have a nearly identical mean activation level across regions (albeit with slightly different variability across subjects), whereas there is a slight increase during Late learning, though it should be noted that our y-axis, which is scaled in thousandths, greatly magnifies this effect.

      To more directly address the reviewers’ comments, using the z-score signal per region we have also performed the same statistical pairwise comparisons (Early > Baseline and Late > Early) as we performed in the main manuscript Fig. 4 (see panel C in Author response image 9 below). In this plot, areas in red denote an increase in activity from Baseline to Early learning (top plot) and from Early to Late learning (bottom plot), whereas areas in blue denote a decrease for those same comparisons. The important thing to emphasize here is that the spatial maps resulting from this analysis are generally quite different from the maps of eccentricity that we report in Fig. 4 in our paper. For instance, in the figure below, we see significant changes in the activity of visual cortex between epochs, but this is not found in our eccentricity results (compare with Fig. 4). Likewise, in our eccentricity results (Fig. 4), we find significant changes in the manifold positioning of areas in medial prefrontal cortex (MPFC), but this is not observed in the activation levels of these regions (panel C below). Again, we are hesitant to make too much of these results, as the activation differences denoted as significant in the figure below are likely to be an effect on the order of thousandths of a z-score (e.g., 0.002 > 0.001), but this hopefully assuages reviewers’ concerns that our manifold results are solely attributable to changes in overall activity levels.

      We are hesitant to include the results below in our paper as we feel that they don’t add much to the interpretation (as the purpose of z-scoring was to remove large activation differences). However, if the reviewers strongly believe otherwise, we would consider including them in the supplement.

      Author response image 9.

      Examination of overall changes in activity across regions. (A) Mean z-score maps across subjects for the Baseline (top), Early Learning (middle) and Late learning (bottom) epochs. (B) Mean z-score across brain regions for each epoch. Error bars represent +/- 1 SEM. (C) Pairwise contrasts of the z-score signal between task epochs. Positive (red) and negative (blue) values show significant increases and decreases in z-score signal, respectively, following FDR correction for region-wise paired t-tests (at q<0.05).
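
      For readers unfamiliar with the FDR correction mentioned in the caption, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure applied to a vector of region-wise p-values. Whether this exact variant was used for the figure is an assumption; the function name and the example p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of p-values declared significant at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, ascending
    thresholds = q * np.arange(1, m + 1) / m    # rank-dependent cutoffs
    below = p[order] <= thresholds
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest rank passing its cutoff
        mask[order[: k + 1]] = True             # reject all hypotheses up to it
    return mask

# Hypothetical p-values, e.g. one per brain region from paired t-tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_p, q=0.05)
```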

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, directly impacts PN excitability, and uniformly enhances PN responses to odors.

      Weaknesses:

      The one remaining issue to be resolved is the theoretical discrepancy between the physiology and the behavior. The authors provide a computational model that could explain this discrepancy and provide the caveat that, while the physiological data were collected from the antennal lobe, there could be other olfactory processing stages involved. Indeed, other processing stages could be the sites for the computational functions proposed by the model. There is an additional caveat, which is that the physiological data were collected 5-10 minutes after serotonin application whereas the behavioral data were collected 3 hours after serotonin application. It is difficult to link physiological processes induced 5 minutes into serotonin application to behavioral consequences 3 hours after serotonin application. The discrepancy between physiology and behavior could easily reflect the timing of action of serotonin (i.e., differences between immediate and longer-term impact).

      For our behavioral experiments, we waited 3 hours after serotonin injection to allow serotonin to penetrate through the layers of air sacs and the sheath, and for the locusts to calm down and recover their baseline POR activity levels. For the physiology experiments, we noticed that the quality of the patch decreased over time after serotonin introduction. Hence, it was difficult to hold cells for that long. However, the point raised by the reviewer is well-taken. We have performed additional experiments to show that the changes in POR levels to different odorants are rapid and can be observed within 15 minutes of injecting serotonin (Author response image 2) and that the physiological changes in PNs (bursting spontaneous activity, maintenance of temporal firing patterns, and increased odor-evoked responses) persist when the cells are held for longer durations (i.e., 3 hours, akin to our behavioral experiments). It is worth noting that 3-hour in vivo intracellular recordings are not easily achievable and come with many experimental constraints. So far, we have managed to record from two PNs that were held for this long, and we include them in this rebuttal to support our conclusions (Author response image 1).

      Author response image 1.

      Spontaneous and odor-evoked responses in individual PNs remain consistent for three hours after serotonin introduction into the recording chamber/bath. (A) Representative intracellular recording showing membrane potential fluctuations in a projection neuron (PN) in the antennal lobe. Spontaneous and odor-evoked responses to four odorants (pink color bars, 4 s duration) are shown before (control) and after serotonin application (5HT). Voltage traces 30 minutes (30min), 1 hour (1h), 2 hours (2h), and 3 hours (3h) after 5HT application are shown to illustrate the persisting effect of serotonin during spontaneous and odor-evoked activity periods. (B) Rasterized spiking activities in two recorded PNs are shown. Spontaneous and odor-evoked responses are shown in all 5 consecutive trials. Note that the odor-evoked response patterns are maintained, but the spontaneous activity patterns are altered after serotonin introduction.

      Author response image 2.

      Palp-opening response (POR) patterns to different odorants remain consistent following serotonin introduction. The probability of PORs is shown as a bar plot for four different odorants: hexanol (green), benzaldehyde (blue), linalool (red), and ammonium (purple). PORs before serotonin injection (solid bars) are compared against response levels after serotonin injection (striped bars). As can be noted, PORs to the four odorants remain consistent when tested 15 minutes and 3 hours after serotonin (5HT) injection.

      Overall, the study demonstrates the impact of serotonin on odor-evoked responses of PNs and odor-guided behavior in locusts. Serotonin appears to have non-linear effects including changing the firing patterns of PNs from monotonic to bursting and altering behavioral responses in an odor-specific manner, rather than uniformly across all stimuli presented.

      We thank the reviewer for again providing very useful feedback for improving our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that projection neurons in the antennal lobe generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of projection neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I still have several concerns regarding the generalizability of the model and interpretation of results. The authors cannot provide evidence that serotonin modulation of projection neurons impacts behavior.

      This is true and likely to be true for any study linking neural responses to behavior. There are multiple circuits and pathways that would be impacted by a neuromodulator like serotonin. What we showed with our physiology is how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Given the specificity of the changes in behavioral outcomes (i.e., odor-specific increases and decreases in an appetitive behavior) and the non-specificity of the changes at the level of individual PNs (a general increase in odor-evoked spiking activity), we presented a relatively simple computational model to address the apparent mismatch between neural and behavioral responses (Author response image 4).

      The authors show that odor identity is maintained after 5-HT injection, however, the authors do not show if PN responses to different odors were differently affected after serotonin exposure.

      The PN responses to different odorants changed in a qualitatively similar fashion (Author response image 3).

      Author response image 3.

      PN activity before and after 5HT application is compared for different cell-odor combinations. As can be noted, the changes are qualitatively similar in all cases. After 5HT application, the baseline activity became more bursty, but the odor-evoked response patterns were robustly maintained for all odorants.

      Regarding the model, the authors show that the model works for odors with non-overlapping PN activation. However, only one appetitive, one neutral, and one aversive odor has been tested and modeled here. Can the fixed-weight model also hold for other appetitive and aversive odors that might share more overlap between active PNs? How could the model generate BZA attraction in 5-HT exposed animals (as seen in behavior data in Figure 1) if the same PNs just get activated more?

      Author response image 4.

      Testing the generality of the proposed computational model. To test the generality of the proposed model, we used a published dataset [Chandak and Raman, 2023]: Neural dataset – 89 PN responses to a panel of twenty-two odorants; Behavioral dataset – probability of POR responses to the same twenty-two odorants. We built the model using just the three odorants overlapping between the two datasets: hexanol, benzaldehyde and linalool. The true POR probabilities and the POR probabilities predicted by the model are shown for all twenty-two odorants as a scatter plot. As can be noted, there is a high correlation (0.79) between the true and the predicted values.

      The authors should still not exclude the possibility that serotonin injections could affect behavior via modulation of cell types other than projection neurons. This should still be discussed: serotonin might instead shut down baseline activation of local inhibitory neurons and thus lead to the interesting bursting phenotypes, which can also be seen in the baseline response, due to local PN-to-LN feedback.

      We agree that there could be other cells that are impacted by serotonin release. Our goal in this study was to characterize how spontaneous and odor-evoked responses in the very first neural network that receives olfactory sensory neuron input are altered by serotonin. Within this circuit, there are local inhibitory neurons (LNs), as correctly indicated by this reviewer. Surprisingly, our preliminary data indicate that LNs are not shut down but also have an enhanced odor-evoked neural response (Author response image 5). Further data would be needed to verify this observation and determine the mechanisms that mediate the changes in PN excitability. Regardless, since PN activity should incorporate the effects of changes in the local neuron responses and is the sole output from the antennal lobe that drives all downstream odor-evoked activity, we focused on PNs in this study.

      Author response image 5.

      Representative traces showing intracellular recording from a local neuron in the antennal lobe. Five consecutive trials are shown. Note that LNs in the locust antennal lobe are non-spiking. The LN activity before, during, and after the presentation of benzaldehyde and hexanol (colored bar; 4 s) is shown. The left and right panels show LN activity before and after the application of 5HT. As can be noted, 5HT did not shut down odor-evoked activity in this local neuron.

      The authors did not fully tone down their claims regarding causality between serotonin and starved state behavioral responses. There is no proof that serotonin injection mimics starved behavioral responses.

      Specific minor issues:

      It is still unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium). The new method part does not indicate the concentrations of odors used for electrophysiology.

      All odorants were diluted to 0.01-10% concentration by volume in either mineral oil or distilled water. This information is included in the Methods section. For most odorants used in the study, the lower concentrations evoked only a very weak neural response, and the higher concentrations evoked more robust responses. The POR responses for these odorants at the various concentrations chosen are included in Figure 2. Note that the responses to linalool and ammonium remained weak across concentrations, compared to hexanol and benzaldehyde.

      Did all tested PNs respond to all odorants?

      No, only a subset of them responds to each odorant. These responses have been well characterized in earlier publications [included refs].

      The authors do not show if PN responses to different odors were differently affected after serotonin exposure. They describe that ON responses were robust, but OFF responses were less consistent after 5-HT injection. Was this true across all odors tested? Example traces are shown, but the odor is not indicated in Figure 4A. Figure 4D shows that many odor-PN combinations did not change their peak spiking activity - was this true across odorants? In Figure 5 - are PNs ordered by odor-type exposure?

      Also, Figure 6A only shows example trajectories for odorants - how does the average look? Regarding the data used for the model - can the new dataset from the 82 odor-PN pairs reproduce the activation pattern of the previously collected dataset of 89 pairs?

      What is shown in Figure 6A is the trial-averaged response trajectory combining the activities of all 82 odor-PN pairs. The 82 odor-PN pairs were recorded intracellularly, examining the responses to four odorants before and after 5HT application. The second dataset, involving 89 PN responses to 22 odorants, was collected extracellularly. The two datasets are qualitatively similar in that each odorant activates a unique subset of the recorded neurons.

      The authors toned down their claims that serotonin injection can mimic the starved state behavioral response. However, some sentences still indicate this finding and should also be toned down:

      last sentence of introduction - "In sum, our results provide a more systems-level view of how a specific neuromodulator (serotonin) alters neural circuits to produce flexible behavioral outcomes."

      We believe our computational model shows this: uniform changes in the neural responses can lead to variable and odor-specific changes in behavioral PORs.

      discussion: "Finally, fed locusts injected with serotonin generated similar appetitive responses to food-related odorants as starved locusts indicating the role of serotonin in hunger state-dependent modulation of odor-evoked responses." This claim is not supported.

      Figure 7 shows that the fed locusts had lower PORs to hex and bza. The POR responses significantly increased after 5HT application. However, we have rephrased this sentence to limit our claims to this result: "Finally, fed locusts injected with serotonin generated similar appetitive palp-opening responses to food-related odorants as observed in starved locusts."

      last results: "However, consistent with results from the hungry locusts, the introduction of serotonin increased the appetitive POR responses to HEX and BZA. Intriguingly, the appetitive responses of fed locusts treated with 5HT were comparable or slightly higher than the responses of hungry locusts to the same set of odorants."

      Again this sentence simply describes the result shown in Figure 7.

      In Figure 7 - BZA response seems unchanged in hungry and fed animals and only 5-HT injection enhances the response. There is only one example where 5-HT application and starvation induce the same change in behavior - N=1 is not enough to conclude that serotonin influences food-driven behaviors.

      The reviewer is ignoring the lack of changes to PORs to linalool and ammonium. Taken together, serotonin increased PORs to only two of the four odorants in starved locusts. The responses after 5HT modulation to these four odorants were similar in fed locusts treated with 5HT and starved locusts.

      Also, this seems to be wrongly interpreted in Figure 7: "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, remained unchanged in fed locusts treated with 5HT." The authors indicate a significant reduction in POR after 5-HT injection on LOOL response in Figure 7.

      Revised.

      "It is worth noting that responses to LOOL and AMN, non-food related odorants with weaker PORs, were reduced in fed locusts treated with 5HT."

      Also, the newly added sentence at the end of the discussion does not make sense: "However, since 5HT increased behavioral responses in both fed and hungry locusts, the precise role of 5HT modulation and whether it underlies hunger-state dependent modulation of appetitive behavior still remains to be determined."

      The authors did not test 5-HT injection in starved animals.

      The results shown in Figure 1 compare the POR responses of starved locusts before and after 5HT introduction.

      We again thank the reviewer for useful feedback to further improve our manuscript.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript explores the impact of serotonin on olfactory coding in the antennal lobe of locusts and odor-evoked behavior. The authors use serotonin injections paired with an odor-evoked palp-opening response assay and bath application of serotonin with intracellular recordings of odor-evoked responses from projection neurons (PNs).

      Strengths:

      The authors make several interesting observations, including that serotonin enhances behavioral responses to appetitive odors in starved and fed animals, induces spontaneous bursting in PNs, and uniformly enhances PN responses to odors. Overall, I had no technical concerns.

      Weaknesses:

      While there are several interesting observations, the conclusions that serotonin enhanced sensitivity specifically and that serotonin had feeding-state-specific effects, were not supported by the evidence provided. Furthermore, there were other instances in which much more clarification was needed for me to follow the assumptions being made and inadequate statistical testing was reported.

      Major concerns.

      • To enhance olfactory sensitivity, the expected results would be that serotonin causes locusts to perceive each odor as being at a relatively higher concentration. The authors recapitulate a classic olfactory behavioral phenomenon where higher odor concentrations evoke weaker responses which is indicative of the odors becoming aversive. If serotonin enhanced the sensitivity to odors, then the dose-response curve should have shifted to the left, resulting in a more pronounced aversion to high odor concentrations. However, the authors show an increase in response magnitude across all odor concentrations. I don't think the authors can claim that serotonin enhances the behavioral sensitivity to odors because the locusts no longer show concentration-dependent aversion. Instead, I think the authors can claim that serotonin induces increased olfactory arousal.

      The reviewer makes a valid point. Bath application of serotonin increased POR behavioral responses across all odor concentrations, and concentration-dependent aversion was also not observed. Furthermore, the monotonic relationship between projection neuron responses and the intensity of current injection is altered when serotonin is exogenously introduced (see Author response image 1; see below for more explanation). Hence, our data suggest that serotonin alters the dose-response relationship between neural/behavioral responses and odor intensity. As recommended, we have revised our claim to state that serotonin induces an increase in olfactory arousal. The new physiology data have been added as Supplementary Figure 3 to the revised manuscript.

      • The authors report that 5-HT causes PNs to change from tonic to bursting and conclude that this stems from a change in excitability. However, excitability tests (such as I/V plots) were not included, so it's difficult to disambiguate excitability changes from changes in synaptic input from other network components.

      To confirm that the PN excitability did indeed change after serotonin application, we performed a new set of current-clamp recordings. In these experiments, we monitored the spiking activities in individual PNs as we injected different levels of current (200-1000 pA). Note that locust LNs that provide recurrent inhibition arborize and integrate inputs from a large number of sensory neurons and projection neurons. Therefore, activating a single PN should not activate the local neurons and, in turn, the antennal lobe network.

      We found that the total spiking activity monotonically increased with the magnitude of the current injection in all four PNs recorded (Author response image 1). However, after serotonin injection, we found that the spiking activity remained relatively stable and did not systematically vary with the magnitude of the current injection. While the changes in odor-evoked responses may incorporate both excitability changes in individual PNs and recurrent feedback inhibition through GABAergic LNs, these results from our current injection experiments unambiguously indicate that there are changes in excitability at the level of individual PNs. We have added this result to the revised manuscript.

      Author response image 1.

      Current-injection induced spiking activity in individual PNs is altered after serotonin application. (A) Representative intracellular recordings showing membrane potential fluctuations as a function of time for one projection neuron (PN) in the locust antennal lobe. A two-second window when a positive 200-1000 pA current was applied is shown. Firing patterns before (left) and after (right) serotonin application are shown for comparison. Note that the spiking activity changes after 5HT application. The black bar represents the 20 mV scale. (B) Dose-response curves showing the average number of action potentials (across 5 trials) during the 2-second current pulse before (green) and after (purple) serotonin for each recorded PN. Note that the current intensity was systematically increased from 200 pA to 1000 pA. (C) The mean number of spikes across the four recorded cells during current injection is shown. The color progression represents the intensity of applied current, ranging from 200 pA (leftmost bar) to 1000 pA (rightmost bar). The dose-response trends before (green) and after (purple) 5HT application are shown for comparison. The error bars represent SEM across the four cells.

      • There is another explanation for the theoretical discrepancy between physiology and behavior, which is that odor coding is further processed in higher brain regions (i.e., other than the antennal lobe) not studied in the physiological component of this study. This should at least be discussed.

      This is a valid argument. For our model of neural mapping onto behavior to work, we only need the odorant that evokes or suppresses PORs to activate a distinct set of neurons. Having said that, our extracellular recording results (Fig. 6E) indicate that hexanol (high POR) and linalool (low POR) do activate highly non-overlapping sets of PNs in the antennal lobe. Hence, our results suggest that the segregation of neural activity based on behavioral relevance already begins in the antennal lobe. We have added this clarification to the discussion section.

      • The authors cannot claim that serotonin underlies a hunger state-dependent modulation, only that serotonin impacts responses to appetitive odors. Serotonin enhanced PORs for starved and fed locusts, so the conclusion would be that serotonin enhances responses regardless of the hunger state. If the authors had antagonized 5-HT receptors and shown that feeding no longer impacts POR, then they could make the claim that serotonin underlies this effect. As it stands, these appear to be two independent phenomena.

      This is also a valid point. We have clarified this in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigate the influence of serotonin on feeding behavior and electrophysiological responses in the antennal lobe of locusts. They find that serotonin injection changes behavior in an odor-specific way. In physiology experiments, they can show that antennal lobe neurons generally increase their baseline firing and odor responses upon serotonin injection. Using a modeling approach the authors propose a framework on how a general increase in antennal lobe output can lead to odor-specific changes in behavior. The authors finally suggest that serotonin injection can mimic a change in a hunger state.

      Strengths:

      This study shows that serotonin affects feeding behavior and odor processing in the antennal lobe of locusts, as serotonin injection increases activity levels of antennal lobe neurons. This study provides another piece of evidence that serotonin is a general neuromodulator within the early olfactory processing system across insects and even phyla.

      Weaknesses:

      I have several concerns regarding missing control experiments, unclear data analysis, and interpretation of results.

      A detailed description of the behavioral experiments is lacking. Did the authors also provide a mineral oil control and did they analyze the baseline POR response? Is there an increase in baseline response after serotonin exposure already at the behavioral output level? It is generally unclear how naturalistic the chosen odor concentrations are. This is especially important as behavioral responses to different concentrations of odors are differently modulated after serotonin injection (Figure 2: Linalool and Ammonium).

      POR protocol: Sixth instar locusts (Schistocerca americana) of either sex were either starved for 24-48 hours before the experiment or, for the satiated condition, taken straight from the colony and fed blades of grass. Locusts were immobilized by placing them in a plastic tube and securing their body with black electrical tape (see Author response image 2). Locusts were given 20-30 minutes to acclimatize after placement in the immobilization tube. As can be noted, the head of the locust, along with the antennae and maxillary palps, protruded from the immobilization tube so that the locust could move them freely. Note that the maxillary palps are sensory organs close to the mouthparts that are used to grab food and help with the feeding process.

      It is worth noting that our earlier studies had shown that the presentation of ‘appetitive odorants’ triggers the locust to open their maxillary palps even when no food is presented (Saha et al., 2017; Nizampatnam et al., 2018; Nizampatnam et al., 2022; Chandak and Raman, 2023). Furthermore, our earlier results indicate that the probability of palp opening varies across different odorants (Chandak and Raman, 2023). We chose four odorants that spanned a diverse range of palp-opening probabilities: supra-median (hexanol), median (benzaldehyde), and sub-median (linalool). Therefore, each locust in our experiments was presented with one concentration of four odorants (hexanol, benzaldehyde, linalool, and ammonium) in a pseudorandomized order. The odorants were chosen based on our physiology results such that they evoked different levels of spiking activity.

      The odor pulse was 4 s in duration and the inter-pulse interval was set to 60 s. The experiments were recorded using a web camera (Microsoft) placed right in front of the locusts. The camera was fully automated with a custom MATLAB script to start recording 2 seconds before the odor pulse and end recording at odor termination. An LED was used to track the stimulus onset/offset. The POR responses were manually scored offline. Responses to each odorant were scored as 0 or 1 depending on whether the palps remained closed or opened. A positive POR was defined as a movement of the maxillary palps during the odor presentation time window as shown on the locust schematic (Main Paper Figure 1).
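
      The presentation order and binary scoring described above can be sketched as follows. This is only an illustration in Python, not the MATLAB script used in the experiments; the function names and the example scores are hypothetical, and the scores are made-up numbers rather than data from the study.

```python
import random

ODORANTS = ["hexanol", "benzaldehyde", "linalool", "ammonium"]


def pseudorandom_order(odorants, seed):
    """Return a shuffled copy of the odor panel for one locust.

    A per-locust seed keeps the order reproducible (hypothetical design choice).
    """
    rng = random.Random(seed)
    order = list(odorants)
    rng.shuffle(order)
    return order


def por_rate(scores):
    """Fraction of positive PORs; each score is 0 (palps closed) or 1 (palps opened)."""
    if not scores:
        raise ValueError("no scores given")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("scores must be 0 or 1")
    return sum(scores) / len(scores)


# Hypothetical manual scores for 10 locusts (illustration only).
example_scores = {
    "hexanol": [1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "linalool": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
}
rates = {odor: por_rate(s) for odor, s in example_scores.items()}
```

      Here `rates` maps each odorant to the fraction of locusts that opened their palps during the odor window, which is the quantity summarized in the bar plots.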

      Author response image 2.

      Pictures showing the behavior experiment setup and representative palp-opening responses in a locust.

      As the reviewer inquired, we performed a new series of POR experiments, where we explored POR responses to mineral oil and hexanol, before and after serotonin injection. For this study, we used 10 locusts that were starved 24-48 hours before the experiment. Note that hexanol was diluted at 1% (v/v) concentration in mineral oil. Our results reveal that locust PORs to hexanol (~50% PORs) were significantly higher than those triggered by mineral oil (~10% PORs). Injection of serotonin increased the POR response rate to hexanol but did not alter the PORs evoked by mineral oil (Author response image 3).

      Author response image 3.

      Serotonin does not alter the palp-opening responses evoked by paraffin oil. The PORs before and after serotonin (5HT) injection are summarized and shown as a bar plot for hexanol and paraffin oil. Striped bars signify the data collected after 5HT injection. Significant differences are identified in the plot (one-tailed paired-sample t-test; *p < 0.05).
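
      The paired comparison named in this caption can be sketched as below. The t statistic follows the standard paired-sample definition. Because the Python standard library has no Student-t distribution, the p-value in this sketch comes from an exact sign-flip permutation test, which is a substitute for, not a reproduction of, the t-test p-value reported here. Function names are hypothetical.

```python
import itertools
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, df) for the paired differences after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def sign_flip_p_value(before, after):
    """One-tailed exact p-value for mean(after - before) > 0.

    Enumerates every sign assignment of the paired differences
    (2**n cases, so only practical for small n such as 10 locusts).
    """
    diffs = [a - b for b, a in zip(before, after)]
    observed = sum(diffs)
    count = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance guards against floating-point ties.
        if sum(s * d for s, d in zip(signs, diffs)) >= observed - 1e-12:
            count += 1
    return count / total
```

      With per-locust POR rates before and after injection as inputs, a small one-tailed p-value would indicate an increase after 5HT. The two p-values need not agree exactly, since the permutation test makes fewer distributional assumptions.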

      Regarding recordings of potential PNs - the authors do not provide evidence that they did record from projection neurons and not other types of antennal lobe neurons. Thus, these claims should be phrased more carefully.

      In the locust antennal lobe, only the cholinergic projection neurons fire full-blown sodium spikes. The GABAergic local neurons only fire calcium ‘spikelets’ (Laurent, TINS, 1996; Stopfer et al., 2003; see Author response image 4 for an example). Hence, we are pretty confident that we are only recording from PNs. Furthermore, because LN signals are too small, they are not detected in the extracellular recordings from the locust antennal lobe. Hence, we are confident in our claims and conclusion.

      Author response image 4.

      PN vs LN physiological differences: Left: A representative raw voltage trace recorded from a local neuron before, during, and after a 4-second odor pulse is shown. Note that the local neurons in the locust antennal lobe do not fire full-blown sodium spikes but only fire small calcium spikelets. Right: A raw voltage trace recorded from a representative projection neuron is shown for comparison. Sodium spikes are clearly visible during spontaneous and odor-evoked periods. The gray bar represents the 4-second odor pulse. The vertical black bar represents 40 mV.

      The presented model suggests labeled lines in the antennal lobe output of locusts. Could the presented model also explain a shift in behavior from aversion to attraction - such as seen in locusts when they switch from a solitarious to a gregarious state? The authors might want to discuss other possible scenarios, such as that odor evaluation and decision-making take place in higher brain regions, or that other neuromodulators might affect behavioral output. Serotonin injections could affect behavior via modulation of other cell types than antennal lobe neurons. This should also be discussed - the same is true for potential PNs - serotonin might not directly affect this cell type, but might rather shut down local inhibitory neurons.

      There are multiple questions here. First, regarding solitary vs. gregarious states, we are currently repeating these experiments on solitary locusts. Our preliminary results (not included in the manuscript) indicate that the solitary animals have increased olfactory arousal and respond with a higher POR but are less selective and respond similarly to multiple odorants. We are examining the physiology to determine whether the model for mapping neural responses onto behavior could also explain observations in solitary animals.

      Second, this reviewer makes the point raised by Reviewer 1. We agree that odor evaluation and decision-making might take place in higher brain regions. All we could conclude based on our data is that a segregation of neural activity based on behavioral relevance might provide the simplest approach to map a non-specific increase in stimulus-evoked neural responses onto odor-specific changes in behavioral outcome. Furthermore, our results indicate that hexanol and linalool, two odorants that had an increase and decrease in PORs after serotonin injection, had only minimal neural response overlap in the antennal lobe. These results suggest that the formatting of neural activity to support varying behavioral outcomes might already begin in the antennal lobe. We have added this to our discussion.

      Third, regarding serotonin impacting PNs, we performed a new set of current-clamp experiments to examine this issue (Author response image 1). Our results clearly show that projection neuron activity in response to current injections (that should not incorporate feedback inhibition through local neurons) was altered after serotonin injection. Therefore, the observed changes in the odor-evoked neural ensemble activity should incorporate modulation at both individual PN level and at the network level. We have added this to our discussion as well.

      Finally, the authors claim that serotonin injection can mimic the starved state behavioral response. However, this is only shown for one of the four odors that are tested for behavior (HEX), thus the data does not support this claim.

      We note that Hex is the only appetitive odorant in the panel. But, as reviewer 1 has also brought up a similar point, we have toned down our claims and will investigate this carefully in a future study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • Was the POR of the locusts towards linalool and ammonium higher than towards a blank odor cartridge? I ask because the locusts appear to be less likely to respond to these odors and so I am concerned that this assay is not relevant to the ecological context of these odors. In other words, perhaps serotonin did not enhance the responses to these odors in this assay, because this is not a context in which locusts would normally respond to these odors.

      The POR response to linalool and ammonium is lower and comparable to that of paraffin oil. Serotonin does not increase POR responses to paraffin oil but does increase response to hexanol (an appetitive odorant). We have clarified this using new data (Author response image 5).

      • It seems to me that Figure 5C is the crux for understanding the potential impact of 5-HT on odor coding, but it is somewhat confusing and underutilized. Is the implication that 5-HT decorrelates spontaneous activity such that when an odor stimulus arrives, the odor-evoked activity deviates to a greater degree? The authors make claims about this figure that require the reader to guess as to the aspect of the figure to which they are referring.

      The reviewer makes an astute observation. Yes, the spontaneous activity in the antennal lobe network before serotonin introduction is not correlated with the ensemble spontaneous activity after serotonin bath application. Remarkably, the odor-evoked responses were highly similar, both in the reduced PCA space and when assayed using high-dimensional ensemble neural activity vectors. Whether the changes in network spontaneous activity have a function in odor detection and recognition is not fully understood and cannot be convincingly answered using our data. But this is something that we had pondered.

      • The modeling component summarized in Figure 6 needs clarification and more detail. Perhaps example traces associated with positive weighting within neural ensemble 1 relative to neural ensemble 2? I struggled to understand conceptually how the model resolved the theoretical discrepancy between physiology and behavior.

      As recommended, here is a plot showing the responses of four PNs that had positive weights to hexanol and linalool. As can be expected, each PN in this group had higher responses to hexanol and no response to linalool. Further, the four PNs that received negative weights responded only to linalool.

      Author response image 5.

      Odor-evoked responses of four PNs that received positive weights in the model (top panel), and four PNs that were assigned negative weights in the model (bottom).
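
      To make the logic of the weighted readout concrete, here is a toy sketch of how a non-specific gain increase can produce odor-specific changes in a linear readout. It assumes a simple weighted sum of PN responses and a uniform multiplicative gain for the serotonin effect. All weights, response values, and names are hypothetical and are not the fitted parameters of the model in the manuscript.

```python
def readout(weights, responses):
    """Linear readout: weighted sum of PN responses."""
    if len(weights) != len(responses):
        raise ValueError("weights and responses must have the same length")
    return sum(w * r for w, r in zip(weights, responses))


def apply_gain(responses, gain):
    """Scale every PN response by the same factor (non-specific increase)."""
    return [gain * r for r in responses]


# Four PNs with positive weights and four with negative weights (hypothetical).
weights = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]

# Hypothetical odor-evoked responses: hexanol drives only the first ensemble,
# linalool drives only the second.
hexanol = [8.0, 6.0, 7.0, 5.0, 0.0, 0.0, 0.0, 0.0]
linalool = [0.0, 0.0, 0.0, 0.0, 4.0, 6.0, 5.0, 3.0]

gain = 1.5  # hypothetical uniform increase after serotonin

hex_before = readout(weights, hexanol)
hex_after = readout(weights, apply_gain(hexanol, gain))
lool_before = readout(weights, linalool)
lool_after = readout(weights, apply_gain(linalool, gain))
```

      Because the two odorants activate non-overlapping ensembles with opposite weights, the same gain pushes the hexanol readout up and the linalool readout down, which is the qualitative pattern the model is meant to capture.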

      • Was there a significant difference between the PORs of hungry vs. fed locusts? The authors state that they differ and provide statistics for the comparisons to locusts injected with 5-HT, but then don't provide any statistical analyses of hungry vs. fed animals.

      The POR responses to HEX (an appetitive odorant) were significantly different between the hungry and fed locusts.

      Author response image 6.

      A bar plot summarizing PORs to all four odors for satiated locusts (highlighted with stripes), before (dark shade) and after (lighter shade) 5HT injection. For comparison, PORs of starved locusts before 5HT injection are plotted as well (without stripes). The significance was determined using a one-tailed paired-sample t-test (*p < 0.05).

      • Were any of the effects of 5-HT on odor-evoked PN responses significant? No statistics are provided.

      We examined the distribution of odor-evoked responses in PNs before and after 5HT introduction. We found that the overall distribution was not significantly different between the two (one-tailed paired-sample t-test; p = 0.93).

      Author response image 7.

      Comparison of the distribution of odor-evoked PN responses before (green) and after (purple) 5HT introduction. One-tailed paired sample t-test was used to compare the two distributions.

      • The authors interchangeably use "serotonin", "5HT" and "5-HT" throughout the manuscript, but this should be consistent.

      This has been fixed in the revised manuscript.

      • On page 2 the authors provide an ecological relevance for linalool as being an additive in pesticides, however, linalool is a common floral volatile chemical. Is the implication that locusts have learned to associate linalool with pesticides?

      Linalool is a terpenoid alcohol that has a floral odor but has also been used as a pesticide and insect repellent [Beier et al., 2014]. As shown in Author response image 2, it evoked the least POR responses amongst a diverse panel of 22 odorants that were tested. We have clarified how we chose odorants based on the prior dataset in the Methods section.

      • In Figure 1, there should be a legend in the figure itself indicating that the black box indicates the absence of POR and the white box indicates presence, rather than just having it in the legend text.

      Done.

      • In Figure 2, the raw data from each animal can be moved to the supplements. The way it is presented is overwhelming and the order of comparisons is difficult to follow.

      Done.

      • For the induction of bursting in PNs by the application of 5-HT, were there any other metrics observed such as period, duration of bursts, or peak burst frequency? The authors rely on ISI, but there are other bursting metrics that could also be included to understand the nature of this observation. In particular, whether the bursts are likely due to changes in intrinsic biophysical properties of the PNs or polysynaptic effects.

      We could use other metrics as the reviewer suggests. Our main point is that the spontaneous activity of individual PNs changed. We have added new current-injection experiments to show that the PN output to square pulses of current becomes different after serotonin application (Author response image 1).

      • Were 4-vinyl anisole, 1-nonanol, and octanoic acid selected as additional odors because they had particular ecological relevance, or was it for the diversity of chemical structure?

      These odorants were selected based on both chemical structure and ecological relevance. The logic behind this was to have a very diverse odor panel that consisted of food odorant – hexanol, aggregation pheromone – 4-vinyl anisole, sex pheromone – benzaldehyde, acid – octanoic acid, base – ammonium, and alcohol – 1-nonanol. Additionally, we selected these odors based on previous neural and behavioral data on these odorants (Chandak and Raman, 2023; Traner and Raman, 2023; Nizampatnam et al., 2022 & 2018; Saha et al., 2017 & 2013).

      Reviewer #2 (Recommendations For The Authors):

      The electrophysiology dataset combines all performed experiments across all tested different PN-odor pairs. How many odors have been tested in a single PN and how many PNs have been tested for a single odor? This information is not present in the current manuscript. Can the authors exclude that there are odor-specific modulations?

      In total, our dataset includes recordings from 19 PNs. Seven PNs were tested on a panel of seven odorants (4-vinyl anisole, 1-nonanol, octanoic acid, Hex, Bza, Lool, and Amn), and the remaining twelve were tested with the four main odorants used in the study (Hex, Bza, Lool, and Amn). This information has been added to the Methods section.

      How did the authors choose the concentrations of serotonin injections and bath applications - is this a naturalistic amount?

      The serotonin concentration for ephys experiments was chosen based on trial-and-error experiments: 0.01 mM was the highest concentration that did not cause cell death. For the behavioral experiments, we increased the concentration (0.1 M) due to the presence of anatomical structures in the locust's head, such as air sacs and sheath, as well as hemolymph, which cause some degree of dilution that we cannot control.

      Behavior experiments were performed 3 hours after injection - ephys experiments 5-10 minutes following bath application. Can the authors exclude that serotonin affects neural processing differently on these different timescales?

      We cannot exclude this possibility. We did ephys experiments 5-10 minutes after bath application, as it would be extremely hard to hold cells for the 3 hours used in the behavioral experiments.

      A longer delay was required for our behavioral experiments as the locusts tended to be a bit more agitated, with larger spontaneous movements of palps, and exhibited unprompted vomiting. A 3-hour period allowed the locust to regain its baseline level of movement after 5HT introduction. [This information has been added to the Methods section of the revised manuscript.]

      Concerning the analysis of electrophysiological data. The authors should correct for changes in the baseline before performing PCA analysis. And how much of the variance is explained by PC1 and PC2?

      We did not correct for baseline changes or subtract baseline as we wanted to show that the odor-evoked neural responses still robustly encoded information about the identity of the odorant.

      The authors should perform dye injections after recordings to visualize the cell type they recorded from. Serotonin might affect also other cell types in the antennal lobe.

      As mentioned above, in the locust antennal lobe only PNs fire full-blown sodium spikes, and LNs only fire calcium spikelets (Author response image 4). Since the LN signals are small, they will be buried under the noise floor when using extracellular recording electrodes to monitor responses in the antennal lobe. Hence, we are pretty certain what type of cells we are recording from.

      There were several typos in the manuscript, please check again.

      We have fixed many of the grammatical errors and typos in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Most studies in sensory neuroscience investigate how individual sensory stimuli are represented in the brain (e.g., the motion or color of a single object). This study starts tackling the more difficult question of how the brain represents multiple stimuli simultaneously and how these representations help to segregate objects from cluttered scenes with overlapping objects.

      Strengths

      The authors first document the ability of humans to segregate two motion patterns based on differences in speed. Then they show that a monkey's performance is largely similar; thus establishing the monkey as a good model to study the underlying neural representations.

      Careful quantification of the neural responses in the middle temporal area during the simultaneous presentation of fast and slow speeds leads to the surprising finding that, at low average speeds, many neurons respond as if the slowest speed is not present, while they show averaged responses at high speeds. This unexpected complexity of the integration of multiple stimuli is key to the model developed in this paper.

      One experiment in which attention is drawn away from the receptive field supports the claim that this is not due to the involuntary capture of attention by fast speeds.

      A classifier using the neuronal response and trained to distinguish single-speed from bi-speed stimuli shows a similar overall performance and dependence on the mean speed as the monkey. This supports the claim that these neurons may indeed underlie the animal's decision process.

      The authors expand the well-established divisive normalization model to capture the responses to bi-speed stimuli. The incremental modeling (eq 9 and 10) clarifies which aspects of the tuning curves are captured by the parameters.

      We thank the Reviewer for the thorough summary of the findings and supportive comments.

      Weaknesses

      While the comparison of the overall pattern of behavioral performance between monkeys and humans is important, some of the detailed comparisons are not well supported by the data. For instance, whether the monkey used the apparent coherence simply wasn't tested and a difference between 4 human subjects and a single monkey subject cannot be tested statistically in a meaningful manner. I recommend removing these observations from the manuscript and leaving it at "The difference between the monkey and human results may be due to species differences or individual variability" (and potentially add that there are differences in the task as well; the monkey received feedback on the correctness of their choice, while the humans did not.)

      Thanks for the suggestion. We agree and have modified the text accordingly. We now state on page 8, lines 189-191, "The difference between the monkey and human results may be due to species differences or individual variability. The differences in behavioral tasks may also play a role – the monkey received feedback on the correctness of the choice, whereas human subjects did not."

      A control experiment aims to show that the "fastest speed takes all" behavior is general by presenting two stimuli that move at fast/slow speeds in orthogonal directions. The claim that these responses also show the "fastest speed takes all" is not well supported by the data. In fact, for directions in which the slow speed leads to the largest response on its own, the population response to the bi-speed stimulus is the average of the response to the components (This is fine. One model can explain all direction tuning curve, which also explain averaging at the slower speed stronger directions). Only for the directions where the fast speed stimulus is the preferred direction is there a bias towards the faster speed (Figure 7A). The quantification of this effect in Figure 7B seems to suggest otherwise, but I suspect that this is driven by the larger amplitude of Rf in Figure 8, and the constraint that ws and wf are constant across directions. The interpretation of this experiment needs to be reconsidered.

      The Reviewer raised a good question. Our model with fixed weights for faster and slower components across stimulus directions provided a parsimonious explanation for the whole tuning curve, regardless of whether the faster component elicited a stronger response than the slower component. Because the model can be well constrained by the measured direction-tuning curves, we did not constrain ws and wf to sum to one, which makes the model more general. The linear weighted summation (LWS) model fits the neuronal responses to the bi-speed stimuli very well, accounting for an average of 91.8% (std = 7.2%) of the response variance across neurons. As suggested by the Reviewer, we now use the normalization model to fit the data with fixed weights across all motion directions. The normalization model also provides a good fit, accounting for an average of 90.5% (std = 7.1%) of the response variance across neurons.
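
      For readers who want to see what such a fixed-weight fit involves, the sketch below fits one pair of weights (ws, wf) to a direction-tuning curve by ordinary least squares. It assumes the LWS model has the form R = ws·Rs + wf·Rf with no constant term, and that percentage variance explained is computed as 100·(1 − SS_res/SS_tot); both are assumptions that may differ from the definitions in the Methods. The tuning-curve values are made up for illustration.

```python
def fit_lws(rs, rf, r):
    """Least-squares weights (ws, wf) for r ≈ ws*rs + wf*rf.

    One weight pair is shared across all motion directions;
    the weights are not constrained to sum to one.
    """
    a = sum(x * x for x in rs)
    b = sum(x * y for x, y in zip(rs, rf))
    d = sum(y * y for y in rf)
    e = sum(x * z for x, z in zip(rs, r))
    f = sum(y * z for y, z in zip(rf, r))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("component tuning curves are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    ws = (e * d - b * f) / det
    wf = (a * f - b * e) / det
    return ws, wf


def variance_explained(r, r_hat):
    """Percentage of response variance accounted for by the fit."""
    mean_r = sum(r) / len(r)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(r, r_hat))
    ss_tot = sum((y - mean_r) ** 2 for y in r)
    return 100.0 * (1.0 - ss_res / ss_tot)


# Hypothetical tuning curves sampled at six directions (spikes/s).
rs = [10.0, 20.0, 30.0, 20.0, 10.0, 5.0]   # slower component alone
rf = [5.0, 10.0, 20.0, 30.0, 20.0, 10.0]   # faster component alone
r = [0.2 * s + 0.7 * f for s, f in zip(rs, rf)]  # synthetic bi-speed response

ws, wf = fit_lws(rs, rf, r)
r_hat = [ws * s + wf * f for s, f in zip(rs, rf)]
pve = variance_explained(r, r_hat)
```

      In this synthetic case the fit recovers weights that sum to 0.9 rather than one, with the larger weight on the faster component, which is the kind of faster-speed bias the fixed-weight fit is used to quantify.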

      Note that in the new Figure 8A, at the left side of the tuning curve (i.e., at negative vector average (VA) directions), where the slower component moves in a more preferred direction of the neurons than the faster component, the bi-speed response (red curve) is slightly lower than the average of the component responses (gray curve), indicating a bias toward the weaker faster component. Therefore, the faster-speed bias does not occur only when the faster component moves in the more preferred direction. This can also be seen in the direction-tuning curves of an example neuron that we added to the figure (new Fig. 8B). The peak responses to the slower and faster components were about the same, but the neuron still showed a faster-speed bias. At negative VA directions, the red curve is lower than the response average (gray curve) and is biased toward the weaker (faster) component.

      The faster-speed bias also occurs when the peak response to the slower component is stronger than that to the faster component. As a demonstration, Author response image 1 shows an example MT neuron that has a slow preferred speed (PS = 1.9 deg/s) and was stimulated by two speeds of 1.2 and 4.8 deg/s. The peak response to the faster component (blue) was weaker than that to the slower component (green). However, this neuron showed a strong bias toward the faster component. A normalization model fit with fixed weights for the faster and slower components (black curve) described the neuronal response to both speeds (red) well. This neuron was not included in the neuron population shown in Figure 8 because it was not tested with stimulus speeds of 2.5 and 10 deg/s.

      Author response image 1.

      An example MT neuron was tested with stimulus speeds of 1.2 and 4.8 deg/s. The preferred speed of this neuron was 1.9 deg/s. Fixed weights of 0.59 for the faster component and 0.12 for the slower component described the responses to the bi-speed stimuli well using a normalization model. The neuron showed a faster-speed bias although its peak response to the slower component was higher than that to the faster component.

      We modified the text to clarify these points:

      Page 19, lines 405 – 410, “The bi-speed response was biased toward the faster component regardless of whether the response to the faster component was stronger (in positive VA directions) or weaker (in negative VA directions) than that to the slower component (Fig. 8A). The result from an example neuron further demonstrated that, even when the peak firing rates of the faster and slower component responses were similar, the response elicited by the bi-speed stimuli was still biased toward the faster component (Fig. 8B).”

      Page 19, lines 421 – 427, “Because the model can be well constrained by the measured direction-tuning curves, it is not necessary to require ws and wf to sum to one, which is more general. An implicit assumption of the model is that, at a given pair of stimulus speeds, the response weights for the slower and faster components are fixed across motion directions. The model fitted MT responses very well, accounting for an average of 91.8% of the response variance (std = 7.2%, N = 21) (see Methods). The success of the model supports the assumption that the response weights are fixed across motion directions.”

      Reviewer #2 (Public Review):

      Summary:

      This is a paper about the segmentation of visual stimuli based on speed cues. The experimental stimuli are random dot fields in which each dot moves at one of two velocities. By varying the difference between the two speeds, as well as the mean of the two speeds, the authors estimate the capacity of observers (human and non-human primates) to segment overlapping motion stimuli. Consistent with previous work, perceptual segmentation ability depends on the mean of the two speeds. Recordings from area MT in monkeys show that the neuronal population to compound stimuli often shows a bias towards the faster-speed stimuli. This bias can be accounted for with a computational model that modulates single-neuron firing rates by the speed preferences of the population. The authors also test the capacity of a linear classifier to produce the psychophysical results from the MT data.

      Strengths:

      Overall, this is a thorough treatment of the question of visual segmentation with speed cues. Previous work has mostly focused on other kinds of cues (direction, disparity, color), so the neurophysiological results are novel. The connection between MT activity and perceptual segmentation is potentially interesting, particularly as it relates to existing hypotheses about population coding.

      We thank the Reviewer for the summary and comments.

      Weaknesses:

      Page 10: The relationship between (R-Rs) and (Rf-Rs) is described as "remarkably linear". I don't actually find this surprising, as the same term (Rs) appears on both the x- and y-axes. The R^2 values are a bit misleading for this reason.

      The Reviewer is correct that subtracting a common term Rs from R and Rf would introduce correlation between (R-Rs) and (Rf-Rs). To address this concern, we conducted an additional analysis. We showed that, at most speed pairs, the R^2 values between (R-Rs) and (Rf-Rs) based on the data are significantly higher than the R^2 values between (R’-Rs) and (Rf-Rs), in which R’ was a random combination of Rs and Rf. Since the same Rs was subtracted in calculating both R^2 (data) and R^2 (simulation), the difference between R^2 (data) and R^2 (simulation) suggests that the response pattern of R contributes to the additional correlation.

      We now acknowledge this confounding factor and describe the new analysis results on page 14, lines 309 – 326. Please also see the response to Reviewer 3 about a similar concern.
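
      The control analysis described above can be illustrated with a small simulation. This sketch assumes that "a random combination of Rs and Rf" means R’ = a·Rs + (1 − a)·Rf with a drawn uniformly from [0, 1] independently for each neuron; the actual construction is given in the manuscript and may differ. All response values are synthetic.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def null_r_squared(rs, rf, rng):
    """R^2 between (R' - Rs) and (Rf - Rs) for one random surrogate R'."""
    r_prime = []
    for s, f in zip(rs, rf):
        a = rng.random()  # assumed mixing rule, one draw per neuron
        r_prime.append(a * s + (1 - a) * f)
    x = [f - s for f, s in zip(rf, rs)]
    y = [rp - s for rp, s in zip(r_prime, rs)]
    return r_squared(x, y)


rng = random.Random(0)
n_neurons = 30
rs = [rng.uniform(5, 50) for _ in range(n_neurons)]  # synthetic slower-component responses
rf = [rng.uniform(5, 50) for _ in range(n_neurons)]  # synthetic faster-component responses

# Synthetic "data": every neuron weights the faster component by 0.8.
r_data = [0.2 * s + 0.8 * f for s, f in zip(rs, rf)]
data_r2 = r_squared([f - s for f, s in zip(rf, rs)],
                    [r - s for r, s in zip(r_data, rs)])

null_values = [null_r_squared(rs, rf, rng) for _ in range(200)]
mean_null = sum(null_values) / len(null_values)
```

      The surrogate R^2 values are well above zero, which reflects the correlation introduced by subtracting the common Rs term. A data R^2 that exceeds this surrogate distribution is therefore the informative comparison, rather than the raw R^2 alone.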

      Figure 9: I'm confused about the linear classifier section of the paper. The idea makes sense - the goal is to relate the neuronal recordings to the psychophysical data. However the results generally provide a poor quantitative match to the psychophysical data. There is mention of a "different paper" (page 26) involving a separate decoding study, as well as a preprint by Huang et al. (2023) that has better decoding results. But the Huang et al. preprint appears to be identical to the current manuscript, in that neither has a Figure 12, 13, or 14. The text also says (page 26) that the current paper is not really a decoding study, but the linear classifier (Figure 9F) is a decoder, as noted on page 10. It sounds like something got mixed up in the production of two or more papers from the same dataset.

      We apologize for the confusion regarding the reference of Huang et al. (2023, bioRxiv). We referred to an earlier version of this bioRxiv manuscript (version 1), which included decoding analysis. In the bibliography, we provided two URLs for this pre-print. While the second link was correct, the first URL automatically links to the latest version (version 2), which did not have the abovementioned decoding analysis.

      The analysis in Figure 9 applies a classifier to discriminate two-speed from single-speed stimuli, which is a decoding analysis as the Reviewer pointed out. We revised the result section about the classifier to make it clear what the classifier can and cannot explain (pages 22-23, lines 516-534). We also included a sentence at the end of this section that leads to additional decoding analysis to extract motion speed(s) from MT population responses (page 23, lines 541-543), “To directly evaluate whether the population neural responses elicited by the bi-speed stimulus carry information about two speeds, it is important to conduct a decoding analysis to extract speed(s) from MT population responses.”

      In any case, I think that some kind of decoding analysis would really strengthen the current paper by linking the physiology to the psychophysics, but given the limitations of the linear classifier, a more sophisticated approach might be necessary -- see for example Zemel, Dayan, and Pouget, 1998. The authors might also want to check out closely related work by Treue et al. (Nature Neuroscience 2000) and Watamaniuk and Duchon (1992).

      We thank the Reviewer for the suggestion and agree that it is useful to incorporate additional decoding analysis that can better link physiology results to psychophysics. The decoding analysis we conducted was motivated by the framework proposed by Zemel, Dayan, and Pouget (1998), and also similar to the idea briefly mentioned in the Discussion of Treue et al. (2000). We have added the decoding analysis to this paper on pages 25-32.  

      What do we learn from the normalization model? Its formulation is mostly a restatement of the results - that the faster and slower speeds differentially affect the combined response. This hypothesis is stated quantitatively in equation 8, which seems to provide a perfectly adequate account of the data. The normalization model in equation 10 is effectively the same hypothesis, with the mean population response interposed - it's not clear how much the actual tuning curve in Figure 10A even matters, since the main effect of the model is to flatten it out by averaging the functions in Figure 10B. Although the fit to the data is reasonable, the model uses 4 parameters to fit 5 data points and is likely underconstrained; the parameters other than alpha should at least be reported, as it would seem that sigma is actually the most important one. And I think it would help to examine how robust the statistical results are to different assumptions about the normalization pool.

      In the linear weighted summation (LWS) model (Eq. 8), the weights Ws and Wf are free parameters. We think the value of the normalization model (Eq. 9) is that it provides an explanation of what determines the response weights. We agree with the Reviewer that using the normalization model (Eq. 9) with 4 parameters to fit the 5 data points of the tuning curves to bi-speed stimuli of individual neurons is under-constrained. We, therefore, removed the section using the normalization model to fit overlapping stimuli moving in the same direction at different speeds.

      A better way to constrain the normalization model is to use the full direction-tuning curves of MT neurons in response to two stimulus components moving in different directions at different speeds, as shown in Figure 8. We now use the normalization model (Eq. 9) to fit this data set (also suggested by Reviewer 1), in addition to the LWS model. We now report the median values of the model parameters of the normalization model, including the exponent n, sigma, alpha, and the constant c. We also compared the normalization model fit with the linear summation (LWS) model. We discuss the limitations of our data set and what needs to be done in future studies. The revisions are on page 20, lines 434-467 in the Results, and pages 34-35, lines 818-829 in Discussion.

      Reviewer #3 (Public Review):

      Summary:

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion.

      Strengths:

      The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli. The study presents compelling evidence that (on average) MT neurons represent the average of the two speeds, with a bias that accentuates the faster of the two speeds. An additional strength of the study is the inclusion of perceptual reports from both humans and one monkey participant performing a task in which they judged whether the stimuli involved one vs two different speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

      Weaknesses:

      My main concern is that the authors are missing an opportunity to make clear that the divisive normalization, while commonly used to describe neural response patterns in visual areas (and which fits the data here), fails on the theoretical front as an explanation for how information about multiple stimuli can be preserved. Thus, there is a bit of a disconnect between the goal of the paper - how does MT represent multiple stimuli? - and the results: mostly averaging responses which, while consistent with divisive normalization, would seem to correspond to the perception of a single intermediate speed. This is in contrast to the psychophysical results which show that subjects can at least distinguish one from two speeds. The paper would be strengthened by grappling with this conundrum in a head-on manner.

      We thank the Reviewer for the constructive comments. We agree with the Reviewer that it is important to connect the encoding of multiple speeds with the perception. The Reviewer also raised an important question regarding whether multiple speeds can be extracted from population neural responses, given the encoding rules characterized in this study.

      It is a hard problem to extract multiple stimulus values from the population neural response. Inspired by the theoretical framework proposed by Zemel et al. (1998), we conducted a detailed decoding study to extract motion speed(s) from MT population responses. We used the decoded speed(s) to perform a discrimination task similar to our psychophysics task and compared the decoder's performance with perception. We found that, at a ×4 speed difference, we could decode two speeds based on the MT response, and the decoder's performance was similar to that of perception. However, at a ×2 speed difference, except at the slowest speeds of 1.25 and 2.5 deg/s, the decoder could not extract two speeds and could not differentiate between a bi-speed stimulus and a single log-mean speed stimulus. We have added the decoding analysis to this paper on pages 25-32. We also discuss the implications and limitations of these results (pages 35-36, lines 852-884).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Classifier:

      One question I have is how the classifier's performance scales with the number of neurons used in the analysis. Here that number is set to the number that was recorded, but it is a free parameter in this analysis. Why does the arbitrary choice of 100 neurons match the animals' performance?

      We apologize for the lack of clarity on this point. The decoding using the classifier was based on the neural responses of the 100 recorded MT neurons in our data set. The number of neurons (100) was not a free parameter. We need to reconstruct the population neural response based on the responses of the recorded neurons and their preferred speeds (red and black dots in Figure 9A-E).

      We spline-fitted the reconstructed population neural responses (red and black curves in Figure 9A-E). One way to change the number of neurons used for the decoding is to resample N points along the spline-fitted population responses, using N as a free parameter. However, we think it is better to conduct decoding based on the responses from the recorded neurons rather than on interpolated responses. We now clarify on page 22, lines 520-522, that we based the classification (decoding) on the responses of the 100 recorded neurons in our dataset.

      Normalization Model:

      Although the model is phenomenological, a schematic circuit diagram could help the reader understand how this could work (I think this is worthwhile even though the data cannot distinguish among different implementations of divisive normalization).

      Thanks for this suggestion. We agree that a circuit diagram would help the readers understand how the model works. However, as the Reviewer pointed out, our data cannot distinguish between different implementations of the model. For example, divisive normalization can occur on the inputs to MT neurons or on MT neurons themselves. The circuit mechanism of weighting the component responses is not clear either. A schematic circuit diagram then mainly serves to recapitulate the normalization model in Equation 9. We, therefore, choose not to add a schematic circuit diagram at this time. We are interested in developing a circuit model to account for how visual neurons represent multiple stimuli in future studies.

      Another suggestion is that the time courses could be used to constrain the model; the fact that it takes a while after the onset of the slow-speed response for averaging to reveal itself suggests the presence of inertia/hysteresis in the circuit.

      We agree that the time course of MT responses could be used to constrain the model. This is also why we think it is important to document the time course in this paper. We now state in the Results, page 17, lines 354-357:

      “At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.”

      Two-Direction Experiment:

      Applying the normalization model to this dataset could help determine its generality.

      This is a good point. We now apply the normalization model (Eq. 9) to fit this data set with the full direction tuning curves in response to two stimuli moving in different directions at different speeds. Please also see the response to Reviewer 2 about the normalization model fit.

      The results of the normalization model fit are now described on page 20 and Figure 8A, B, D.

      Reviewer #2 (Recommendations For The Authors):

      In terms of impact, I would say that the presentation is geared largely toward people who go to VSS. To broaden the appeal, the authors might consider a more general formulation of the four hypotheses stated at the bottom of page 3. These are prominent ideas in systems neuroscience - population encoding, Bayesian inference, etc.

      We thank the Reviewer for the suggestion. We have revised the Introduction accordingly on pages 3-4, lines 43-69. Please also see the response to Reviewer 3 about the Introduction.

      Figure 5: It might be helpful to show the predictions for different hypotheses. If the response to the transparent stimulus is equal to that of the faster stimulus, you will have a line with slope 1. If it is equal to the response to the slow stimulus, all points will lie on the x-axis. In between you get lines with slopes less than 1.

      In Figures 5F1 and 5F2, we show dotted lines indicating faster-all (i.e., faster-component-take-all), response averaging, and slower-all (i.e., slower-component-take-all) on the X-axis. We show those labels in between Figs. 5F1 and F2.

      Figure 6: The analysis is not motivated by any particular question, and the results are presented without any quantitation. This section could be better motivated or else removed.

      We now better motivate the section about the response time course on page 16, lines 336 – 339: “The temporal dynamics of the response bias toward the faster component may provide a useful constraint on the neural model that accounts for this phenomenon. We therefore examined the timecourse of MT response to the bi-speed stimuli. We asked whether the faster-speed bias occurred early in the neuronal response or developed gradually.”

      On page 17, lines 354-357, we also state that “At slow speeds, the very early faster-speed bias suggests a likely role of feedforward inputs to MT on the faster-speed bias. The slightly delayed reduction (normalization) in the bi-speed response relative to the stronger component response also helps constrain the circuit model for divisive normalization.”

      Equation (9): There appears to be an "S" missing in the denominator.

      We double-checked and did not see a missing "S" in Equation 9, on page 20.  

      Reviewer #3 (Recommendations For The Authors):

      This is an impressive study, with the chief strengths being the computational/theoretical motivation and analyses and the inclusion of psychophysics together with primate neurophysiology. The manuscript is well-written and the figures are clear and convincing (with a couple of suggestions detailed below).

      We thank the Reviewer for the comments.

      Specific suggestions:

      (1) Intro para 3

      "It is conceivable that the responses of MT neurons elicited by two motion speeds may follow one of the following rules: (1) averaging the responses elicited by the individual speed components; (2) bias toward the speed component that elicits a stronger response, i.e. "soft-max operation" (Riesenhuber and Poggio, 1999); (3) bias toward the slower speed component, which may better represent the more probable slower speeds in nature scenes (Weiss et al., 2002); (4) bias toward the faster speed component, which may benefit the segmentation of a faster-moving stimulus from a slower background."

      This would be a good place to point out which of these options is likely to preserve vs. lose information and how.

      It seems to me that only #2 is clearly information-preserving, assuming that there are neurons with a variety of different speed preferences such that different neurons will exhibit different "winners". #1 would predict subjects would perceive only an intermediate speed, whereas #3 would predict perceiving only/primarily the slower speed and #4 would predict only/primarily perceiving the faster speed.

      The difference between "only" and "primarily" would depend on whether the biases are complete or only partial. I acknowledge that the behavioral task in the study is not a "report all perceived speeds" task, but rather a 1 vs 2 speeds task, so the behavioral assay is not a direct assessment of the question I'm raising here, but I think it should still be possible to write about the perceptual implications of these different possibilities for encoding in an informative way.

      Thanks for the suggestions. We have revised this paragraph in the Introduction on pages 3 – 4, lines 43 – 69.

      (2) Analysis clarifications

      The section "Relationship between the responses to bi-speed stimuli and constituent stimulus components" could use some clarification/rearrangement/polish. I had to read it several times. Possibly, rearrangement, simplification/explanation of nomenclature, and building up from a simpler to a more complex case would help. If I understand correctly, the outcome of the analysis is to obtain a weight value for every combination of slow and fast speeds used. The R's in equation 5 are measured responses, observed on the single stimulus and combined stimulus trials. It was not clear to me if the R's reflect average responses or individual trial responses; this should be clarified. Ws = 1- wf so in essence only 1 weight is computed for each combination. Then, in the subsequent sections of the manuscript, the authors explore whether the weight computed for each stimulus combination is the same or does it vary across conditions. If I have this right, then walking through these steps will aid the reader.

      The Reviewer is correct. We now walk through these steps and better state the rationale for this approach. The R's in Equation 5 are trial-averaged responses, not trial-by-trial responses.

      We have clarified these points on page 13.

      To take a particular example, the sentence "Using this approach to estimate the response weights for individual neurons can be inaccurate because, at each speed pair, the weights are determined only by three data points" struck me as a rather backdoor way to get at the question. Is the estimate noisy? Or does the weighting vary systematically across speeds? I think the authors are arguing the latter; if so, it would be valuable to say so.

      We wanted to estimate the weighting for each speed pair and determine whether the weights change with the stimulus speeds. Indeed, we found that the weights change systematically across speed pairs. The issue was not that the estimate was noisy (see below, in the response to the second paragraph of point 3).

      We have clarified this point in the text, on page 13, lines 273 – 280: “Our goal was to estimate the weights for each speed pair and determine whether the weights change with the stimulus speeds. In our main data set, the two speed components moved in the same direction. To determine the weights of w<sub>s</sub> and w<sub>f</sub> for each neuron at each speed pair, we have three data points R, R<sub>s</sub>, and R<sub>f</sub>, which are trial-averaged responses. Since it is not possible to solve for both variables, w<sub>s</sub> and w<sub>f</sub>, from a single equation (Eq. 5) with three data values, we introduced an additional constraint: w<sub>s</sub> + w<sub>f</sub> = 1. While this constraint may not yield the exact weights that would be obtained with a fully determined system, it nevertheless allows us to characterize how the relative weights vary with stimulus speed.”

      (3) Figure 5

      Related to the previous point, Figures 5A-E are subject to a possible confound. When plotting x vs y values, it is critical that the x and y not depend trivially on the same value. Here, the plots are R-Rs and Rf-Rs. Rs, therefore, is contained in both the x and y values. Assume, for the sake of argument, that R and Rf are constants, whereas Rs is drawn from a distribution of random noise. When Rs, by chance, has an extreme negative value, R-Rs and Rf-Rs will be large positive values. The solution to this artificial confound is to split the trials that generate Rs into two halves and subtract one half from R and the other half from Rf. Then, the same noisy draw will not be contributing to both x and y. The above is what is needed if the authors feel strongly about including this analysis.
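
      The Reviewer's thought experiment can be illustrated with a small simulation. The sketch below is a minimal illustration under assumed values (constant R and Rf, Gaussian noise on Rs, 100 points); none of the numbers or function names come from the manuscript. It compares the correlation obtained when the same noisy Rs is subtracted from both axes with the correlation obtained when two independent Rs estimates (the "split halves") are used.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def shared_vs_split(n_points=100, seed=0):
    """Return (r_shared, r_split) for the common-term thought experiment.

    Assumed, hypothetical setup: R and Rf are constants, and Rs is pure noise.
    """
    rng = random.Random(seed)
    r_bi, r_fast = 20.0, 30.0  # constant R and Rf (arbitrary values)
    x, y_shared, y_split = [], [], []
    for _ in range(n_points):
        rs_half_a = rng.gauss(10.0, 3.0)  # Rs estimate from one half of trials
        rs_half_b = rng.gauss(10.0, 3.0)  # independent estimate from other half
        x.append(r_fast - rs_half_a)
        y_shared.append(r_bi - rs_half_a)  # same draw on both axes
        y_split.append(r_bi - rs_half_b)   # independent draw on the y axis
    return pearson(x, y_shared), pearson(x, y_split)


r_shared, r_split = shared_vs_split()
print(f"shared Rs: r = {r_shared:.3f}; split halves: r = {r_split:.3f}")
```

      With a shared Rs the two axes differ only by a constant, so the correlation is essentially 1 even though R and Rf carry no relationship; with independent halves it should fall close to zero.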

      The Reviewer is correct that subtracting a common term (Rs) would introduce a correlation between (R-Rs) and (Rf-Rs) (Reviewer 2 also raised this point). R's in Equations 5, 6, 7 (and Figure 5A-E) are trial-averaged responses. So, we cannot address the issue by dividing R’s into two halves. Our results showed that the regression slope (W<sub>f</sub>) changed from near 1 to about 0.5 as the stimulus speeds increased, and the correlation coefficient between (R – Rs) and (R<sub>f</sub> – Rs) was high at slow stimulus speeds. To determine whether these results can be explained by the confounding factor of subtracting a common term Rs, rather than by the pattern of R in representing two speeds, we did an additional analysis. We acknowledged the issue and described the new analysis on page 13, lines 303 – 326:

      “Our results showed that the bi-speed response showed a strong bias toward the faster component when the speeds were slow and changed progressively from a scheme of ‘faster-component-take-all’ to ‘response-averaging’ as the speeds of the two stimulus components increased (Fig. 5F1). We found similar results when the speed separation between the stimulus components was small (×2), although the bias toward the faster component at low stimulus speeds was not as strong as at ×4 speed separation (Fig. 5A2-F2 and Table 1).

      In the regression between (𝑅 – 𝑅<sub>s</sub>) and (𝑅<sub>f</sub> – 𝑅<sub>s</sub>), 𝑅<sub>s</sub> was a common term and therefore could artificially introduce correlations. We wanted to determine whether our estimates of the regression slope (𝑤<sub>f</sub>) and the coefficient of determination (𝑅<sup>2</sup>) can be explained by this confounding factor. At each speed pair and for each neuron from the data sample of the 100 neurons shown in Figure 5, we simulated the response to the bi-speed stimuli (𝑅<sub>e</sub>) as a randomly weighted sum of 𝑅<sub>f</sub> and 𝑅<sub>s</sub> of the same neuron.

      𝑅<sub>e</sub> = 𝑎𝑅<sub>f</sub> + (1 − 𝑎)𝑅<sub>s</sub>,

      in which 𝑎 was a randomly generated weight (between 0 and 1) for 𝑅<sub>f</sub>, and the weights for 𝑅<sub>f</sub> and 𝑅<sub>s</sub> summed to one. We then calculated the regression slope and the correlation coefficient between the simulated 𝑅<sub>e</sub> - 𝑅<sub>s</sub> and 𝑅<sub>f</sub> - 𝑅<sub>s</sub> across the 100 neurons. We repeated the process 1000 times and obtained the mean and 95% confidence interval (CI) of the regression slope and the 𝑅<sup>2</sup>. The mean slope based on the simulated responses was 0.5 across all speed pairs. The estimated slope (𝑤<sub>f</sub>) based on the data was significantly greater than the simulated slope at slow speeds of 1.25/5, 2.5/10 (Fig. 5F1), and 1.25/2.5, 2.5/5, and 5/10 degrees/s (Fig. 5F2) (bootstrap test, see p values in Table 1). The estimated 𝑅<sup>2</sup> based on the data was also significantly higher than the simulated 𝑅<sup>2</sup> for most of the speed pairs (Table 1). These results suggest that the faster-speed bias at the slow stimulus speeds and the consistent response weights across the neuron population at each speed pair are not analysis artifacts.”
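
      The simulation described in the quoted passage can be sketched as follows. This is a minimal sketch, not the authors' code: the component responses are hypothetical uniform random values standing in for the recorded 𝑅<sub>f</sub> and 𝑅<sub>s</sub>, the function names are invented for illustration, and the 95% interval is taken as simple percentiles of the simulated slopes.

```python
import random


def regression_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def make_hypothetical_responses(n_neurons=100, seed=1):
    """Stand-in component responses; NOT the recorded MT data."""
    rng = random.Random(seed)
    r_slow = [rng.uniform(5.0, 40.0) for _ in range(n_neurons)]
    r_fast = [rng.uniform(5.0, 40.0) for _ in range(n_neurons)]
    return r_fast, r_slow


def simulate_null_slopes(r_fast, r_slow, n_repeats=1000, seed=0):
    """Mean and percentile 95% interval of the slope under random weighting."""
    rng = random.Random(seed)
    x = [rf - rs for rf, rs in zip(r_fast, r_slow)]  # Rf - Rs per neuron
    slopes = []
    for _ in range(n_repeats):
        y = []
        for rf, rs in zip(r_fast, r_slow):
            a = rng.random()                 # random weight for Rf in [0, 1)
            r_e = a * rf + (1.0 - a) * rs    # simulated bi-speed response
            y.append(r_e - rs)               # Re - Rs shares the term Rs with x
        slopes.append(regression_slope(x, y))
    slopes.sort()
    mean_slope = sum(slopes) / n_repeats
    lower = slopes[int(0.025 * n_repeats)]
    upper = slopes[int(0.975 * n_repeats) - 1]
    return mean_slope, (lower, upper)


r_fast, r_slow = make_hypothetical_responses()
mean_slope, (ci_low, ci_high) = simulate_null_slopes(r_fast, r_slow)
print(f"mean slope = {mean_slope:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

      Because 𝑅<sub>e</sub> − 𝑅<sub>s</sub> = 𝑎(𝑅<sub>f</sub> − 𝑅<sub>s</sub>) and 𝑎 is drawn independently of the responses, the expected slope is the mean of 𝑎, i.e. 0.5, which matches the simulated mean reported in the passage. A data slope well above the simulated interval therefore cannot be attributed to the shared term alone.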

      However, I don't see why the analysis is needed at all. Can't Figure 5F be computed on its own? Rather than computing weights from the slopes in 5A-E, just compute the weights from each combination of stimulus conditions for each neuron, subject to the constraint ws=1-wf. I think this would be simpler to follow, not subject to the noise confound described in the previous point, and likely would make writing about the analysis easier.

      We initially tried the suggested approach to determine the weights of the individual neurons. The weights from each speed combination for each neuron are calculated by: 𝑤<sub>s</sub> = (𝑅<sub>f</sub> - 𝑅) / (𝑅<sub>f</sub> - 𝑅<sub>s</sub>), 𝑤<sub>f</sub> = (𝑅 - 𝑅<sub>s</sub>) / (𝑅<sub>f</sub> - 𝑅<sub>s</sub>), and 𝑤<sub>s</sub> and 𝑤<sub>f</sub> sum to 1. 𝑅, 𝑅<sub>f</sub> and 𝑅<sub>s</sub> are the responses to the same motion direction. Using this approach to estimate response weights for individual neurons can be unreliable, particularly when 𝑅<sub>f</sub> and 𝑅<sub>s</sub> are similar. This situation often arises when the two speeds fall on opposite sides of the neuron's preferred speed, resulting in a small denominator (𝑅<sub>f</sub> - 𝑅<sub>s</sub>) and, consequently, an artificially inflated weight estimate. We therefore used an alternative approach. We estimated the response weights for the neuronal population at each speed pair (𝑅<sub>f</sub> - 𝑅<sub>s</sub>) using linear regression of (𝑅 - 𝑅<sub>s</sub>) against (𝑅<sub>f</sub> - 𝑅<sub>s</sub>). The slope is the weight for the faster component for the population. This approach overcame the difficulty of determining the response weights for single neurons.
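
      The instability of the single-neuron estimate can be made concrete with a short sketch. It assumes the weight formula implied by R = w<sub>s</sub>R<sub>s</sub> + w<sub>f</sub>R<sub>f</sub> with w<sub>s</sub> + w<sub>f</sub> = 1, namely w<sub>f</sub> = (R − R<sub>s</sub>) / (R<sub>f</sub> − R<sub>s</sub>); the response values, the true weight of 0.7, and the 1 spike/s measurement error are hypothetical numbers chosen only for illustration.

```python
def single_neuron_weight(r_bi, r_fast, r_slow):
    """Faster-component weight from one speed pair, assuming ws + wf = 1."""
    return (r_bi - r_slow) / (r_fast - r_slow)


def weight_error(r_fast, r_slow, true_wf, measurement_error):
    """Error in the estimated wf when R is mis-measured by measurement_error."""
    r_bi_true = true_wf * r_fast + (1.0 - true_wf) * r_slow
    estimated = single_neuron_weight(r_bi_true + measurement_error, r_fast, r_slow)
    return estimated - true_wf


# Well-separated component responses: a small error in R barely moves wf.
err_separated = weight_error(r_fast=30.0, r_slow=10.0, true_wf=0.7,
                             measurement_error=1.0)

# Nearly equal component responses: the same error is amplified,
# because the error in wf equals measurement_error / (Rf - Rs).
err_similar = weight_error(r_fast=10.5, r_slow=10.0, true_wf=0.7,
                           measurement_error=1.0)

print(f"error with Rf - Rs = 20:  {err_separated:.2f}")
print(f"error with Rf - Rs = 0.5: {err_similar:.2f}")
```

      The error in the estimated weight scales as the measurement error divided by (R<sub>f</sub> − R<sub>s</sub>), which is why pooling neurons in a regression at each speed pair gives a more stable estimate than solving for each neuron separately.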

      Nevertheless, if the data provide better constraints, it is possible to estimate the response weights for each speed pair for individual neurons. For example, we can calculate the weights for single neurons by using stimuli that move in different directions at two speeds. By characterizing the full direction tuning curves for R, R<sub>f</sub>, and Rs, we have sufficient data to constrain the response weights for single neurons, as we did for the speed pair of 2.5 and 10º/s in Figure 8. In future studies, we can use this approach to measure the response weights for single neurons at different speed pairs and average the weights across the neuron population.  

      We explain these considerations in the Results (pages 13–14, lines 265-326) and Discussion (pages 34-35, lines 818-829).

      (4) Figure 7

      Bidirectional analysis. It would be helpful to have a bit more explanation for why this analysis is not subject to the ws=1-wf constraint. In Figure 7B, a line could be added to show what ws + wf =1 would look like (i.e. a line with slope -1 going from (0,1) to (1,0)); it looks like these weights are a little outside that line but there is still a negative trend suggesting competition.

      For the data set in which visual stimuli move in the same direction at different speeds, we included a constraint that W<sub>s</sub> and W<sub>f</sub> sum to 1. This is because one cannot solve for two independent variables (Ws and Wf) using one equation, R = W<sub>s</sub> · R<sub>s</sub> + W<sub>f</sub> · R<sub>f</sub>, with three data values (R, Rs, Rf).

      In the dataset using bi-directional stimuli (now Fig. 8), we can use the full direction tuning curves to constrain the linear weighted summation (LWS) model and the normalization model. So, we did not need to impose the additional constraint that Ws and Wf sum to one, which makes the fit more general. We now clarify this in the text, on page 19, lines 421-423.
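
      To illustrate why full direction tuning curves remove the need for the sum-to-one constraint, the sketch below fits both weights of R(θ) = W<sub>s</sub>·R<sub>s</sub>(θ) + W<sub>f</sub>·R<sub>f</sub>(θ) across many directions θ. It assumes an ordinary least-squares fit solved through the 2×2 normal equations, which may differ from the fitting procedure used in the manuscript; the tuning-curve shape, preferred directions, and weights (0.4 and 0.8) are hypothetical.

```python
import math


def fit_lws_weights(r_bi, r_slow, r_fast):
    """Least-squares (ws, wf) for r_bi ~ ws * r_slow + wf * r_fast.

    No sum-to-one constraint is imposed; each direction supplies one equation.
    """
    s_ss = sum(rs * rs for rs in r_slow)
    s_ff = sum(rf * rf for rf in r_fast)
    s_sf = sum(rs * rf for rs, rf in zip(r_slow, r_fast))
    s_rs = sum(r * rs for r, rs in zip(r_bi, r_slow))
    s_rf = sum(r * rf for r, rf in zip(r_bi, r_fast))
    det = s_ss * s_ff - s_sf ** 2
    if det == 0:
        raise ValueError("component tuning curves are collinear")
    w_s = (s_rs * s_ff - s_rf * s_sf) / det
    w_f = (s_rf * s_ss - s_rs * s_sf) / det
    return w_s, w_f


def tuning_curve(pref_deg, directions_deg, baseline=5.0, gain=30.0):
    """Hypothetical von Mises-like direction tuning curve."""
    return [baseline + gain * math.exp(math.cos(math.radians(d - pref_deg)) - 1.0)
            for d in directions_deg]


directions = list(range(0, 360, 30))          # 12 directions, 12 equations
r_slow = tuning_curve(-45.0, directions)      # slower component response
r_fast = tuning_curve(45.0, directions)       # faster component response
r_bi = [0.4 * rs + 0.8 * rf for rs, rf in zip(r_slow, r_fast)]

w_s, w_f = fit_lws_weights(r_bi, r_slow, r_fast)
print(f"ws = {w_s:.3f}, wf = {w_f:.3f}, sum = {w_s + w_f:.3f}")
```

      With a single direction there is one equation and two unknowns, so the determinant is zero and the fit fails; with a full tuning curve the two weights are identified separately and are free to sum to a value other than one.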

      As suggested, we added a line showing Ws + Wf = 1 for the LWS model fit (Fig. 8C) and the normalization model fit (Fig. 8D) (also see page 21, lines 482-484). Although 𝑤<sub>s</sub> and 𝑤<sub>f</sub> are not constrained to sum to one in the model fits, the fitted weights are roughly aligned with the dashed lines of Ws + Wf = 1.

      (5) Attention task

      General wording suggestions - a caution against using "attention" as a causal/mechanistic explanation as opposed to a hypothesized cognitive state. For example, "We asked whether the faster-speed bias was due to bottom-up attention being drawn toward the faster stimulus component". This could be worded more conservatively as whether the bias is "still present if attention is directed elsewhere" - i.e. a description of the experimental manipulation.

      We intended to test the hypothesis of whether the faster-speed bias can be explained by attention automatically drawn to the faster component and therefore enhance the contribution of the faster component to the bi-speed response. We now state it as a possible explanation to be tested. We changed the subtitle of this section to be more conservative: “Faster-speed bias still present when attention was directed away from the RFs”, on page 18, line 363.

      We also modified the text on page 18, lines 364-367: “One possible explanation for the faster-speed bias may be that bottom-up attention is drawn toward the faster stimulus component, enhancing the response to the faster component. To address this question, we asked whether the faster-speed bias was still present if attention was directed away from the RFs.”

      Relatedly, in the Discussion, the section on "Neural mechanisms", the sentence "The faster-speed bias was not due to an attentional modulation" should be rephrased as something like 'the bias survived or was still present despite an attentional modulation requiring the monkey to attend elsewhere'.

      Our motivation for doing the attention-away experiment was to determine whether a bottom-up attentional modulation can explain the faster-speed bias. We now describe the results as suggested by the Reviewer. But we’d also like to interpret the implications of the results. In Discussion, page 34, lines 789-790, we now state: “We found that the faster-speed bias was still present when attention was directed away from the RFs, suggesting that the faster-speed bias cannot be explained by an attentional modulation.”  

      (6) "A model that accounts for the neuronal responses to bi-speed stimuli". This section opens with: "We showed that the neuronal response in MT to a bi-speed stimulus can be described by a weighted sum of the neuron's responses to the individual speed components". "Weighted average" would be more appropriate here, given that ws = 1-wf.

      As mentioned above, the added constraint of Ws+Wf = 1 was only a practical solution for determining the weights for the data set using visual stimuli moving in the same direction. More generally, Ws and Wf do not need to sum to one. As such, we prefer the wording of weighted sum.

      (7) "As we have shown previously using visual stimuli moving transparently in different directions, a classifier's performance of discriminating a bi-directional stimulus from a single-direction stimulus is worse when the encoding rule is response-averaging than biased toward one of the stimulus components" - this is important! Can this be worked into the Introduction?

      Yes, we now also mention this point in the Introduction regarding response averaging on page 4, lines 54-57: “While decoding two stimuli from a unimodal response is theoretically possible (Zemel et al., 1998; Treue et al., 2000), response averaging may result in poorer segmentation compared to encoding schemes that emphasize individual components, as demonstrated in neural coding of overlapping motion directions (Xiao and Huang, 2015).” Also, please see the response to point 1 above.

      (8) Minor, but worth catching now - is the use of initials for human participants consistent with best practices approved at your institution?

      Thanks for checking. The letters are not the initials of the human subjects. They are coded characters. We have clarified it in the legend of Figure 1, on page 7, line 168.

    1. Author Response

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GEARS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      I also suggest the manuscript should be written in a way that is more accessible to readers who are less familiar with animal experiments. In addition, the implementation and interpretation of brain simulations need to be more careful and clear.

      Several sections of the manuscript were clarified and simplified to be more accessible. Also, implementation and interpretations of brain simulations were modified to be more precise.

      Strengths:

      1) ZTE imaging sequence was selected over traditional EPI sequence as the optimal way to perform fMRI experiments during absence seizures.

      2) A detailed classification of stimulation periods is achieved based on the relative position in time of the stimulation period with respect to the brain state.

      3) A whole-brain model embedded with a realistic rat connectome is simulated on the TVB platform to replicate fMRI observations.

      We thank the reviewer for indicating the strengths of our manuscript.

      Weaknesses:

      1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically as what the authors have claimed at the head of the paper.

      We agree with the reviewer that the manuscript does not specifically address the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states. The sentence in the abstract that could lead to misinterpretation, “The mechanism underlying the reduced responsiveness to external stimulus remains unknown.”, was therefore modified to the following: “The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remains unknown”.

      2) The implementations of brain simulations need to be more specific.

      Contribution:

      The contribution of this paper is performing fMRI experiments under a rare condition that could provide fresh knowledge in the imaging field regarding the brain's responsiveness to environmental stimuli during absence seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs often are called seizures and because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8 "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state (see also Figure 4)." That statement is at odds with their conclusion.

      We thank the reviewer for noting this discrepancy. The statement was written the wrong way around and has been corrected to: “When comparing statistical responses between both states, significant changes (p<0.05, cluster-level corrected) were noticed in the somatosensory, auditory and frontal cortices: these regions were less activated in ictal than in interictal state (see also Figure 4).”

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time.

      We agree with the reviewer that there are no data showing slowing of the pathways in response to stimulus. However, we are unsure which part of the conclusion section this comment refers to. We did not intend to claim in the manuscript that stimulation slows the activated pathways.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      Hemodynamic response functions were studied for two reasons:

      • To account for possible change in HRF during the detection of activated regions. Indeed, a physiological change in HRF can mask the detection of an activation when the software uses a standard HRF to convolve the design matrix (David et al. 2008).

      • To characterize the shape and polarity of fMRI activations in brain regions that we found to be differently activated between ictal and interictal states, and to evaluate whether the alteration in activation was associated with an alteration in hemodynamics.

      The observed HRF decreases (rather than increases) in the cortex when stimulation was applied during SWD were discussed in section 4.4, where we speculated that neuronal suppression caused by SWD can prevent responsiveness. In this case, the decreased HRF could be either a consequence or a cause of the observed neuronal suppression. The assumption that the HRF reduction is causal would be supported by a possible vascular steal effect from other activated regions. However, we did not state this in the conclusion section, and therefore the following sentence was added to the conclusions: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges, could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response”.
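      The first reason above, that a fixed HRF shape can hide a response whose true shape differs, can be illustrated with a minimal sketch. The double-gamma shape parameters, the 4 s delay and the single stimulation block are illustrative assumptions, not values taken from the manuscript or from SPM.

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(t, delay=0.0):
    """Illustrative double-gamma HRF; shape parameters are assumed, not fitted to rat data."""
    shifted = t - delay
    h = gamma.pdf(shifted, 6) - gamma.pdf(shifted, 16) / 6.0
    return h / np.abs(h).sum()

tr = 2.0                           # volume acquisition time in seconds
n_vols = 60
stim = np.zeros(n_vols)
stim[10:13] = 1.0                  # one 6 s stimulation block = 3 volumes

t = np.arange(0.0, 32.0, tr)
# Regressor the analysis would build with the standard HRF.
predicted = np.convolve(stim, double_gamma_hrf(t))[:n_vols]
# Hypothetical "true" response with a physiologically delayed HRF.
measured = np.convolve(stim, double_gamma_hrf(t, delay=4.0))[:n_vols]

r_match = np.corrcoef(predicted, predicted)[0, 1]
r_mismatch = np.corrcoef(predicted, measured)[0, 1]
print(f"matched HRF r = {r_match:.2f}, mismatched HRF r = {r_mismatch:.2f}")
```

      The lower correlation in the mismatched case is what reduces detection sensitivity when the assumed and actual HRF differ.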

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results are unclear.

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. However the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with the potential to yield important insights.

      The use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and auditory stimuli was 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. However the attempts to localize these differences in space or time will be contaminated by the seizure-related signals.

      The claims that differences were observed for example between visual cortex and superior colliculus signals with visual stim during seizures vs. interictal are unconvincing due to the above.

      We understand the reviewer's concern and agree that seizure-related signals must be considered in the analysis when studying stimulation responses. Therefore, in modelling the responses in the SPM framework, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. The effects caused by the stimulation should thereby, in theory, be separated as much as possible from the effects caused by the seizure itself. Additionally, the cases where stimulations occurred fully inside a seizure (included in Figure 3, “...stimulation during ictal state”) had a longer average seizure duration of 45 ± 60 s, much longer than 6 s, which is the average duration across all seizures.

      However, we acknowledge that some residual seizure effects may still be present, and we have noted this caution in the “Physiologic and methodologic considerations” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination, we considered in SPM both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the seizure itself should be separated as much as possible from the effects caused by stimulation.”
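      The logic of including a seizure regressor alongside the stimulation regressor can be sketched with an ordinary least-squares fit. This is a toy illustration, not the SPM pipeline: the onsets, effect sizes and noise level are hypothetical, and HRF convolution is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vols = 1350                                  # volumes per session

def boxcar(onsets, duration, n):
    """Binary regressor that is 1 during each event and 0 elsewhere."""
    x = np.zeros(n)
    for onset in onsets:
        x[onset:onset + duration] = 1.0
    return x

# Hypothetical timing: a 3-volume (6 s) stimulus every 30 volumes,
# and 45 seizures of 3 volumes each at random onsets.
stim = boxcar(range(20, n_vols - 10, 30), 3, n_vols)
seizure = boxcar(rng.choice(n_vols - 10, size=45, replace=False), 3, n_vols)

true_stim, true_seizure = 1.0, -2.0            # assumed effect sizes
y = true_stim * stim + true_seizure * seizure + rng.normal(0.0, 0.5, n_vols)

# Design matrix with stimulation, seizure and a constant term.
X = np.column_stack([stim, seizure, np.ones(n_vols)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"stimulation beta = {beta[0]:.2f}, seizure beta = {beta[1]:.2f}")
```

      Because the two regressors are not perfectly collinear, the fit attributes variance to each separately; residual mixing remains possible when stimuli and seizures overlap heavily, which is the caution stated above.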

      The maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      We improved the overall appearance of Figure 3 by enlarging the selected cross sections for better anatomical differentiation and by adding anterior and posterior directions to all images.

      Reviewer #1 (Recommendations For The Authors):

      1) The implementations of brain simulations need to be more specific: How is the stimulation applied in the mean-field model in terms of its mathematical expression? The state variable of the model is the rate of neuronal firing, but how is it subsequently converted into fMRI responses? How are the statistical plots calculated? How much does this result depend on the model parameter?

      Further details and explanations about the model have now been added to the manuscript. The stimulation of a specific region is simulated as an increase in the excitatory input to the corresponding node. In particular, we use a square function to represent the stimulus (see for example panel A in Figure 6–figure supplement 1).

      As the referee mentions, the model describes the dynamics of the neuronal firing rates. These provide direct information about neuronal activity and responsiveness, so all the statistical analyses of the simulations shown in the paper were performed on the firing rates. For these analyses, no conversion to fMRI was needed.

      To build the statistical maps, an ANOVA (analysis of variance) test was used. ANOVA assesses the significance of a difference in the mean between samples, and is calculated via an F-test as the ratio of the variance between samples to the variance within samples. In our case it allowed us to assess the impact of the stimulation on the ongoing neuronal activity by comparing the time series of the firing rate with and without stimulation (this was performed independently for each state). For the results presented in this paper, the ANOVA was performed using the “f_oneway” function of the scipy.stats module in Python.

      Regarding the dependence on the model parameters, the main results obtained in our paper are related to the responsiveness of the system under two quantitatively different types of ongoing dynamics: an asynchronous irregular activity (interictal period) and an oscillatory SWD type of dynamics (ictal period). In particular, we show how, for the SWD dynamics, the activity evoked by the stimulus is overshadowed by the ongoing activity, which imposes a strong limitation on the response of the system and the propagation of the stimulus. In this sense, the main results of the simulations are very general, and no significant dependence on specific cellular or network parameters was observed within a physiologically relevant range, nor should one be expected. Nevertheless, we point out that, as mentioned in the text, the key parameter that triggers the transition between the two types of dynamics is the strength of the adaptation current (in particular the strength of the spike-triggered adaptation parameter ‘b’ described in the Supplementary information), which in addition can control the frequency of the oscillations. In the paper, this parameter was set such that the SWD frequency falls within the range observed in the GAERS (7-12 Hz). We believe that further analysis around the region of transition between states, in particular from a dynamical point of view, could be of relevance for future work.
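      The comparison described above can be sketched on toy firing-rate traces. This is a minimal illustration under stated assumptions: the baseline rates, noise level, oscillation amplitude and stimulus gain are invented for the example and are not the parameters of the mean-field model; only the use of `scipy.stats.f_oneway` and the square stimulus follow the response.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
dt = 0.001                                     # time step in seconds (assumed)
t = np.arange(0.0, 2.0, dt)
# Square stimulus: on between 1.0 s and 1.5 s.
square_stim = np.where((t >= 1.0) & (t < 1.5), 1.0, 0.0)

def toy_rate(ongoing, gain):
    """Toy firing rate (Hz): ongoing activity + noise + stimulus-evoked increase."""
    return ongoing + rng.normal(0.0, 1.0, t.size) + gain * square_stim

interictal = np.full(t.size, 5.0)              # asynchronous-irregular-like baseline
ictal = 20.0 * np.sin(2 * np.pi * 10.0 * t)    # large 10 Hz oscillation, SWD-like

results = {}
for name, ongoing in [("interictal", interictal), ("ictal", ictal)]:
    with_stim = toy_rate(ongoing, gain=3.0)
    without_stim = toy_rate(ongoing, gain=0.0)
    f_stat, p_value = f_oneway(with_stim, without_stim)
    results[name] = (f_stat, p_value)
    print(f"{name}: F = {f_stat:.1f}, p = {p_value:.3g}")
```

      With the same evoked increase, the F statistic is much smaller when the ongoing activity is a large oscillation, which mirrors the "overshadowing" described above. Note that samples of a time series are not independent, so p-values from this toy example should not be over-interpreted.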

      2) In the abstract, what exactly does "typical information flow in functional pathways" mean and which part of the results does this refer to?

      We agree that this sentence was overly complicated. By “typical information flow”, we were referring to sensory responsiveness during the interictal state. Therefore, we modified the abstract as follows: “These results suggest that sensory processing observed during an interictal state can be hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness.”

      3) Figure 4 - Figure Supplement 1 performed an analysis of comparing states between 'when stimulation ended a seizure' and 'stimulation during an ictal period'. The authors should explain more clearly in the manuscript what is the reason and significance of considering the state of 'when stimulation ended a seizure'. And how is a seizure considered to be terminated by stimulation rather than ending spontaneously?

      We have now added an explanation to section 2.5.3 of the manuscript of why this state was also of interest: “The case when stimulation ended a seizure is particularly interesting for studying the spatial and temporal aspects explaining shift from ictal, i.e. non-responsiveness state, to non-ictal, i.e. responsiveness state.” We agree that some seizures may have ended spontaneously at the same time as the stimulus was applied, but argue that seizures most probably ended due to stimulation, based on previously published results (https://doi.org/10.1016/j.brs.2012.05.009).

      4) In Section 3.1, some detailed descriptions of methods should be moved to Section 2, e.g. how the spatial and temporal SNR is obtained and the description of bad quality data. Also, I suggest the significance of selecting the optimal MRI sequence be stated earlier in the paper, as Section 3.1 cannot be expected from reading the abstract and introduction.

      We moved some technical explanations of SNRs from section 3.1. to section 2.4.1. Significance of the selection of the MRI sequence is also now stated earlier in the introduction section: “For this purpose, the functionality of ZTE sequence was first piloted, and selected over traditional EPI sequence for its lower acoustic noise and reduced magnetic susceptibility artefacts. The selected MRI sequence thus appeared optimal for awake EEG-fMRI measurements.”

      Some minor issues:

      1) How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas, 7th edition. A region was selected if statistically significant activations were detected inside it, based on activation maps. We clarified the definition of ROI as follows: “Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps.”

      2) Section 4.3.2, "In addition, some responses were seen in the somatosensory cortex during the seizure state, which may be due to the fact that the linear model used did not completely remove the effect of the seizure itself" What is the reason for the authors to make such comments?

      This claim was made because we saw a similar trend of responses (deactivation) in the somatosensory cortex in F-contrast maps when comparing “stimulation during ictal state” maps to the “seizure map”, leading us to assume that the effect of the seizure was still apparent in the maps (even though “seizure only” states were used as nuisance regressors). However, as this claim is highly speculative, we have decided to delete this sentence from the manuscript.

      3) Abbreviations such as SPM, HRF, CBF, etc. are not defined in the manuscript.

      Definitions for these abbreviations were added.

      4) Supplementary information-AdEx mean-field model, 've and vi', e and i should be subscripted.

      Subscripts were added.

      Reviewer #2 (Recommendations For The Authors):

      Below are more detailed questions and concerns. Many questions are about the Methods, which seem to be written by a specialist. However, there are also questions about the experimental approach and conclusions.

      One of the strengths of the study is the use of fMRI and EEG. However, to allow rats to be still in the magnet, isoflurane was used, and then as soon as rats recovered they were imaged. However isoflurane has effects on the brain long after the rats have appeared to wake up. Moreover, to train rats to be still, repetitive isoflurane sessions had to be used. Repetitive isoflurane should have a control of some kind, or be discussed as a limitation.

      The repetitive use of isoflurane is indeed an important limiting factor that was not yet discussed in the manuscript. We have added the following sentences to the “Physiologic and methodologic considerations” section:

      “As the used awake habituation and imaging protocol didn’t allow us to avoid the usage of isoflurane during the preparation steps, we cannot rule out the possible effect of using repetitive anesthesia on brain function. However, duration (~15 min) and concentration of anesthesia (~1.5%) during these steps were still moderate, whereas extended durations (1-3 h) of either single or repetitive isoflurane exposures have been used in previous studies where long-term effects on brain function have been observed (Long II et al., 2016; Stenroos et al., 2021). Moreover, there was a 5-15 min waiting period between the cessation of anesthesia and initiation of fMRI scan, to avoid the potential short-term effects of isoflurane that has been found to be most prominent during the 5 min after isoflurane cessation (Dvořáková et al., 2022).”

      An assumption of the study is that interictal periods are normal. However, they may not be. A control is necessary. One also wants to know how often GAERS have spontaneous spike-wave discharges (SWDs), what the authors call seizures. The reason is that the more common the SWDs, the less likely interictal periods are normal. It seems from the Methods that rats were selected if they had frequent seizures so many could be captured in a recording session. Those without frequent seizures were discarded.

      A good control would be a normal rat that has spontaneous SWDs, since almost all rat strains have them, especially with age and in males (PMID: 7700522). However, whether they are frequent enough might be a problem. Alternatively, animals could be studied with rare seizures to assess the normal baseline, and compared to interictal states in GAERS.

      We appreciate this concern raised by the Reviewer. Even though it would be interesting to study different strains and the dependence on SWD frequency, the aim of this study was to compare interictal and ictal states in this specific animal model. We also understand that interictal periods do not necessarily represent a “normal” state, and we therefore went through the manuscript again to remove any claims to that effect.

      About the mechanisms of SWDs, the authors should update their language which seems imprecise and lacks current citations (starting on line 71):

      "Although the origin of absence seizures is not fully understood, current studies on rat models of absence seizures suggest that they arise from atypical excitatory-inhibitory patterns in the barrel field of the somatosensory cortex (Meeren et al. 2002; Polack et al. 2007) and lead to synchronous cortico-thalamic activity (Holmes, Brown, and Tucker 2004)."

      Some of the best explanations for SWDs that I know of are from the papers of John Huguenard. His reviews are excellent. They discuss the mechanisms of thalamocortical oscillations.

      We have reformatted the sentences discussing the mechanism of SWDs and included the explanations provided by manuscripts from Huguenard and McCafferty et al.: “Although the origin of absence seizures is not fully understood, current studies on rat models of absence seizures suggest that they arise from excitatory drive in the barrel field of the somatosensory cortex (Meeren et al. 2002; Polack et al. 2007, 2009, David et al., 2008) and then propagate to other structures (David et al., 2008) including thalamus, knowing to play an essential role during the ictal state (Huguenard, 2019). Notably, the thalamic subnetwork is believed to play a role in coordinating and spacing SWDs via feedforward inhibition together with burst firing patterns. These lead to the rhythms of neuronal silence and activation periods that are detected in SWD waves and spikes (McCafferty et al., 2018; Huguenard, 2019).”

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)." What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Lines 85-8: "In addition, a previous fMRI study on GAERS, which measured changes in cerebral blood volume, found both deactivated and activated brain areas during seizures (David et al. 2008)." Which areas and conditions led to reduced activity? Increased activity? How was it surmised?

      "concurrent stimuli and therefore could contribute to the alterations in behavioral responsiveness" - This idea has been raised before by others (Logothetis, Barth). Please discuss these as the background for this study.

      The particular section was modified to the following:

      “Previous results on GAERS have indicated that, during an absence seizure, hyperactive electrophysiological activity in the somatosensory cortex can contribute to bilateral and regular SWD firing patterns in most parts of the cortex. These patterns propagate to different cortical areas (retrosplenial, visual, motor and secondary sensory), basal ganglia, cerebellum, substantia nigra and thalamus (David et al. 2008; Polack et al. 2007). Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al. 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023). In line with these findings, graph-based analyses have shown an increased segregation of cortical networks from the rest of the brain (Wachsmuth et al. 2021). Altogether, alterations in these focal networks in the animal models of epilepsy impairs cognitive capabilities needed to process specific concurrent stimuli during SWD and therefore could contribute to the lack of behavioral responsiveness (Chipaux et al. 2013; Luo et al. 2011; Meeren et al. 2002; Studer et al. 2019), although partial voluntary control in certain stimulation schemes can be still present (Taylor et al., 2017).”

      Please discuss the mean-field model more. What are its assumptions? What is its validation? Do other models also provide the same result?

      We have now extended the discussion and explanation of the mean-field model, both in the main text and in the Supplementary information. The mean-field model is a statistical tool to estimate the mean activity of large neuronal populations, and as such its main assumptions concern the size of the population analyzed and the characteristic times of the neuronal dynamics under study. It has been shown that the formalism is valid for characteristic times of neuronal dynamics with a lower bound on the order of a few milliseconds and for population sizes on the order of thousands of neurons (see El Boustani and Destexhe, Neural computation 2009; and Di Volo et al, Neural computation 2019), with both conditions satisfied in the simulations made for this work.

      Regarding the validation, the model has been extensively validated and used for simulating different brain states (Di Volo et al. 2019; Goldman et al. 2023), signal propagation in cortical circuits (Zerlaut et al, 2018) and to perform whole-brain simulations (Goldman et al, 2023). The standard validation of the mean-field model consists of comparing it with the activity obtained from the corresponding spiking neural network. For completeness we show in Author response image 1 an example of the SWD type of dynamics obtained from a spiking neural network together with the one obtained from the mean-field. This figure has now been added to the Supplementary information of the paper.

      Regarding the extension of the results to other models, we think that the generality of our results is an interesting point of our work. The main results obtained from our simulation are related to the responsiveness of the system during two different types of ongoing activity: in the interictal state the stimulation evokes a significant variation in the ongoing activity that is propagated to other regions, while in the SWD state the evoked activity is overshadowed by the ongoing activity, which imposes a strong limit on the responsiveness of the system and the propagation of the signal. In this sense, the results of the simulations are very general and should be extensible to other models. Of course, the advantage of using a model like ours is the capability of reproducing the different states, its applicability to large-scale simulations, and the fact that it is built from biologically relevant single-cell models (AdEx).

      Author response image 1.

      Comparison of the SWD dynamics in the mean-field model and the underlying spiking neural network of AdEx neurons. A) Raster plot (top) and mean firing rate (bottom) from an SWD type of dynamics obtained from the spiking-network simulations. The network is made of 8000 excitatory neurons and 2000 inhibitory neurons. Neurons in the network are randomly connected with probability p=0.05 for inhibitory-inhibitory and excitatory-inhibitory connections, and p=0.06 for excitatory-excitatory connections. Cellular parameters correspond to the ones used in the mean-field, with spike-triggered adaptation for excitatory neurons set to b=200pA. We show the results for excitatory (green) and inhibitory (red) neurons. B) Mean firing rate obtained from a single mean-field model. We see that, although the amplitude of oscillations is larger in the spiking network, the mean-field can correctly capture the general dynamics and frequency of the oscillations.
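      The random connectivity described in the caption can be sketched as follows. This is only an illustration of the wiring rule, not the simulation code used for the figure: the function name is hypothetical, the population is scaled down tenfold to keep the matrix small, and applying p=0.05 to both directions of excitatory-inhibitory connections is an assumption.

```python
import numpy as np

def random_connectivity(n_exc, n_inh, p_ee=0.06, p_other=0.05, seed=0):
    """Boolean matrix conn[i, j] = True if neuron i projects to neuron j.

    Excitatory neurons occupy indices [0, n_exc); inhibitory neurons follow.
    """
    rng = np.random.default_rng(seed)
    n = n_exc + n_inh
    prob = np.full((n, n), p_other)            # I-I and E-I pairs (assumed both directions)
    prob[:n_exc, :n_exc] = p_ee                # E-E pairs
    conn = rng.random((n, n)) < prob
    np.fill_diagonal(conn, False)              # no self-connections
    return conn

# Scaled-down network (800 excitatory, 200 inhibitory) keeping the 4:1 ratio.
conn = random_connectivity(800, 200)
ee_density = conn[:800, :800].mean()
ii_density = conn[800:, 800:].mean()
print(f"E-E density = {ee_density:.3f}, I-I density = {ii_density:.3f}")
```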

      Line 11: "rats were equally divided by gender." Given n=11, does that mean 5 males and 6 females or the opposite?

      Out of 11 animals, 6 were males, and 5 females. This is now mentioned in the manuscript.

      What was the type of food?

      The type of food was added to the manuscript (Extrudat, vitamin-fortified, irradiated > 25 kGy).

      What were the electrodes?

      This was provided in the manuscript. Carbon fiber filament was produced by World Precision Instruments. The tips of this filament were spread to brush-like shape to increase the contact surface above the skull.

      "low noise zero echo time (ZTE) MRI sequence"- please explain for the non-specialist or provide references.

      Reference added.

      Lines 148-150: "The length of habituation period was selected based on pilot experiments and was sufficient for rats to be in low-stress state and produce absence seizures inside the magnet." How do the authors know the rats were in a low-stress state?

      This claim was based on two factors. First, at the end of the habituation protocol, the motion of the animals had decreased considerably, in line with a previous study using a similar restraint/habituation protocol (DOI: 10.3389/fnins.2018.00548). In that study, the decreased motion was also correlated with decreased blood corticosterone levels, which returned to baseline (indicating a low-stress state) after 4 days of habituation. Second, when epileptic rodents are continuously recorded for 24 h, most SWDs occur during a state of passive wakefulness or drowsiness (Lannes et al. 1988, Coenen et al. 1991). Either way, as we cannot provide direct evidence of a low-stress state, we modified the sentence to the following:

      “The length of habituation period was selected based on pilot experiments to provide low-motion data therefore giving rats a better chance to be in a low-stress state and thus produce absence seizures inside the magnet.”

      Lines 150-2: "Respiration rate and motion were monitored during habituation sessions using a pressure pillow and video camera to estimate stress level." What were the criteria for a high stress level?

      Criteria for high (or low) stress levels were based mostly on motion levels, following a previous study (DOI: 10.1016/s0149-7634(05)80005-3). Still, as we did not measure stress directly, we modified the sentence to the following:

      “Pressure pillow and video camera were used to estimate physiological state, via breathing rate, and motion level, respectively.”

      Lines 152-3: "During the last habituation session, EEG was measured to confirm that the rats produced a sufficient amount of absence seizures (10 or more per session)." If 10 min, the rats would basically be seizing the entire session, leading to doubt about what the interictal state was.

      The last habituation session lasted 60 min and the fMRI scan 45 min. Given that rats produced ~40-50 seizures during the fMRI scan, they produced on average ~1 seizure/min, with one seizure lasting on average 5-6 s, giving ~45 s periods for interictal states. A threshold of 10 or more seizures was chosen, based on pilot experiments, to give statistically meaningful findings.

      Line 153: "Total of 2-5 fMRI experiments were conducted per rat within a 1-3-week period." What was the schedule for each animal? A table would be useful. If it varied, how do the authors know this was justified?

      Please see Figure 1–figure supplement 2 for examples of habituation timelines for individual rats.

      We found an error in the statement of 2-5 fMRI experiments; it should be 3-5 fMRI experiments. This was corrected. We aimed to acquire 12-14 sessions per stimulation condition, and once a sufficient number of sessions had been acquired, some of the animals were not used further. Two of the animals that were found to have good-quality EEG and produced sufficient numbers of SWDs were kept and briefly retrained for the later experiments with the second stimulation condition. This was done to replace animals that had to be excluded from the second stimulation condition due to poor-quality EEG or a lost implant. Extended use of some animals could theoretically introduce slight variation in the results, but could actually be an advantage, as these animals were already well trained and provided low-motion data.

      "Before and after each habituation session, rats were given a treat of sugar water and/or chocolate cereals as positive reinforcement. " How much and what was the concentration of sugar water; chocolate cereal?

      Rats were given 3 chocolate cereals and/or 1% sugar water. This was added to the manuscript now.

      Line 188: "We relied on pilot calibration of the heated water to maintain the body temperature" Please explain.

      Sentence was clarified:

      “We relied on pilot calibration of the temperature of heated water circulating inside animal bed to maintain the normal body temperature of ~37 °C”

      Line 190: "After manual tuning and matching of the transmit-receive coil, shimming and anatomical imaging" Please explain for the non-specialist.

      Sentence was simplified:

      “After routine preparation steps in the MRI console were done”

      Lines 199-201: "Anatomical imaging was conducted with a T1-FLASH sequence (TR: 530 ms, TE: 4 ms, flip angle 18°, bandwidth 39,682 kHz, matrix size 128 x 128, 51 slices, field-of-view 32 x 32 mm², resolution 0.25 x 0.25 x 0.5 mm3). fMRI was performed with a 3D ZTE sequence (TR: 0.971 ms, TE: 0 ms, flip angle 4°, pulse length 1 µs, bandwidth 150 kHz, oversampling 4, matrix size 60 x 60 x 60, field-of-view 30 x 30 x 60 mm3, resolution of 0.5 x 0.5 x 1 mm3, polar under sampling factor 5.64 nr. of projections 2060 resulting to a volume acquisition time of about 2 s). A total of 1350 volumes (45 min) were acquired." Please explain for the non-specialist.

      These technical parameters are provided for the sake of reproducibility. The section was nevertheless clarified as follows, and a citation was added:

      Anatomical imaging was conducted with a T1-FLASH sequence (repetition time: 530 ms, echo time: 4 ms, flip angle 18°, bandwidth 39,682 kHz, matrix size 128 x 128, 51 slices, field-of-view 32 x 32 mm², spatial resolution 0.25 x 0.25 x 0.5 mm³). fMRI was performed with a 3D ZTE sequence (repetition time: 0.971 ms, echo time: 0 ms, flip angle 4°, pulse length 1 µs, bandwidth 150 kHz, oversampling 4, matrix size 60 x 60 x 60, field-of-view 30 x 30 x 60 mm³, spatial resolution of 0.5 x 0.5 x 1 mm³, polar undersampling factor 5.64, number of projections 2060, resulting in a volume acquisition time of about 2 s (see Wiesinger & Ho, 2022 for parameter explanations)). A total of 1350 volumes (45 min) were acquired.
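
      The timing figures quoted above can be cross-checked with a few lines of arithmetic. This is only a rough sketch: it assumes that one projection is acquired per repetition time and that no additional dead time is added between volumes, neither of which is stated in the text.

```python
# Rough consistency check of the ZTE timing values quoted in the text.
# Assumption (not stated in the source): one projection per repetition time,
# with no extra dead time between volumes.
TR_MS = 0.971          # repetition time per projection, in ms
N_PROJECTIONS = 2060   # projections per volume
N_VOLUMES = 1350       # volumes per scan

volume_time_s = TR_MS * N_PROJECTIONS / 1000.0    # seconds per volume
scan_time_min = volume_time_s * N_VOLUMES / 60.0  # minutes per scan

print(f"volume acquisition time: {volume_time_s:.2f} s")
print(f"total scan time: {scan_time_min:.1f} min")
```

      Under these assumptions the product of repetition time and number of projections agrees with the stated volume time of about 2 s, and 1350 volumes agree with the stated 45 min scan.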

      "Visual (n=14 sessions, 5 rats) and somatosensory whisker (n=14 sessions, 4 rats)" - Please explain how multiple sessions were averaged for a single rat. Please justify the use of different numbers of sessions per rat.

      All sessions belonging to the same stimulus scheme (multiple sessions per rat) were entered at once as sessions in the SPM analysis, together with all the stimulus conditions belonging to these sessions. The justification for using a different number of sessions per rat was given above.

      Lines 205-206: "For the visual stimulation, light pulses (3 Hz, 6 s total length, pulse length 166 ms) were produced by a blue led, and light was guided through two optical fibers to the front of the rat's eyes." What wavelength of blue? Why blue? Is the stimulation strong? Weak?

      The wavelength was 470 nm and the brightness 7065 mcd at a current of 20 mA. Blue was selected because it lies in the frequency range that rats can differentiate, and this color has been used in previous literature (https://doi.org/10.1016/j.neuroimage.2020.117542, https://doi.org/10.1016/j.jneumeth.2021.109287).

      Line 212: "Stimulation parameters were based on previous rat stimulation fMRI studies to produce robust responses" What is a robust response? One where a lot of visual cortical voxels are activated?

      The sentence was corrected as follows:

      “Stimulation parameters were based on previous rat stimulation fMRI studies and chosen to activate voxels widely in visual and somatosensory pathways, correspondingly.”

      Line 245: "Seizures were confirmed as SWDs if they had a typical regular pattern, had at least double the amplitude compared to baseline signal..." What was the "typical" pattern? What baseline signal was it compared to? Was the baseline measured as an amplitude? Peak to trough?

      The sentence was corrected as follows:

      “Seizures were confirmed as SWDs if they had a typical regular spike and wave pattern with 7-12 Hz frequency range and had at least double the amplitude compared to baseline signal. All other signals were classified as baseline i.e. signal absent of a distinctive 7-12 Hz frequency power but spread within frequencies from 1 to 90 Hz.”
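
      The two criteria in the corrected sentence (dominant 7-12 Hz power and at least double the baseline amplitude) can be illustrated with a minimal sketch. It is not the detection procedure used in the study: the peak-to-peak amplitude measure, the 0.5 band-power threshold, and all function names are assumptions introduced here for illustration only.

```python
import math

def band_power_fraction(signal, fs, f_lo=7.0, f_hi=12.0, f_max=90.0):
    """Fraction of the 1..f_max Hz power that lies in [f_lo, f_hi] (naive DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    in_band = total = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if freq < 1.0 or freq > f_max:
            continue
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(centered))
        power = re * re + im * im
        total += power
        if f_lo <= freq <= f_hi:
            in_band += power
    return in_band / total if total > 0 else 0.0

def peak_to_peak(signal):
    """Amplitude measure assumed here; the source does not specify one."""
    return max(signal) - min(signal)

def looks_like_swd(window, baseline, fs, min_band_fraction=0.5, min_amp_ratio=2.0):
    """True if the window has >= 2x baseline amplitude and mostly 7-12 Hz power."""
    amp_ok = peak_to_peak(window) >= min_amp_ratio * peak_to_peak(baseline)
    band_ok = band_power_fraction(window, fs) >= min_band_fraction
    return amp_ok and band_ok

# Synthetic 1 s example at 200 Hz: broadband-like baseline vs. a strong 9 Hz rhythm.
fs = 200
t = [i / fs for i in range(fs)]
baseline = [0.5 * math.sin(2 * math.pi * 3 * x) + 0.5 * math.sin(2 * math.pi * 40 * x) for x in t]
seizure = [4.0 * math.sin(2 * math.pi * 9 * x) + b for x, b in zip(t, baseline)]

print(looks_like_swd(seizure, baseline, fs))
print(looks_like_swd(baseline, baseline, fs))
```

      In practice both thresholds would have to be tuned against manually scored EEG, and a windowed spectral estimate would replace the naive transform used here for self-containment.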

      "using rigid, affine, and SYN registrations" Please explain for the non-specialist.

      Corrected as follows:

      “using rigid, affine (linear) and SYN (non-linear) registrations”

      Line 274-5: "However, there were also intermediate cases where the seizure started or ended during the stimulation block (Figure 1 - Figure Supplement 1). These intermediate cases were modeled as confounds" Why confounds? They could be very interesting because the stimulation may not be affected if timed at the end of the seizure. What was the definition of start and end? Defining the onset and end of seizures is tricky.

      We agree that these cases are also highly interesting. Indeed, all the intermediate cases were also analyzed separately but were not included in the manuscript (other than the case in which stimulation immediately ended a seizure), as no statistically significant findings emerged when comparing these cases to the baseline. For example, when stimulation was applied towards the end of a seizure, it produced weakened responses that were still stronger than when stimulation was applied fully during a seizure (indicating some responsiveness after the cessation of the seizure). As these intermediate cases led to results with higher variance, we treated them as confounds in the general linear model (i.e. removing unwanted variance from the results of interest).

      Defining the onset and end of a seizure can be difficult in some cases. In the signal itself, the amplitude of SWDs can weaken, especially towards the end of a seizure, so the shift from seizure to baseline signal can be harder to identify. In the power spectrum, however, the boundaries were more easily detectable. Thus, to define the onsets and ends of seizures we relied on both the signal and the power spectrum (as stated in the manuscript).

      "in the SPM analysis" Please explain for the non-specialist.

      A definition of SPM, together with a link to the software site, was added.

      Line 276: "of fMRI data (see 2.5.3.) and thus explained variance that was not accounted for by the main effects of interest. " Please clarify.

      Clarified as:

      “Intermediate cases, where the seizure started or ended during the stimulation block (Figure 1–figure supplement 1), were considered as confounds of no-interest in the SPM analysis of fMRI data and the explained variance caused by the confounds were reduced from the main effects of interests”

      Line 277: "Additionally, a contrast..." What is meant?

      This paragraph in section 2.5.3 was rewritten as a whole to make it clearer.

      Line 278-9: "...was given to two cases: i) when stimulation ended a seizure (0-2 s between stimulation start and seizure end)..." Again, how is the seizure onset and end defined?

      See the comment above.

      Lines 281-2: "Stimulations that did not fully coincide with a seizure were considered as nuisance regressors in the second level analysis." What is meant by nuisance regressor?

      A reference to the SPM12 manual was given for technical terms relating to the analysis software.

      Lines 283-8: "Motion periods were also included as multiple regressors (not convolved with a basis function) to be used as nuisance regressors. Stimulations that coincided with a motion above 0.3% of the voxel size were not considered stimulation inputs. Stimulation and seizure inputs were convolved with "3 gamma distribution basis functions" (i.e. 3rd order gamma) in SPM (option: basis functions, gamma functions, order: 3), to account for temporal and dispersion variations in the hemodynamic response. The choice of 3rd order gamma was based on the expectation that time-to peak and shape of HRFs of seizure could vary across voxels (David et al. 2008)." Please explain the technical terms.

      A reference to the SPM12 manual was given for technical terms relating to the analysis software, and HRF was defined.

      "BAMS rat connectome" - Please explain the technical terms.

      Modified as follows:

      “…connection matrix of the rat nervous system (BAMS rat connectome, Bota, Dong, and Swanson 2012).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison-corrected) results in both activation maps and hemodynamic responses. There were not enough animals to detect statistically significant sex differences (p>0.05).

      Figure 2 - I don't understand "tSNR" here. What is the point here?

      B vs C. Are these different brain areas or the same but SNR was adjusted?

      D. Where is FD explained? I think explaining what the parts of the figure show would be helpful.

      tSNR, the temporal signal-to-noise ratio, describes the behavior of noise over time. Readers who plan to replicate the awake fMRI protocol used here, together with the single-loop coil, may be interested in data quality and in the ability of the coil to separate signal from noise, as this is one of the most important factors in fMRI designs where small signal changes have to be distinguished from background noise.

      B and C illustrate the same brain area, but B was acquired with high-resolution anatomical scanning (T1-FLASH) and C with low-resolution ZTE scanning. We clarified the figure legend as follows:

      “…spatial signal-to-noise ratios of an illustrative high resolution anatomical T1-FLASH (B), and low resolution ZTE image (C)”
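
      For readers unfamiliar with the measure, tSNR is commonly computed per voxel as the temporal mean of the signal divided by its temporal standard deviation. The sketch below uses that common convention; whether the manuscript applied any detrending or motion correction beforehand is not stated here, and the sample values are invented for illustration.

```python
import statistics

def tsnr(time_series):
    """Temporal SNR of one voxel: mean over time divided by SD over time."""
    mean = statistics.fmean(time_series)
    sd = statistics.stdev(time_series)  # sample standard deviation
    return mean / sd if sd > 0 else float("inf")

# Hypothetical voxel intensities over four volumes.
example_voxel = [100.0, 102.0, 98.0, 100.0]
print(round(tsnr(example_voxel), 2))
```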

      FD was explained in section 2.5.1. Some parts of the explanation were clarified: “Framewise displacement (FD) (Figure 2E) was calculated as follows. First, the differential of successive motion parameters (x, y, z translation, roll, pitch, yaw rotation) was calculated. Then absolute value was taken from each parameter and rotational parameters were divided by 5 mm (as estimate of the rat brain radius) to convert degrees to millimeters (Power et al. 2012). Lastly, all the parameters were summed together.”
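
      The steps of the framewise displacement calculation can be sketched as follows. The sketch follows the convention of Power et al. (2012), in which rotations are converted to radians and multiplied by the assumed brain radius to give an arc length in millimeters. The exact form of the conversion used in the manuscript, the input units, and the function name are assumptions made here.

```python
import math

BRAIN_RADIUS_MM = 5.0  # estimate of the rat brain radius quoted in the text

def framewise_displacement(motion, radius_mm=BRAIN_RADIUS_MM, rotations_in_degrees=True):
    """Framewise displacement per volume.

    motion: sequence of (x, y, z, roll, pitch, yaw) tuples, one per volume,
    with translations in mm. Rotations are assumed to be in degrees unless
    rotations_in_degrees is False (then radians).
    """
    fd = [0.0]  # the first volume has no predecessor
    for prev, curr in zip(motion, motion[1:]):
        # Step 1 and 2: differential of successive parameters, absolute value.
        diffs = [abs(c - p) for p, c in zip(prev, curr)]
        translations = diffs[:3]
        rotations = diffs[3:]
        if rotations_in_degrees:
            rotations = [math.radians(r) for r in rotations]
        # Step 3: convert rotations to mm as arc length on a sphere.
        rotations_mm = [r * radius_mm for r in rotations]
        # Step 4: sum all six parameters.
        fd.append(sum(translations) + sum(rotations_mm))
    return fd

# Hypothetical motion parameters for three volumes.
example_motion = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 1.0, 0.0, 0.0),
]
fd_values = framewise_displacement(example_motion)
print([round(v, 4) for v in fd_values])
```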

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. There is no specific interest in comparing stimulation types (i.e. the seizure state in which each stimulation occurred), as this would not provide any meaningful inferences for the study.

      Statistical activation maps - it is not clear how this was done.

      The creation of statistical maps is explained in section 2.5.3.

      Line 384-5: "In addition, some responses were observed in the somatosensory cortex during a seizure state, probably due to incomplete nuisance removal of the effect of the seizure itself by the linear model used." I don't see why the authors would not suggest that the result is logical given that stimuli should activate the somatosensory cortex.

      The sentence was modified as follows:

      “In addition, responses were observed in the somatosensory cortex during a seizure state”

      Fig 3 "F-contrast maps." Please explain.

      The creation of statistical maps is explained in section 2.5.3.

      HRF- please define. The ROI selection is unclear - it "was based on statistical differences seen in activation maps." But how were ROIs drawn? Also, why were HRFs examined at the end of seizures?

      HRF was defined, and the definitions of HRF and ROI were moved from results section 3.3 to methods section 2.5.3.

      The definition of ROI was clarified:

      “Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps.”

      HRFs were additionally estimated at the end of seizures because the shift in brain state from ictal to interictal was of particular interest. This shift also yielded statistically significant findings, in that brain responses differed from those to ictal stimulation.

      Line 421: "Interestingly, the response amplitude was higher when the stimulation ended a seizure compared to when it did not" Why is this interesting?

      The word “interestingly” was changed to “additionally” to avoid any inferences in the results section.

      Line 427: "Notably, HRFs amplitudes were both negatively and positively signed during the ictal state, depending on the brain region." Why is this notable?

      The word “notably” was removed to avoid any inferences in the results section.

      Please explain the legends of Figures 4 and 6 more clearly.

      The legends of Figure 4 and Figure 4–figure supplement 1 were clarified:

      “HRFs was calculated in selected ROI, belonging to visual or somatosensory area, by multiplying gamma basis functions (Figure 1–figure supplement 1, B) with their corresponding average beta values over a ROI and taking a sum of these values.”
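
      The calculation described in the legend is a weighted sum of basis functions, with the ROI-averaged beta values as weights. A minimal sketch is given below. The gamma-density shapes and the beta values are hypothetical placeholders and do not reproduce the exact basis set generated by SPM.

```python
import math

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, used here as an illustrative basis function."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def reconstruct_hrf(betas, shapes, times):
    """HRF estimate: sum over basis functions of beta_i * basis_i(t)."""
    return [sum(b * gamma_pdf(t, k) for b, k in zip(betas, shapes)) for t in times]

times = [0.5 * i for i in range(41)]  # 0 to 20 s in 0.5 s steps
shapes = [2.0, 4.0, 8.0]              # hypothetical basis shapes (3 functions)
betas = [0.2, 1.0, -0.3]              # hypothetical ROI-averaged beta values

hrf = reconstruct_hrf(betas, shapes, times)
peak_index = max(range(len(hrf)), key=lambda i: hrf[i])
print(f"peak at {times[peak_index]:.1f} s")
```

      Because the reconstruction is linear in the beta values, a negative weight on a late basis function produces an undershoot, and sign changes in the betas can give negatively signed responses in some regions.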

      Using the comments above as a guide, please revise the Discussion to be more precise and more clear about what was shown and what can be concluded in light of limitations. Please ensure the literature is cited where appropriate.

      Some parts of the discussion and conclusion sections were modified.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      Formatting: fMRI maps in Figures 3 and 5 should be more clearly labeled, indicating anterior and posterior directions on all images, and the cross sections should be enlarged to enable anatomical areas to be more clearly differentiated.

      Anterior and posterior directions were added, and cross sections were enlarged.

      The Methods section 2.41 and other places in the text, and Figure 2 - Figure Supplement 1 say that there was less artifact on the EEG with ZTE than with GE-EPI. However, the EEG shown in Figure 2 - Figure Supplement 1 Part C shows much more artifact in the left (ZTE) trace than the right (GE-EPI) trace. This apparent contradiction should be resolved.

      The figure was actually demonstrating the relative change in the signal when the MRI sequences were on, and by this standard, ZTE produced smaller changes in both amplitude and frequency than EPI. In the example figure, the baseline fluctuations in the EEG trace on the left were higher in amplitude than those on the right, which could give the misleading impression that ZTE produces more noise. The figure legend was clarified to highlight relative change:

      “ZTE also caused relatively less artificial noise on EEG signal, keeping both amplitude of the signal and frequencies relatively more intact, which improved live detection of absence seizures.”

      Figure 2 - Supplement 1, part B horizontal axis should provide units.

      Units were added.

      Figure 2 - Supplement 1, legend last sentence says arrows mark the beginning of each "sequence." Is this a typo and should this instead say "each seizure"?

      It should state “each fMRI sequence”; this was corrected.

      Line 307, Methods "to reveal brain areas where ictal stimulation provided higher amplitude response than interictal" - should this be reversed, ie weren't the authors analyzing a contrast to determine where interictal signals were higher than ictal signals?

      This should be reversed, and was corrected, thank you for noting this.

      Figure 6 - Figure Supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest and perhaps should be subtracted out.

      We point out that Figure 6 - Figure Supplement 1 reproduces with a higher level of detail the results shown in Figure 6 of the main text, where all signals are plotted on the same scale. The difference between the scales used in this figure is intentional, and its purpose is to show and highlight the large differences observed in the ongoing activity and the evoked response between the two states (ictal and interictal). In interictal periods the ongoing activity is characterized by fluctuations around a baseline level whose variance is highly affected by the application of the stimulus. In contrast, ictal periods are characterized by large oscillations, with periods of high and synchronized activity followed by periods of nearly no activity, where the effect of the stimulus on the dynamics is overshadowed by the ongoing dynamics (both from local and from afferent nodes), as the referee mentions, and which imposes a strong limit on the responsiveness of the system and the propagation of the signal.

    1. Author Response

      eLife assessment

      This potentially valuable study uses classic neuroanatomical techniques and synchrotron X-ray tomography to investigate the mapping of the trunk within the brainstem nuclei of the elephant brain. Given its unique specializations, understanding the somatosensory projections from the elephant trunk would be of general interest to evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. However, the anatomical analysis is inadequate to support the authors' conclusion that they have identified the elephant trigeminal sensory nuclei rather than a different brain region, specifically the inferior olive.

      Comment: We are happy that our paper is considered to be potentially valuable. Also, the editors highlight the potential interest of our work for evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. The editors are more negative when it comes to our evidence on the identification of the trigeminal nucleus vs the inferior olive. We have five comments on this assessment. (i) We think this assessment is heavily biased by the comments of referee 2. We will show that the referee’s comments are more about us than about our paper. Hence, the referee failed to do their job (refereeing our paper) and should not have succeeded in leveling our paper. (ii) We have no ad hoc knock-out experiments to distinguish the trigeminal nucleus vs the inferior olive. Such experiments (extracellular recording & electrolytic lesions, viral tracing) would be done in a week in mice, but they cannot and should not be done in elephants. (iii) We have extraordinary evidence. Nobody has ever described a similarly astonishing match of body (trunk folds) and myeloarchitecture in the trigeminal system before. (iv) We will show that our assignment of the trigeminal nucleus vs the inferior olive is more plausible than the current hypothesis about the assignment of the trigeminal nucleus vs the inferior olive as defended by referee 2. We think this is why it is important to publish our paper. (v) We think eLife is the perfect place for our publication because the deviating views of referee 2 are published along.

      Change: We performed additional peripherin-antibody staining to differentiate the inferior olive and trigeminal nucleus. Peripherin is a cytoskeletal protein that is found in peripheral nerves and climbing fibers. Specifically, climbing fibers of various species (mouse, rabbit, pig, cow, and human; Errante et al., 1998) are stained intensely with peripherin-antibodies. What is tricky for our purposes is that there is also some peripherin-antibody reactivity in the trigeminal nuclei (Errante et al., 1998). Such peripherin-antibody reactivity is weaker, however, and lacks the distinct axonal bundle signature that stems from the strong climbing fiber peripherin-reactivity as seen in the inferior olive (Errante et al., 1998). As can be seen in Author response image 1, we observe peripherin-reactivity in axonal bundles (i.e. in putative climbing fibers), in what we think is the inferior olive. We also observe weak peripherin-reactivity, in what we think is the trigeminal nucleus, but not the distinct and strong labeling of axonal bundles. These observations are in line with our ideas but are difficult to reconcile with the views of the referee. Specifically, the lack of peripherin-reactive axon bundles suggests that there are no climbing fibres in what the referee thinks is the inferior olive.

      Errante, L., Tang, D., Gardon, M., Sekerkova, G., Mugnaini, E., & Shaw, G. (1998). The intermediate filament protein peripherin is a marker for cerebellar climbing fibres. Journal of neurocytology, 27, 69-84.

      Author response image 1.

      The putative inferior olive but not the putative trigeminal nucleus contains peripherin-positive axon bundles (presumptive climbing fibers). (A) Overview picture of a brainstem section stained with anti-peripherin-antibodies (white color). Anti-peripherin-antibodies stain climbing fibers in a wide variety of mammals. The section comes from the posterior brainstem of African elephant cow Bibi; in this posterior region, both putative inferior olive and trigeminal nucleus are visible. Note the bright staining of the dorsolateral nucleus, the putative inferior olive according to Reveyaz et al., and the trigeminal nucleus according to Maseko et al., 2013. (B) High magnification view of the dorsolateral nucleus (corresponding to the upper red rectangle in A). Anti-peripherin-positive axon bundles (putative climbing fibers) are seen in support of the inferior olive hypothesis of Reveyaz et al. (C) High magnification view of the ventromedial nucleus (corresponding to the lower red rectangle in A). The ventromedial nucleus is weakly positive for peripherin but contains no anti-peripherin-positive axon bundles (i.e. no putative climbing fibers) in support of the trigeminal nucleus hypothesis of Reveyaz et al. Note that myelin stripes – weakly visible as dark omissions – are clearly anti-peripherin-negative.

      Reviewer #1:

      Summary:

      This fundamental study provides compelling neuroanatomical evidence underscoring the sensory function of the trunk in African and Asian elephants. Whereas myelinated tracts are classically appreciated as mediating neuronal connections, the authors speculate that myelinated bundles provide functional separation of trunk folds and display elaboration related to the "finger" projections. The authors avail themselves of many classical neuroanatomical techniques (including cytochrome oxidase stains, Golgi stains, and myelin stains) along with modern synchrotron X-ray tomography. This work will be of interest to evolutionary neurobiologists, comparative neuroscientists, and the general public, with its fascinating exploration of the brainstem of an icon sensory specialist.

      Comment: We are incredibly grateful for this positive assessment.

      Changes: None.

      Strengths:

      • The authors made excellent use of the precious sample materials from 9 captive elephants.

      • The authors adopt a battery of neuroanatomical techniques to comprehensively characterize the structure of the trigeminal subnuclei and properly re-examine the "inferior olive".

      • Based on their exceptional histological preparation, the authors reveal broadly segregated patterns of metabolic activity, similar to the classical "barrel" organization related to rodent whiskers.

      Comment: The referee provides a concise summary of our findings.

      Changes: None.

      Weaknesses:

      • As the authors acknowledge, somewhat limited functional description can be provided using histological analysis (compared to more invasive techniques).

      • The correlation between myelinated stripes and trunk fold patterns is intriguing, and Figure 4 presents this idea beautifully. I wonder - is the number of stripes consistent with the number of trunk folds? Does this hold for both species?

      Comment: We agree with the referee’s assessment. We note that cytochrome-oxidase staining is an at least partially functional stain, as it reveals constitutive metabolic activity. A significant problem of the work in elephants is that our recording possibilities are limited, which in turn limits functional analysis. As indicated in Figure 4 for the African elephant Indra, there was an excellent match of trunk folds and myelin stripes. Asian elephants have more, and less conspicuous trunk folds than African elephants. As illustrated in Figure 6, Asian elephants have more, and less conspicuous myelin stripes. Thus, species differences in myelin stripes correlate with species differences in trunk folds.

      Changes: We clarify the relation of myelin stripe and trunk fold patterns in our discussion of Figure 6.  

      Reviewer #2 (Public Review):

      The authors describe what they assert to be a very unusual trigeminal nuclear complex in the brainstem of elephants, and based on this, follow with many speculations about how the trigeminal nuclear complex, as identified by them, might be organized in terms of the sensory capacity of the elephant trunk.

      Comment: We agree with the referee’s assessment that the putative trigeminal nucleus described in our paper is highly unusual in size, position, vascularization, and myeloarchitecture. This is why we wrote this paper. We think these unusual features reflect the unique facial specializations of elephants, i.e. their highly derived trunk. Because we have no access to recordings from the elephant brainstem, we cannot back up all our functional interpretations with electrophysiological evidence; it is therefore fair to call them speculative.

      Changes: None.

      The identification of the trigeminal nuclear complex/inferior olivary nuclear complex in the elephant brainstem is the central pillar of this manuscript from which everything else follows, and if this is incorrect, then the entire manuscript fails, and all the associated speculations become completely unsupported.

      Comment: We agree.

      Changes: None.

      The authors note that what they identify as the trigeminal nuclear complex has been identified as the inferior olivary nuclear complex by other authors, citing Shoshani et al. (2006; 10.1016/j.brainresbull.2006.03.016) and Maseko et al (2013; 10.1159/000352004), but fail to cite either Verhaart and Kramer (1958; PMID 13841799) or Verhaart (1962; 10.1515/9783112519882-001). These four studies are in agreement, but the current study differs.

      Comment & Change: We were not aware of the papers of Verhaart and included them in the revised ms.

      Let's assume for the moment that the four previous studies are all incorrect and the current study is correct. This would mean that the entire architecture and organization of the elephant brainstem is significantly rearranged in comparison to ALL other mammals, including humans, previously studied (e.g. Kappers et al. 1965, The Comparative Anatomy of the Nervous System of Vertebrates, Including Man, Volume 1 pp. 668-695) and the closely related manatee (10.1002/ar.20573). This rearrangement necessitates that the trigeminal nuclei would have had to "migrate" and shorten rostrocaudally, specifically and only, from the lateral aspect of the brainstem where these nuclei extend from the pons through to the cervical spinal cord (e.g. the Paxinos and Watson rat brain atlases), to the spatially restricted ventromedial region of specifically and only the rostral medulla oblongata. According to the current paper, the inferior olivary complex of the elephant is very small and located lateral to their trigeminal nuclear complex, and the region from where the trigeminal nuclei are located by others appears to be just "lateral nuclei" with no suggestion of what might be there instead.

      Comment: We have three comments here:

      1) The referee correctly notes that we argue the elephant brainstem underwent fairly major rearrangements. In particular, we argue that the elephant inferior olive was displaced laterally, by a very large cell mass, which we argue is an unusually large trigeminal nucleus. To our knowledge, such a large compact cell mass is not seen in the ventral brain stem of any other mammal.

      2) The referee makes it sound as if it is our private idea that the elephant brainstem underwent major rearrangements and that the rest of the evidence points to a conventional ‘rodent-like’ architecture. This is far from the truth, however. Already from the outside appearance (see our Figure 1B and Figure 6A) it is clear that the elephant brainstem has huge ventral bumps not seen in any other mammal. An extraordinary architecture also holds at the organizational level of nuclei. Specifically, the facial nucleus – the most carefully investigated nucleus in the elephant brainstem – has an appearance distinct from that of the facial nuclei of all other mammals (Maseko et al., 2013; Kaufmann et al., 2022). If both the overall shape and the constituting nuclei of the brainstem are very different from other mammals, it is very unlikely if not impossible that the elephant brainstem follows in all regards a conventional ‘rodent-like’ architecture.

      3) The inferior olive is an impressive nucleus in the partitioning scheme we propose (Author response image 1). In fact – together with the putative trigeminal nucleus we describe – it’s the most distinctive nucleus in the elephant brainstem. We have not done volumetric measurements and cell counts here, but think this is an important direction for future work. What has informed our work is that the inferior olive nucleus we describe has the serrated organization seen in the inferior olive of all mammals. We will discuss these matters in depth below.

      Changes: None.

      Such an extraordinary rearrangement of brainstem nuclei would require a major transformation in the manner in which the mutations, patterning, and expression of genes and associated molecules during development occur. Such a major change is likely to lead to lethal phenotypes, making such a transformation extremely unlikely. Variations in mammalian brainstem anatomy are most commonly associated with quantitative changes rather than qualitative changes (10.1016/B978-0-12-804042-3.00045-2).

      Comment: We have two comments here:

      1) The referee claims that it is impossible that the elephant brainstem differs from a conventional brainstem architecture because this would lead to lethal phenotypes etc. Following our previous response, this argument does not hold. It is out of the question that the elephant brainstem looks very different from the brainstem of other mammals. Yet, it is also evident that elephants live. The debate we need to have is not if the elephant brainstem differs from other mammals, but how it differs from other mammals.

      2) In principle we agree with the referee’s thinking that the model of the elephant brainstem that is most likely correct is the one that requires the least amount of rearrangements to other mammals. We therefore prepared a comparison of the model the referee is proposing (Maseko et al., 2013; see Author response table 1 below) with our proposition. We scored these models on their similarity to other mammals. We find that the referee’s ideas (Maseko et al., 2013) require more rearrangements relative to other mammals than our suggestion.

      Changes: Inclusion of Author response table 1, which we discuss in depth below.

      The impetus for the identification of the unusual brainstem trigeminal nuclei in the current study rests upon a previous study from the same laboratory (10.1016/j.cub.2021.12.051) that estimated that the number of axons contained in the infraorbital branch of the trigeminal nerve that innervate the sensory surfaces of the trunk is approximately 400 000. Is this number unusual? In a much smaller mammal with a highly specialized trigeminal system, the platypus, the number of axons innervating the sensory surface of the platypus bill skin comes to 1 344 000 (10.1159/000113185). Yet, there is no complex rearrangement of the brainstem trigeminal nuclei in the brain of the developing or adult platypus (Ashwell, 2013, Neurobiology of Monotremes), despite the brainstem trigeminal nuclei being very large in the platypus (10.1159/000067195). Even in other large-brained mammals, such as large whales that do not have a trunk, the number of axons in the trigeminal nerve ranges between 400,000 and 500,000 (10.1007/978-3-319-47829-6_988-1). The lack of comparative support for the argument forwarded in the previous and current study from this laboratory, and that the comparative data indicates that the brainstem nuclei do not change in the manner suggested in the elephant, argues against the identification of the trigeminal nuclei as outlined in the current study. Moreover, the comparative studies undermine the prior claim of the authors, informing the current study, that "the elephant trigeminal ganglion ... point to a high degree of tactile specialization in elephants" (10.1016/j.cub.2021.12.051). While clearly, the elephant has tactile sensitivity in the trunk, it is questionable as to whether what has been observed in elephants is indeed "truly extraordinary".

      Comment: These comments made us think that the referee is not talking about the paper we submitted, but that the referee is talking about us and our work in general. Specifically, the referee refers to the platypus and other animals dismissing our earlier work, which argued for a high degree of tactile specialization in elephants. We think the referee’s intuitions are wrong and our earlier work is valid.

      Changes: We prepared Author response image 2 (below), which puts the platypus brain, a monkey brain, and the elephant trigeminal ganglion (which contains a large part of the trunk-innervating cells) in perspective.

      Author response image 2.

      The elephant trigeminal ganglion is comparatively large. Platypus brain, monkey brain, and elephant ganglion. The elephant has two trigeminal ganglia, which contain the first-order somatosensory neurons. They serve mainly for tactile processing and are large compared to a platypus brain (from the comparative brain collection) and are similar in size to a monkey brain. The idea that elephants might be highly specialized for trunk touch is also supported by the analysis of the sensory nerves of these animals (Purkart et al., 2022). Specifically, we find that the infraorbital nerve (which innervates the trunk) is much thicker than the optic nerve (which mediates vision) and the vestibulocochlear nerve (which mediates hearing). Thus, not everything is large about elephants; instead, the data argue that these animals are heavily specialized for trunk touch.

      But let's look more specifically at the justification outlined in the current study to support their identification of the unusually located trigeminal sensory nuclei of the brainstem.

      (1) Intense cytochrome oxidase reactivity.

      (2) Large size of the putative trunk module.

      (3) Elongation of the putative trunk module.

      (4) The arrangement of these putative modules corresponds to elephant head anatomy.

      (5) Myelin stripes within the putative trunk module that apparently match trunk folds.

      (6) Location apparently matches other mammals.

      (7) Repetitive modular organization apparently similar to other mammals.

      (8) The inferior olive described by other authors lacks the lamellated appearance of this structure in other mammals.

      Comment: We agree those are key issues.

      Changes: None.

      Let's examine these justifications more closely.

      (1) Cytochrome oxidase histochemistry is typically used as an indicative marker of neuronal energy metabolism. The authors indicate, based on the "truly extraordinary" somatosensory capacities of the elephant trunk, that any nuclei processing this tactile information should be highly metabolically active, and thus should react intensely when stained for cytochrome oxidase. We are told in the methods section that the protocols used are described by Purkart et al (2022) and Kaufmann et al (2022). In neither of these cited papers is there any description, nor mention, of the cytochrome oxidase histochemistry methodology, thus we have no idea of how this histochemical staining was done. To obtain the best results for cytochrome oxidase histochemistry, the tissue is either processed very rapidly after buffer perfusion to remove blood or in recently perfusion-fixed tissue (e.g., 10.1016/0165-0270(93)90122-8). Given: (1) the presumably long post-mortem interval between death and fixation - "it often takes days to dissect elephants"; (2) subsequent fixation of the brains in 4% paraformaldehyde for "several weeks"; (3) The intense cytochrome oxidase reactivity in the inferior olivary complex of the laboratory rat (Gonzalez-Lima, 1998, Cytochrome oxidase in neuronal metabolism and Alzheimer's diseases); and (4) The lack of any comparative images from other stained portions of the elephant brainstem; it is difficult to support the justification as forwarded by the authors. The histochemical staining observed is likely background reactivity from the use of diaminobenzidine in the staining protocol. Thus, this first justification is unsupported.

      Comment: The referee correctly notes the description of our cytochrome-oxidase reactivity staining was lacking. This is a serious mistake of ours for which we apologize very much. The referee then makes it sound as if we messed up our cytochrome-oxidase staining, which is not the case. All successful (n = 3; please see our technical comments in the recommendation section) cytochrome-oxidase stainings were done with elephants with short post-mortem times (≤ 2 days) to brain removal/cooling and only brief immersion fixation (≤ 1 day). Cytochrome-oxidase reactivity in elephant brains appears to be more sensitive to quenching by fixation than is the case for rodent brains. We think it is a good idea to include a cytochrome-oxidase staining overview picture because we understood from the referee’s comments that we need to compare our partitioning scheme of the brainstem with that of other authors. To this end, we add a cytochrome-oxidase staining overview picture (Author response image 3) along with an alternative interpretation from Maseko et al., 2013.

      Changes: 1) We added details on our cytochrome-oxidase reactivity staining protocol and on cytochrome-oxidase reactivity in the elephant brain in general (see the recommendation section).

      2) We provide a detailed discussion of the technicalities of cytochrome-oxidase staining below in the recommendation section, where the referee raised further criticisms.

      3) We include a cytochrome-oxidase staining overview picture (Author response image 3) along with an alternative interpretation from Maseko et al., 2013.

      Author response image 3.

      Cytochrome-oxidase staining overview along with the Maseko et al. (2013) scheme. Left, coronal cytochrome-oxidase staining overview from African elephant cow Indra; the section is taken a few millimeters posterior to the facial nucleus. Brown is putatively neural cytochrome-reactivity, and white is the background. Black is myelin diffraction and (seen at higher resolution, when you zoom in) erythrocyte cytochrome-reactivity in blood vessels (see our Figure 1E-G); such blood vessel cytochrome-reactivity is seen because we could not perfuse the animal. There appears to be a minimal outside-in-fixation artifact (i.e., a more whitish/non-brownish appearance of the section toward the borders of the brain). This artifact is not seen in sections from Indra that we processed earlier or in other elephant brains processed at shorter post-mortem/fixation delays (see our Figure 1C). Right, coronal partitioning scheme of Maseko et al. (2013) for the elephant brainstem at an approximately similar anterior-posterior level.

      The same structures can be recognized left and right. The section is taken at an anterior-posterior level, where we encounter the trigeminal nuclei in pretty much all mammals. Note that the neural cytochrome reactivity is very high, in what we refer to as the trigeminal-nuclei-trunk-module and what Maseko et al. refer to as inferior olive. Myelin stripes can be recognized here as white omissions.

      At the same time, the cytochrome-oxidase-reactivity is very low in what Maseko et al. refer to as trigeminal nuclei. The indistinct appearance and low cytochrome-oxidase-reactivity of the trigeminal nuclei in the scheme of Maseko et al. (2013) is unexpected because trigeminal nuclei stain intensely for cytochrome-oxidase-reactivity in most mammals and because the trigeminal nuclei represent the elephant’s most important body part, the trunk. Staining patterns of the trigeminal nuclei as identified by Maseko et al. (2013) are very different at more posterior levels; we will discuss this matter below.

      Justifications (2), (3), and (4) are sequelae from justification (1). In this sense, they do not count as justifications, but rather unsupported extensions.

      Comment: These are key points of our paper that the referee does not discuss.

      Changes: None.

      (4) and (5) These are interesting justifications, as the paper has clear internal contradictions, and (5) is a sequelae of (4). The reader is led to the concept that the myelin tracts divide the nuclei into sub-modules that match the folding of the skin on the elephant trunk. One would then readily presume that these myelin tracts are in the incoming sensory axons from the trigeminal nerve. However, the authors note that this is not the case: "Our observations on trunk module myelin stripes are at odds with this view of myelin. Specifically, myelin stripes show no tapering (which we would expect if axons divert off into the tissue). More than that, there is no correlation between myelin stripe thickness (which presumably correlates with axon numbers) and trigeminal module neuron numbers. Thus, there are numerous myelinated axons, where we observe few or no trigeminal neurons. These observations are incompatible with the idea that myelin stripes form an axonal 'supply' system or that their prime function is to connect neurons. What do myelin stripe axons do, if they do not connect neurons? We suggest that myelin stripes serve to separate rather than connect neurons." So, we are left with the observation that the myelin stripes do not pass afferent trigeminal sensory information from the "truly extraordinary" trunk skin somatic sensory system, and rather function as units that separate neurons - but to what end? It appears that the myelin stripes are more likely to be efferent axonal bundles leaving the nuclei (to form the olivocerebellar tract). This justification is unsupported.

      Comment: The referee cites some of our observations on myelin stripes, which we find unusual. We stand by the observations and comments. The referee does not discuss the most crucial finding we report on myelin stripes, namely that they correspond remarkably well to trunk folds.

      Changes: None.

      (6) The authors indicate that the location of these nuclei matches that of the trigeminal nuclei in other mammals. This is not supported in any way. In ALL other mammals in which the trigeminal nuclei of the brainstem have been reported they are found in the lateral aspect of the brainstem, bordered laterally by the spinal trigeminal tract. This is most readily seen and accessible in the Paxinos and Watson rat brain atlases. The authors indicate that the trigeminal nuclei are medial to the facial nerve nucleus, but in every other species, the trigeminal sensory nuclei are found lateral to the facial nerve nucleus. This is most salient when examining a close relative, the manatee (10.1002/ar.20573), where the location of the inferior olive and the trigeminal nuclei matches that described by Maseko et al (2013) for the African elephant. This justification is not supported.

      Comment: The referee notes that we incorrectly state that the position of the trigeminal nuclei matches that of other mammals. We think this criticism is justified.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Author response table 1). Here we acknowledge the referee’s argument and we also changed the manuscript accordingly.

      (7) The dual to quadruple repetition of rostrocaudal modules within the putative trigeminal nucleus as identified by the authors relies on the fact that in the neurotypical mammal, there are several trigeminal sensory nuclei arranged in a column running from the pons to the cervical spinal cord, these include (nomenclature from Paxinos and Watson in roughly rostral to caudal order) the Pr5VL, Pr5DM, Sp5O, Sp5I, and Sp5C. However, these nuclei are all located far from the midline and lateral to the facial nerve nucleus, unlike what the authors describe in the elephants. These rostrocaudal modules are expanded upon in Figure 2, and it is apparent from what is shown that the authors are attributing other brainstem nuclei to the putative trigeminal nuclei to confirm their conclusion. For example, what they identify as the inferior olive in Figure 2D is likely the lateral reticular nucleus as identified by Maseko et al (2013). This justification is not supported.

      Comment: The referee again compares our findings to the scheme of Maseko et al. (2013) and rejects our conclusions on those grounds. We think such a comparison of our scheme is needed, indeed.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Author response table 1).

      (8) In primates and related species, there is a distinct banded appearance of the inferior olive, but what has been termed the inferior olive in the elephant by other authors does not have this appearance, rather, and specifically, the largest nuclear mass in the region (termed the principal nucleus of the inferior olive by Maseko et al, 2013, but Pr5, the principal trigeminal nucleus in the current paper) overshadows the partial banded appearance of the remaining nuclei in the region (but also drawn by the authors of the current paper). Thus, what is at debate here is whether the principal nucleus of the inferior olive can take on a nuclear shape rather than evince a banded appearance. The authors of this paper use this variance as justification that this cluster of nuclei could not possibly be the inferior olive. Such a "semi-nuclear/banded" arrangement of the inferior olive is seen in, for example, giraffe (10.1016/j.jchemneu.2007.05.003), domestic dog, polar bear, and most specifically the manatee (a close relative of the elephant) (brainmuseum.org; 10.1002/ar.20573). This justification is not supported.

      Comment: We carefully looked at the brain sections referred to by the referee in the brainmuseum.org collection. We found, contrary to the referee’s claims, that dogs, polar bears, and manatees have a perfectly serrated appearance (a cellular arrangement in curved bands) of the inferior olive. Accordingly, we think the referee is not reporting the comparative evidence fairly and we wonder why this is the case.

      Changes: None.

      Thus, all the justifications forwarded by the authors are unsupported. Based on methodological concerns, prior comparative mammalian neuroanatomy, and prior studies in the elephant and closely related species, the authors fail to support their notion that what was previously termed the inferior olive in the elephant is actually the trigeminal sensory nuclei. Given this failure, the justifications provided above that are sequelae also fail. In this sense, the entire manuscript and all the sequelae are not supported.

      Comment: We disagree. To summarize:

      (1) Our description of the cytochrome oxidase staining lacked methodological detail, which we have now added; the cytochrome oxidase reactivity data are great and support our conclusions.

      (2)–(5) The referee does not really discuss our evidence on these points.

      (6) We were wrong and have now fixed this mistake.

      (7) The referee asks for a comparison to the Maseko et al. (2013) scheme (agreed, see Author response image 4 and Author response table 1).

      (8) The referee bends the comparative evidence against us.

      Changes: None.

      A comparison of the elephant brainstem partitioning schemes put forward by Maseko et al. (2013) and by Reveyaz et al.

      To start with, we would like to express our admiration for the work of Maseko et al. (2013). These authors did pioneering work on obtaining high-quality histology samples from elephants. Moreover, they made a heroic neuroanatomical effort, in which they assigned 147 brain structures to putative anatomical entities. Most of their data appear to refer to staining in a single elephant and one coronal sectioning plane. The data quality and the illustration of results are excellent.

      We studied mainly two large nuclei in six (now 7) elephants in three (coronal, parasagittal, and horizontal) sectioning planes. The two nuclei in question are the two most distinct nuclei in the elephant brainstem, namely an anterior ventromedial nucleus (the trigeminal trunk module in our terminology; the inferior olive in the terminology of Maseko et al., 2013) and a more posterior lateral nucleus (the inferior olive in our terminology; the posterior part of the trigeminal nuclei in the terminology of Maseko et al., 2013).

      Author response image 4 gives an overview of the two partitioning schemes for inferior olive/trigeminal nuclei along with the rodent organization (see below).

      Author response image 4.

      Overview of the brainstem organization in rodents and elephants according to Maseko et al. (2013) and Reveyaz et al. (this paper).

      The strength of the Maseko et al. (2013) scheme is the excellent match of the position of elephant nuclei to the position of nuclei in the rodent (Author response image 4). We think this positional match reflects the fact that Maseko et al. (2013) mapped a rodent partitioning scheme on the elephant brainstem. To us, this is a perfectly reasonable mapping approach. As the referee correctly points out, the positional similarity of both elephant inferior olive and trigeminal nuclei to the rodent strongly argues in favor of the Maseko et al. (2013) scheme, because brainstem nuclei are positionally very conservative.

      Other features of the Maseko et al. (2013) scheme are less favorable. The scheme marries two cyto-architectonically very distinct divisions (an anterior indistinct part and a super-distinct serrated posterior part) to be the trigeminal nuclei. We think merging entirely distinct subdivisions into one nucleus is a byproduct of mapping a rodent partitioning scheme on the elephant brainstem. Neither of the two subdivisions resembles the trigeminal nuclei of other mammals. The cytochrome oxidase staining patterns differ markedly across the anterior indistinct part (see our Author response image 4) and the posterior part of the trigeminal nuclei and do not match the intense cytochrome oxidase reactivity of other mammalian trigeminal nuclei (Referee Figure 3). Our anti-peripherin staining indicates that there are probably no climbing fibers in what Maseko et al. think is the inferior olive; this is a potentially fatal problem for the hypothesis. The posterior part of the Maseko et al. (2013) trigeminal nuclei has a distinct serrated appearance that is characteristic of the inferior olive in other mammals. Moreover, the inferior olive of Maseko et al. (2013) lacks the serrated appearance of the inferior olive seen in pretty much all mammals; this is a serious problem.

      The partitioning scheme of Reveyaz et al. comes with poor positional similarity but avoids the other problems of the Maseko et al. (2013) scheme. Our explanation for the positionally deviating location of the trigeminal nuclei is that the elephant grew one of the largest, if not the largest, trigeminal systems of all mammals. As a result, the trigeminal nuclei grew through the floor of the brainstem. We understand this is a post hoc just-so explanation, but at least it is an explanation.

      The scheme of Reveyaz et al. was derived in an entirely different way from the Maseko model. Specifically, we were convinced that the elephant trigeminal nuclei ought to be very special because of the gigantic trigeminal ganglia (Purkart et al., 2022). Cytochrome-oxidase staining revealed a large distinct nucleus with an elongated shape. Initially, we were freaked out by the position of the nucleus and the fact that it was referred to as inferior olive by other authors. When we found an inferior-olive-like nucleus at a nearby (although admittedly unusual) location, we were less worried. We then optimized the visualization of myelin stripes (brightfield imaging etc.) and were able to collect an entire elephant trunk along with the brain (African elephant cow Indra). When we made the one-to-one match of Indra’s trunk folds and myelin stripes (Figure 4), we were certain that we had identified the trunk module of the trigeminal nuclei. We already noted at the outset of our rebuttal that we now consider such certainty a fallacy of overconfidence. In light of the comments of Referee 2, we feel that a further discussion of our ideas is warranted. A strength of the Reveyaz model is that nuclei look like single anatomical entities. The trigeminal nuclei look like trigeminal nuclei of other mammals, the trunk module has a striking resemblance to the trunk, and the inferior olive looks like the inferior olive of other mammals.

      We evaluated the fit of the two models in the form of a table (Author response table 1; below). Unsurprisingly, Author response table 1 aligns with our views of elephant brainstem partitioning.

      Author response table 1.

      Qualitative evaluation of elephant brainstem partitioning schemes

      ++ = very attractive; + = attractive; - = unattractive; -- = very unattractive. We scored features that are clear and shared by all mammals – as far as we know them – as very attractive. We scored features that are clear and are not shared by all mammals – as far as we know them – as very unattractive. Attractive features are either less clear or less well-shared features. Unattractive features are either less clear or less clearly not shared features.

      Author response table 1 suggests two conclusions to us. (i) The Reveyaz et al. model has mainly favorable properties. The Maseko et al. (2013) model has mainly unfavorable properties. Hence, the Reveyaz et al. model is more likely to be true. (ii) The outcome is not black and white, i.e., both models have favorable and unfavorable properties. Accordingly, we overstated our case in our initial submission and toned down our claims in the revised manuscript.

      What the authors have not done is to trace the pathway of the large trigeminal nerve in the elephant brainstem, as was done by Maseko et al (2013), which clearly shows the internal pathways of this nerve, from the branch that leads to the fifth mesencephalic nucleus adjacent to the periventricular grey matter, through to the spinal trigeminal tract that extends from the pons to the spinal cord in a manner very similar to all other mammals. Nor have they shown how the supposed trigeminal information reaches the putative trigeminal nuclei in the ventromedial rostral medulla oblongata. These are but two examples of many specific lines of evidence that would be required to support their conclusions. Clearly, tract tracing methods, such as cholera toxin tracing of peripheral nerves cannot be done in elephants, thus the neuroanatomy must be done properly and with attention to detail to support the major changes indicated by the authors.

      Comment: The referee claims that Maseko et al. (2013) showed by ‘tract tracing’ that the structures they refer to as trigeminal nuclei receive trigeminal input. This statement is at least slightly misleading. There is nothing that amounts to proper ‘tract tracing’ in the Maseko et al. (2013) paper, i.e., tracing of tracts with post-mortem tracers. We tried proper post-mortem tracing but failed (no tracer transport), probably as a result of the limitations of our elephant material. What Maseko et al. (2013) actually did is look a bit for putative trigeminal fibers and where they might go. We also used this approach. In our hands, such ‘pseudo tract tracing’ works best in unstained material under bright field illumination, because myelin is very well visualized. In such material, we find: (i) massive fiber tracts descending dorsoventrally roughly from where both Maseko et al. (2013) and we think the trigeminal tract runs. (ii) These fiber tracts run dorsoventrally and approach what we think are the trigeminal nuclei from the lateral side.

      Changes: Ad hoc tract tracing; see above.

      So what are these "bumps" in the elephant brainstem?

      Four previous authors indicate that these bumps are the inferior olivary nuclear complex. Can this be supported?

      The inferior olivary nuclear complex acts "as a relay station between the spinal cord (n.b. trigeminal input does reach the spinal cord via the spinal trigeminal tract) and the cerebellum, integrating motor and sensory information to provide feedback and training to cerebellar neurons" (https://www.ncbi.nlm.nih.gov/books/NBK542242/). The inferior olivary nuclear complex is located dorsal and medial to the pyramidal tracts (which were not labeled in the current study by the authors but are clearly present in Fig. 1C and 2A) in the ventromedial aspect of the rostral medulla oblongata. This is precisely where previous authors have identified the inferior olivary nuclear complex and what the current authors assign to their putative trigeminal nuclei. The neurons of the inferior olivary nuclei project, via the olivocerebellar tract to the cerebellum to terminate in the climbing fibres of the cerebellar cortex.

      Comment: We agree with the referee that in the Maseko et al. (2013) scheme the inferior olive is exactly where we expect it from pretty much all other mammals. Hence, this is a strong argument in favor of the Maseko et al. (2013) scheme and a strong argument against the partitioning scheme suggested by us.

      Changes: Please see our discussion above.

      Elephants have the largest (relative and absolute) cerebellum of all mammals (10.1002/ar.22425), this cerebellum contains 257 × 10^9 neurons (10.3389/fnana.2014.00046; three times more than the entire human brain, 10.3389/neuro.09.031.2009). Each of these neurons appears to be more structurally complex than the homologous neurons in other mammals (10.1159/000345565; 10.1007/s00429-010-0288-3). In the African elephant, the neurons of the inferior olivary nuclear complex are described by Maseko et al (2013) as being both calbindin and calretinin immunoreactive. Climbing fibres in the cerebellar cortex of the African elephant are clearly calretinin immunopositive and also are likely to contain calbindin (10.1159/000345565). Given this, would it be surprising that the inferior olivary nuclear complex of the elephant is enlarged enough to create a very distinct bump in exactly the same place where these nuclei are identified in other mammals?

      Comment: We agree with the referee that it is possible and even expected from other mammals that there is an enlargement of the inferior olive in elephants. Hence, a priori one might expect the ventral brainstem bumps to be the inferior olive; this is perfectly reasonable and is what was done by previous authors. The referee also refers to calbindin and calretinin antibody reactivity. Such antibody reactivity is indeed in line with the referee’s ideas and we considered these findings in our Referee Table 1. The problem is, however, that neither calbindin nor calretinin antibody reactivity is highly specific, and indeed both nuclei under discussion (trigeminal nuclei and inferior olive) show such reactivity. Unlike the peripherin-antibody staining advanced by us, neither calbindin nor calretinin antibody reactivity can distinguish the two hypotheses debated.

      Changes: Please see our discussion above.

      What about the myelin stripes? These are most likely to be the origin of the olivocerebellar tract and probably only have a coincidental relationship with the trunk. Thus, given what we know, the inferior olivary nuclear complex as described in other studies, and the putative trigeminal nuclear complex as described in the current study, is the elephant inferior olivary nuclear complex. It is not what the authors believe it to be, and they do not provide any evidence that discounts the previous studies. The authors are quite simply put, wrong. All the speculations that flow from this major neuroanatomical error are therefore science fiction rather than useful additions to the scientific literature.

      Comment: It is unlikely that the myelin stripes are the origin of the olivocerebellar tract as suggested by the referee. Specifically, the lack of peripherin-reactivity indicates that these fibers are not climbing fibers (Referee Figure 1). In general, we feel the referee does not want to discuss the myelin stripes and obviously thinks we made up the strange correspondence of myelin stripes and trunk folds.

      Changes: Please see our discussion above.

      What do the authors actually have?

      The authors have interesting data, based on their Golgi staining and analysis, of the inferior olivary nuclear complex in the elephant.

      Comment: The referee reiterates their views.

      Changes: None.

      Reviewer #3 (Public Review):

      Summary:

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Comment: The referee gives a concise summary of our findings. The referee acknowledges the depth of our analysis and also notes our cellular results. The referee – in line with the comments of Referee 2 – also points out that a misidentification of the nucleus under study is potentially fatal for our analysis. We thank the referee for this fair assessment.

      Changes: We feel that we need to alert the reader more broadly to the misidentification concern. We think the critical comments of Referee 2, which will be published along with our manuscript, will go a long way in doing so. We think the eLife publishing format is fantastic in this regard. We will also include pointers to these concerns in the revised manuscript.

      Strengths:

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: Again, a very fair and balanced set of comments. We are thankful for these comments.

      Changes: None.

      Weaknesses:

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.

      Comment: The referee points out a significant weakness of our study, namely our limited understanding of the origin and targets of the axons constituting the myelin stripes. We are very much aware of this problem and this is also why we directed high-powered methodology like synchrotron X-ray tomograms to elucidate the structure of myelin stripes. Such analysis led to advances, i.e., we now think that what look like stripes are bundles, and we understand that the constituent axons tend to traverse the module. Such advances are insufficient, however, to provide a clear picture of myelin stripe connectivity.

      Changes: We think solving the problems raised by the referee will require long-term methodological advances and hence we will not be able to solve these problems in the current revision. Our long-term plans for confronting these issues are the following: (i) Improving our understanding of long-range connectivity by post-mortem tracing and MR-based techniques such as Diffusion-Tensor-Imaging. (ii) Improving our understanding of mid- and short-range connectivity by applying even larger synchrotron X-ray tomograms and possibly serial EM.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.

      Comment: The referee suggests another series of topics, which include the analysis of brain parts volumes or overall brain size. We agree these are important issues, but we also think such questions are beyond the scope of our study.

      Changes: We hope to publish comparative data on elephant brain size and shape later this year.  

    1. Author Response

      eLife assessment

      This study presents a valuable method to visualize the location of the cell types discovered through single-cell RNA sequencing. The evidence supporting the claims is solid, but the inclusion of a larger number of samples would strengthen the study. It would also be helpful to have the methods explained in more detail. The work will be of interest to those seeking to identify new cell types from scRNA-seq and snRNA-seq data.

      Response: We are surprised by the editor’s assessment of our paper as a “valuable” method. This is the first Drosophila adult spatial transcriptomics paper. Hence, we would at least consider this to be an “important” method. Spatial transcriptomics has thus far only been done in embryos, which have been easy to process for FISH for many decades. Integration with single-cell data is also new. We are further surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an “important” biological finding of this paper. We are not aware that any localized mRNAs in Drosophila muscles were known prior to our study. This shows the advantage of spatial transcriptomics over single-cell techniques.

      The work indeed does not represent a full spatial atlas of the adult fly; it is, however, a proof-of-principle study covering both the head and body that we consider at least “important”.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas.

      Response: We thank the reviewer for their comment, but are surprised that this assessment does not mention the identification of subcellular mRNA patterns in adult muscles as an important biological finding of this paper. This might be due to the visualization problem that this reviewer was facing with a greyscale version of the PDF as mentioned in the comments below. We do not know what caused the technical problem for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells.

      Response: The size of the fly is one of the most challenging aspects of performing spatial transcriptomics. The small size of the samples led to detachment from the slides, which we solved by coating the slides with gelatin. While the resolution of Molecular Cartography is high (<200nm), in the brain challenges remain as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with current techniques. We agree that for a full atlas either expansion microscopy, 3D techniques or even higher resolution will be required.

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      1. The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      Response: We thank the reviewer for this comment but would like to add that the statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single-cell atlas: the tinman-positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single-cell FCA data tinman expression is often not detected in the cardiomyocytes (tinman is detected in 273 cells in the entire FCA, with a mean expression of 1.44 UMI in positive cells, and in 71 cells out of 273 cardial cells (26%)).

      Author response image 1.

      Density plots for body (left) and head (right) showing levels of gene expression detected in scRNA-seq (body: Fly Cell Atlas, Li et al. 2022, head: Pech et al. (2023)). Blue: all genes, red: genes used in the spatial study.

      1. Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      2. The authors show that their approach reveals interesting patterns of mRNA distribution (e.g alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      3. Lastly, the authors compare three different approaches --- their own described in this manuscript, Tangram, and SpaGE - which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      1. Experimental setup. It is not clear how many and, for some of the data, the sex of the flies that were analyzed. It appears that for the body data, only one male was analyzed. For the heads, methods say male and female heads, but nothing is annotated in the figures. As such, it remains unclear how robust these data are, given such a limited sample from one sex. As such, the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just the methods). What also adds to the confusion is the use of "n" in para 2 of the results. " ... we performed coronal sections at different depths in the head (n=13)..." 13 sections in total from 1 head or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      Response: While we agree that sex differences present indeed an interesting opportunity to study with spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology to go into this detail if wanted in the future. In the revised version, we will provide a more detailed description of the sections, including their sex/genotype/age. We would like to point out that we verified the specificity of our FISH method on all the body sections (Figure 2A, TpnC4 & Act88F) and not only on one. Furthermore, we also would like to state that the idea of “a rosetta stone” was mentioned as a future prospect. We will rewrite the discussion to make this more clear.

      1. Probes selected: Information from the methods section should be put into the main text so that it is clear what and why the gene lists were selected. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      Response: As stated above, while there is a slight bias towards more highly expressed genes (as expected for marker genes), we have also used genes with very low expression, like tinman (body) or sens (head). This shows that our method is more sensitive than single-cell data, as ALL cardiomyocytes can be identified by tinman expression, whereas only some are positive in the FCA data. In fact, the method cannot resolve genes that are too highly expressed, because optical crowding of the signal leads to worse quantification. For this reason, ninaE was removed from the analysis (as mentioned in “Spatial transcriptomics allows the localization of cell types in the head and brain” and in Methods).

      As mentioned in the Methods, the probes are designed at the gene level, targeting all isoforms but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this.

      1. Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (e.g., figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful.

      Response: Any high-resolution imaging (pixel size = 138 nm) of a large field of view (>1 mm) uses a stitching method to combine several individual images to reconstruct a large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). The specificity of the decoded probes was found to lie between 99.45 and 99.9% there, matching the results we found for TpnC4 and Act88F (99.4 and 99.8%). We will add their analysis to our discussion.

      1. The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      Response: The total number of mRNAs detected per spatial transcriptomics experiment is much higher for the body samples compared to single-cell experiments (FCA data). In the head it is slightly lower, but here it is important to note that not all cell types are present in each slice in the head (while they are all present in the head scRNA experiments). A comparison on the cell-type level would be more meaningful, and we will investigate this for the revision.

      Author response image 2.

      Barplots showing total number of mRNA molecules detected in Molecular Cartography (Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Individual black dots show individual experiments, counts are only shown for the chosen gene panel for each sample. Bar shows the mean, with error bars representing the standard error.
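
      As an illustrative sketch (not the authors' code), the bar height (mean) and error bar (standard error of the mean) described in this caption could be computed from per-experiment totals as follows. The counts and the function name `mean_and_sem` are hypothetical, and using the sample standard deviation (n - 1 in the denominator) is an assumption, since the caption does not say which estimator was used.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_and_sem(counts):
    """Return (mean, standard error of the mean) for per-experiment totals.

    Assumes the sample standard deviation (n - 1 denominator); the
    caption does not state which estimator was actually used.
    """
    if len(counts) < 2:
        raise ValueError("need at least two experiments to estimate the SEM")
    return fmean(counts), stdev(counts) / sqrt(len(counts))


# Hypothetical totals of detected molecules in three experiments.
example_counts = [100.0, 120.0, 140.0]
example_mean, example_sem = mean_and_sem(example_counts)
```

      Each black dot in such a plot would correspond to one entry of `example_counts`, and the bar and whiskers to `example_mean` and `example_sem`.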

      1. Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      Response: We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006) Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299.)

      Author response image 3.

      Molecular Cartography zoomed in on indirect flight muscle. Segmented nuclei are shown in white (based on DAPI); scale bars represent 100 μm.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data, if the authors would confirm these data using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" ( Act 88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, decolorization of the probes?

      Response: We thank the reviewer for their interest in the localization patterns in muscle tissue. We could confirm localized mRNA in all the sections analyzed, in flight muscles as well as in leg muscles. We furthermore show that Act88F and TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single-molecule signal, respectively, is in flight muscles only). Hence, we already show the specificity test in a much more quantitative way compared to traditional FISH, which often includes amplification.
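
      To make the specificity figure concrete, here is a minimal sketch (not the authors' pipeline) of how the fraction of a gene's spots that fall inside the expected tissue could be computed. The spot list, the tissue labels and the function name `spot_specificity` are all hypothetical; in practice the tissue label of each spot would come from a segmentation step that is not shown here.

```python
def spot_specificity(spots, gene, expected_tissue):
    """Fraction of a gene's detected spots located in the expected tissue.

    `spots` is an iterable of (gene, tissue_label) pairs; both the
    format and the labels are assumptions made for this illustration.
    """
    labels = [tissue for spot_gene, tissue in spots if spot_gene == gene]
    if not labels:
        raise ValueError(f"no spots detected for gene {gene!r}")
    inside = sum(1 for tissue in labels if tissue == expected_tissue)
    return inside / len(labels)


# Hypothetical toy data: three of four Act88F spots lie in flight muscle.
toy_spots = [
    ("Act88F", "flight_muscle"),
    ("Act88F", "flight_muscle"),
    ("Act88F", "flight_muscle"),
    ("Act88F", "gut"),
    ("TpnC4", "flight_muscle"),
]
act88f_specificity = spot_specificity(toy_spots, "Act88F", "flight_muscle")
```

      With real data, a value close to 1 for a flight-muscle marker would correspond to the high percentages quoted in the response above.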

      1. The authors developed an unbiased method to identify "new cell types" which relies on co-expression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      Response: The term “new cell types” only appears in the title. We agree that with the current spatial map we cannot be sure to have found “new cell types”, instead we have shown where unannotated clusters from scRNA-seq map, based on gene expression. Therefore, we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, in achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. This same criticism can be made of the identification of "new cell types", just based on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims.

      Response: We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from 1 individual we acquired 2 sections, a total of 13 sections). We show that the body samples show a high correlation with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene co-expression and expression patterns. Although obtaining more sections would be valuable, we don’t believe it to be necessary for the current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, while they would produce similar results as we already show. We are therefore reluctant to repeat the effort again.

      The usage of the term “new cell types” is indeed ambiguous and we will tone this down in the revised version. Instead, we meant that unannotated clusters could be mapped to their location. In the text, we further specify that this means that now we only have inferred the location of the nuclei and that for neurons their function/processes are still unknown. As such, our data provides a starting point to identify new cell types since their marker genes and nuclear location are inferred. The next step to identify “new cell types” would indeed be to acquire genetic access to the cell types and characterize them in more detail. This is currently beyond our goals, and therefore we will tone down the title in the revised version and thank the reviewer for this valuable suggestion.

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      Response: We believe that our manuscript as it stands now is already an “important” paper that will strongly impact the Drosophila community (and beyond the spatial transcriptomics community). As it stands, it provides the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. In the manuscript we provide both experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment) and the data serves as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it also provides proof of principle methods on how to integrate the FCA data with these spatial data and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Loh and colleagues investigate valence encoding in the mesolimbic dopamine system. Using an elegant approach, they show that when sucrose, which normally evokes strong dopamine neuron activity and release in the nucleus accumbens, is made aversive via conditioned taste aversion, the same sucrose stimulus later evokes much less dopamine neuron activity and release. Thus, dopamine activity can dynamically track the changing valence of an unconditioned stimulus. These results are important for helping clarify valence- and value-related questions that are the matter of ongoing debate regarding dopamine functions in the field.

      Strengths:

      This is an elegant way to ask this question; the within-subjects design and the continuity of the stimulus are a strong way to remove a lot of the common confounds that make it difficult to interpret valence-related questions. I think these are valuable studies that help tie up questions in the field while also setting up a number of interesting future directions. There are a number of control experiments and tweaks to the design that help eliminate a number of competing hypotheses regarding the results. The data are clearly presented and contextualized.

      Weaknesses for consideration:

      The focus on one relatively understudied region of the rat striatum for dopamine recordings could potentially limit generalization of the findings. While this can be determined in future studies, the implications should be further discussed in the current manuscript.

      We agree that the manuscript would benefit from providing a stronger rationale for our recording sites and acknowledging the potential for regional differences in dopamine signaling. We have made the following additions to the manuscript:

      Added to the Discussion: “Recordings were targeted to the lateral VTA and the corresponding approximate terminal site in the NAc lateral shell (Lammel et al., 2008). Subregional differences in dopamine activity likely contribute to mixed findings on dopamine and affect. For example, dopamine in the NAc lateral shell differentially encodes cues predictive of rewarding sucrose and aversive footshock, which is distinct from NAc medial shell dopamine responses (de Jong et al., 2019). Our findings are similar to prior work from our group targeting recordings to the NAc dorsomedial shell (Hsu et al., 2020; McCutcheon et al., 2012; Roitman et al., 2008): there, intraoral sucrose increased NAc dopamine release while the response in the same rats to quinine was significantly lower.”

      Reviewer #2 (Public review):

      Summary:

      Koh et al. report an interesting manuscript studying dopamine binding in the lateral accumbens shell of rats across the course of conditioned taste aversion. The question being asked here is how does the dopamine system respond to aversion? The authors take advantage of unique properties of taste aversion learning (notably, within-subjects remapping of valence to the same physical stimulus) to address this.

      Combining a well controlled behavioural design (including key, unpaired controls) with fibre photometry of dopamine binding via GrabDA and of dopamine neuron activity by gCaMP, and careful analyses of behaviour (e.g., head movements; home cage ingestion), the authors show that 1) conditioned taste aversion of sucrose suppresses the activity of VTA dopamine neurons and lateral shell dopamine binding to subsequent presentations of the sucrose tastant; 2) this pattern of activity was similar to that evoked by the innately aversive tastant quinine; 3) dopamine responses were negatively correlated with behavioural reactivity (inferred taste reactivity); and 4) dopamine responses tracked the contingency between sucrose and illness because these responses recovered across extinction of the conditioned taste aversion.

      Strengths:

      There are important strengths here. The use of a well-controlled design, the measurement of both dopamine binding and VTA dopamine neuron activity, the inclusion of an extinction manipulation; and the thorough reporting of the data. I was not especially surprised by these results, but these data are a potentially important piece of the dopamine puzzle (e.g., as the authors note, salience-based argument struggles to explain these data).

      Weaknesses for consideration:

      (1) The focus here is on the lateral shell. This is a poorly investigated region in the context of the questions being asked here. Indeed, I suspect many readers might expect a focus on the medial shell. So, I think this focus is important. But, I think it does warrant greater attention in both the introduction and discussion. We do know from past work that there can be extensive compartmentalisation of dopamine responses to appetitive and aversive events and many of the inconsistent findings in the literature can be reconciled by careful examination of where dopamine is assessed. I do think readers would benefit from acknowledgement of this - for example it is entirely reasonable to suppose that the findings here may be specific to the lateral shell.

      As with our response to Reviewer 1, we agree that we should provide further rationale for focusing our recordings on the lateral shell and acknowledge potential differences in dopamine dynamics across NAc subregions. In addition to the changes in the Discussion detailed in our response to Reviewer 1, we have made the following additions to the Introduction:

      Added to the Introduction: “NAc lateral shell dopamine differentially encodes cues predictive of rewarding (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock), which is distinct from other subregions (de Jong et al., 2019). It is important to note that other regions of the NAc may serve as hedonic hotspots (e.g. dorsomedial shell) or may more closely align with the signaling of salience (e.g. ventromedial shell; Yuan et al., 2021).”

      (2) Relatedly, I think readers would benefit from an explicit rationale for studying the lateral shell as well as consideration of this in the discussion. We know that there are anatomical (PMID: 17574681), functional (PMID: 10357457), and cellular (PMID: 7906426) differences between the lateral shell and the rest of the ventral striatum. Critically, we know that profiles of dopamine binding during ingestive behaviours there can be highly dissimilar to the rest of ventral striatum (PMID: 32669355). I do think these points are worth considering.

      There are several reasons why dopamine dynamics were recorded in the NAc lateral shell:

      (1) Dopamine neurons in more medial aspects of the VTA preferentially target the NAc medial shell and core whereas dopamine neurons in the lateral VTA – our target for VTA DA recordings – project to the lateral shell of the NAc (Lammel et al., 2008). Thus, our goal was to sample NAc release dynamics in areas that receive projections from our cell body recording sites.

      (2) Cues predictive of reward availability (i.e., sipper spout with sucrose) and aversive stimuli (i.e., footshock) are differentially encoded by NAc lateral shell dopamine, which is distinct from NAc ventromedial shell dopamine responses (de Jong et al., 2019). These findings suggest a role for NAc lateral shell dopamine in the encoding of a stimulus’s valence, which made the subregion an area of interest for further examination.

      (3) With respect to the medial NAc shell specifically, extensive literature had already shown it to be a ‘hedonic hotspot’ (Morales and Berridge, 2020; Yuan et al., 2021) whereas the ventral portion is more mixed with respect to valence (Yuan et al., 2021). We had previously shown that intraoral infusions of primary taste stimuli of opposing valence (i.e., sucrose and quinine) evoke differential responses in dopamine release within the NAc dorsomedial shell (Roitman et al., 2008). We more recently replicated differential dopamine responses from dopamine cell bodies in the lateral VTA (Hsu et al., 2020) and thus endeavored to examine the possibility that dopamine responses in the lateral VTA to the same stimulus change as its valence changes. Given these considerations, measuring dopamine release in the lateral shell was a logical choice. The field would greatly benefit from continued future work surveying the entirety of the VTA DA projection terminus.

      We have included these points of justification in the Introduction and Discussion sections.

      (3) I found the data to be very thoughtfully analysed. But in places I was somewhat unsure:

      (a) Please indicate clearly in the text when photometry data show averages across trials versus when they show averages across animals.

      We have now explicitly indicated in the figure legends of Figures 1, 3, 5, 7, and 8:

      (1) In heat maps, each row represents the averaged (across rats) response on that trial.

      (2) Traces below heat maps represent the response to infusion averaged first across trials for each rat and then across all rats.

      (3) Insets represent the average z-score across the infusion period averaged first across all trials for each rat and then across all rats.
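      The three averaging conventions listed above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the nested-list layout `data[rat][trial][sample]`, the function names, and the toy numbers are all assumptions made for the example.

```python
from statistics import fmean

def heatmap_rows(data):
    """One row per trial: the response on that trial averaged across rats.
    data[rat][trial] is a list of z-scored samples (hypothetical layout)."""
    n_trials = len(data[0])
    rows = []
    for t in range(n_trials):
        per_rat = [rat[t] for rat in data]  # the same trial from every rat
        rows.append([fmean(samples) for samples in zip(*per_rat)])
    return rows

def group_trace(data):
    """Average across trials within each rat first, then across rats."""
    rat_traces = [[fmean(samples) for samples in zip(*rat)] for rat in data]
    return [fmean(samples) for samples in zip(*rat_traces)]

def inset_mean_z(data):
    """Mean z-score over the infusion period: per trial, then per rat, then across rats."""
    rat_means = [fmean(fmean(trial) for trial in rat) for rat in data]
    return fmean(rat_means), rat_means

# Toy data (not real recordings): 2 rats x 2 trials x 3 samples.
example = [
    [[0, 1, 2], [2, 3, 4]],  # rat A
    [[2, 3, 4], [4, 5, 6]],  # rat B
]
print(heatmap_rows(example))
print(group_trace(example))
print(inset_mean_z(example))
```

      Averaging within each rat before averaging across rats keeps the rat, rather than the trial, as the unit of analysis for the traces and insets.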

      (b) I did struggle with the correlation analyses, for two reasons.

      (i) First, the key finding here is that the dopamine response to intraoral sucrose is suppressed by taste aversion. So, this will significantly restrict the range of dopamine transients, making interpretation of the correlations difficult.

      The overall hypothesis is that the dopamine response would correlate with the valence of a taste stimulus – even and especially when the stimulus remained constant but its valence changed. We inferred valence from the behavioral reactivity to the stimulus – reasoning that an appetitive taste will evoke minimal movement of the nose and paws (presumably because the animals are primarily engaging in small mouth movements associated with ingestion as shown by the seminal work of Grill and Norgren (1978) and the many studies published by the K.C. Berridge group) whereas an aversive taste will evoke significantly more movement as the rats engage in rejection responses (e.g. forelimb flails, chin rubs, etc.). When we conducted our regression analyses we endeavored to be as transparent as possible and labeled each symbol based on group (Unpaired vs Paired) and day (Conditioning vs Test). Both behavioral reactivity and dopamine responses change – but only for the Paired rats across days. In this sense, we believe the interpretation is clear. However, the Reviewer raises an important criticism that there would essentially be a floor effect with dopamine responses. We believe this is mitigated by data acquired across extinction and especially in Figure 9B. Here, the observations that dopamine responses fall to near zero but return to pre-conditioning levels in the Paired group with strong correlation between dopamine and behavioral reactivity throughout would hopefully partially allay the Reviewer’s concerns. See Part ii below for further support.

      (ii) Second, the authors report correlations by combining data across groups/conditions. I understand why the authors have done this, but it does risk obscuring differences between the groups. So, my question is: what happens to this trend when the correlations are computed separately for each group? I suspect other readers will share the same question. I think reporting these separate correlations would be very helpful for the field - regardless of the outcome.

      To address this concern, we performed separate regression analyses for Paired and Unpaired rats and provide the table below to detail results where data were combined across groups or separated. As expected, all analyses in Paired rats indicated a significant inverse relationship between dopamine and behavioral reactivity. After all, it is only in this group that behavioral reactivity to the taste stimulus changes as a function of conditioning. Perhaps even more striking, even when restricting the regression analysis to Unpaired rats, we still observed a significant inverse relationship between dopamine and behavioral reactivity in most experiments. We have outlined the separated correlations below (asterisks denote slopes significantly different from 0; * p<0.05; ** p<0.01; *** p<0.005; **** p<0.001):

      Author response table 1.
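      To illustrate the kind of comparison summarized in the table, the sketch below fits an ordinary least-squares line to the combined data and to each group separately. It is a hedged illustration only: the record layout, the function names, and the numbers are invented for the example, and it reports slope and Pearson r without the significance test applied in the response.

```python
def ols_slope_r(x, y):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

def regress_by_group(records):
    """records: iterable of (group, behavioral_reactivity, dopamine_z).
    Returns {label: (slope, r)} for the combined data and for each group."""
    pooled = {"Combined": ([], [])}
    for group, reactivity, dopamine in records:
        for key in ("Combined", group):
            xs, ys = pooled.setdefault(key, ([], []))
            xs.append(reactivity)
            ys.append(dopamine)
    return {key: ols_slope_r(xs, ys) for key, (xs, ys) in pooled.items()}

# Hypothetical values, chosen only to show the mechanics.
records = [
    ("Paired", 1.0, 3.0), ("Paired", 2.0, 1.0), ("Paired", 3.0, -1.0),
    ("Unpaired", 1.0, 4.0), ("Unpaired", 2.0, 3.5), ("Unpaired", 3.0, 3.0),
]
for label, (slope, r) in regress_by_group(records).items():
    print(f"{label}: slope={slope:.2f}, r={r:.2f}")
```

      Reporting the combined and per-group fits side by side shows whether a pooled trend is driven by one group alone.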

      (4) Figure 1A is not as helpful as it might be. I do think readers would expect a more precise reporting of GCaMP expression in TH+ and TH- neurons. I also note that many of the nuances in terms of compartmentalisation of dopamine signalling discussed above apply to ventral tegmental area dopamine neurons (e.g. medial v lateral) and this is worth acknowledging when interpreting t

      Others have reported (Choi et al., 2020) and quantified (Hsu et al., 2020) GCaMP6f expression in TH+ neurons. While we didn’t report these quantifications, our observations were very much in line with previous quantifications from our laboratory (Hsu et al., 2020).

      We agree that we should elaborate on VTA subregional differences and have addressed this point above (see responses to Reviewer 1 Weakness #1 and Reviewer 2 Weakness #2).

      Reviewer #3 (Public review):

      Summary:

      This study helps to clarify the mixed literature on dopamine responses to aversive stimuli. While it is well accepted that dopamine in the ventral striatum increases in response to various rewarding and appetitive stimuli, aversive stimuli have been shown to evoke phasic increases or decreases depending on the exact aversive stimuli, behavioral paradigm, and/or dopamine recording method and location examined. Here the authors use a well-designed set of experiments to show differential responses to an appetitive primary reward (sucrose) that later becomes a conditioned aversive stimulus (sucrose previously paired with lithium chloride in a conditioned taste aversion paradigm). The results are interesting and add valuable data to the question of how the mesolimbic dopamine system encodes aversive stimuli; however, the conclusions are strongly stated given that the current data do not necessarily align with prior conflicting data in terms of recording location, and it is not clear exactly how to interpret the generally biphasic dopamine response to the CTA-sucrose, which also evolves over exposures within a single session.

      Strengths:

      • The authors nicely demonstrate that their two aversive stimuli examined, quinine and sucrose following CTA, evoked aversive facial expressions and paw movements that differed from those following rewarding sucrose to support that the stimuli experienced by the rats differ in valence.

      • Examined dopamine responses to the exact same sensory stimuli conditioned to have opposing valences, avoiding standard confounds of appetitive and aversive stimuli being sensed by different sensory modalities (i.e., sweet taste vs. electric shock)

      • The authors examined multiple measurements of dopamine activity - cell body calcium (GCaMP6f) in midbrain and release in NAc (Grab-DA2h), which is useful as the prior mixed literature on aversive dopamine responses comes from a variety of recording methods.

      • Correlations between sucrose preference and dopamine signals demonstrate behavioral relevance of the differential dopamine signals.

      • The delayed testing experiment in Figure 7 nicely controls for the effect of time to demonstrate that the "rewarding" dopamine response to sucrose only recovers after multiple extinction sucrose exposures to extinguish the CTA.

      Weaknesses for consideration:

      (1) Regional differences in dopamine signaling to aversive stimuli are mentioned in the introduction and discussion. For instance, the idea that dopamine encodes salience is strongly argued against in the discussion, but the paper cited as arguing for that (Kutlu et al. 2021) is recording from the medial core in mice. Given other papers cited in the text about the regional differences in dopamine signaling in the NAc and from different populations of dopamine neurons in midbrain, it's important to mention this distinction with respect to salience signaling. Relatedly, the text says that the lateral NAc shell was targeted for accumbens recordings, but the histology figure looks like the majority of fibers were in the anterior lateral core of NAc. For the current paper to be a convincing last word on the issue, it would be extremely helpful to have similar recordings done in other parts of the NAc to do a more thorough comparison against other studies.

      As the Reviewer notes, NAc dopamine recordings were aimed at the lateral NAc shell. It is possible that some dopamine neurons lying within the anterior lateral core were recorded. Fiber photometry and the size of the fiber optics cannot definitively identify the precise location and number of dopamine neurons from which we recorded. Still, recording sites did not systematically differ between groups. Further, the within-subjects design helps to mitigate any potential biases for one subregion over another. The results presented in the manuscript strongly support a valence code. It is difficult to be the ‘last word’ on this topic and we suspect debate will continue. We used taste stimuli for appetitive and aversive stimuli – whereas many in the field will continue to use other noxious stimuli (e.g. foot shock) that likely recruit different circuits en route to the VTA. And there may very well be a different regional profile for dopamine signaling with different noxious stimuli. Moreover, we used intraoral infusion to avoid confounds of stimulus avoidance and competing motivations (e.g. food or fluid deprivation). We believe that this is one of the most important and unique features of our report. Recent work supports a role for phasic increases in dopamine in avoidance of noxious stimuli (Jung et al., 2024) and it will be critical for the field to reflect on the differences between avoidance and aversion. Moreover, in ongoing studies we aspire to fully survey dopamine signaling in conditioned taste aversion across the medial-lateral and dorsal-ventral axes of the VTA and NAc.

      (2) Dopamine release in the NAc never dips below baseline for the conditioned sucrose. Is it possible to really consider this as a signal for valence per se, as opposed to it being a weaker response relative to the original sucrose response?

      Indeed, NAc dopamine release in response to neither intraoral quinine nor aversive sucrose dips below baseline; rather, dopamine binding does not change from pre-infusion baseline levels. It should be noted that VTA dopamine cell body activity does indeed dip below baseline in response to aversive sucrose. Moreover, using fast-scan cyclic voltammetry, we showed that dopamine release dips below baseline in the NAc dorsomedial shell in response to intraoral quinine (Roitman et al., 2008). The differences across recording sites may reflect regional differences but they may also reflect differences in recording approaches. GrabDA2h, used here, has relatively slow kinetics that may obscure dips below baseline (see response to Weakness #8 below).

      (3) Related to this, the main measure of the dopamine signal here, "mean z-score," obscures the temporal dynamics of the aversive dopamine response across a trial. This measure is used to claim that sucrose after CTA is "suppressing" dopamine neuron activity and release, which is true relative to the positive valence sucrose response. However, both GRAB-DA and cell-body GCaMP measurements show clear increases after onset of sucrose infusion before dipping back to baseline or slightly below in the average of all example experiments displayed. One could point to these data to argue either that aversive stimuli cause phasic increases in dopamine (due to the initial increase) or decreases (due to the delayed dip below baseline) depending on the measurement window. Some discussion of the dynamics of the response and how it relates to the prior literature would be useful.

      We have used mean z-score to do much of our quantitative analyses but the Reviewer raises the intriguing possibility that we are masking an initial increase in dopamine release and VTA DA activity evoked by aversive taste by doing so. We included the heat maps in the manuscript to be as transparent as possible about the time course of dopamine responses – both within a trial and across trials. The Reviewer’s point prompted us to reflect further on the heat maps and recognize that trials early in the session often showed a brief increase in dopamine for aversive sucrose but this response dissipated (NAc dopamine release) or flipped (VTA DA cell body activity) over trials. We now quantitatively characterize this feature by looking at the time course of dopamine responses in each third of the trials (1-10, 11-20, 21-30; see Author response images 1, 2 and 3). As we infer the valence of the stimulus from nose and paw movements (behavioral reactivity), it is especially striking that we observe a similar time course for changes in behavior. Collectively, the data may reflect an updating process that is relatively slow and requires experience of the stimulus in a new (aversive) state – that is, a model-free process. While our experiments were not designed to test the updating of dopamine responses and discern their participation in model-based versus model-free learning processes – another debate in the dopamine field (Cone et al., 2016; Deserno et al., 2021) – the data are consistent with a model-free process. This is further supported in the experiment involving multiple conditioning sessions, where dopamine ‘dips’ are observed in trials 1-10 on Conditioning Day 3 and Extinction Day 1 when the new value of sucrose has been established. Finally, the relatively slow updating of the value of sucrose is reflected in older literature using a continuous intraoral infusion. Using this approach, rats began rejecting the saccharin infusion only after ~2 min rather than immediately (Schafe et al., 1998; Schafe and Bernstein, 1996; Wilkins and Bernstein, 2006).

      Author response image 1.

      Author response image 2.

      Author response image 3.
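      The trial-block analysis described in this response (trials 1-10, 11-20, 21-30) amounts to averaging the time course within consecutive blocks of trials. The sketch below shows one way to do that; the function name, the flat list-of-traces layout, and the toy traces are assumptions for illustration and are not taken from the authors' code.

```python
def block_timecourses(trials, block_size=10):
    """Average the time course within consecutive blocks of trials.
    trials: equal-length traces in trial order (hypothetical layout)."""
    blocks = []
    for start in range(0, len(trials), block_size):
        chunk = trials[start:start + block_size]
        # zip(*chunk) groups the same time point across the trials in this block
        blocks.append([sum(samples) / len(samples) for samples in zip(*chunk)])
    return blocks

# Toy session: 30 trials, 2 samples each, with a response that drifts over trials.
session = [[float(i), float(i + 1)] for i in range(30)]
early, middle, late = block_timecourses(session)
print(early, middle, late)
```

      Comparing the early, middle, and late blocks makes a within-session change visible that a single session-wide average would hide.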

      (4) Would this delayed below-baseline dip be visible with a shorter infusion time?

      While our experiments did not explore this parameter, it would be interesting to parametrically vary infusion duration times and examine differences in dopamine responses. However, we believe the most parsimonious explanation is that the ‘dip’ in VTA cell body activity develops as a function of the slow updating of the value of sucrose reflective of a model-free process. We recognize that this is mere speculation.

      (5) Does the max of the increase or the dip of the decrease better correlate with the behavioral measures of aversion (orofacial, paw movements) or sucrose preference than "mean z-score" measure used here?

      It seems plausible that finding the most extreme value from baseline could better correlate to behavioral measures. Time courses to max increase and max decrease are different. Moreover, with appetitive sucrose, there are often multiple transients that occur throughout a single intraoral infusion. Coupled with a noisy time course for individual components of behavioral reactivity, we determined that averaging data across the whole infusion period (i.e. mean z-score) was the most objective way we could analyze the dopamine and behavioral responses to taste stimuli.
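      The trade-off between the candidate summary measures can be made concrete with a small sketch. The trace below is invented to mimic a biphasic response (an early rise followed by a dip); the function and field names are hypothetical.

```python
def summarize(trace):
    """Candidate summary measures for one z-scored trace over the infusion period."""
    mean_z = sum(trace) / len(trace)
    peak, trough = max(trace), min(trace)
    # Most extreme excursion from baseline (zero); ties go to the peak.
    extreme = peak if abs(peak) >= abs(trough) else trough
    return {"mean": mean_z, "peak": peak, "trough": trough, "extreme": extreme}

# Hypothetical biphasic trace: brief increase, then a dip below baseline.
biphasic = [0.0, 2.0, 1.0, -1.0, -1.5, -0.5]
print(summarize(biphasic))
```

      For a trace like this the mean sits at baseline while the peak and trough point in opposite directions, which is why the choice of measure and measurement window matters when a response has more than one phase.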

      (6) The authors argue strongly in the discussion against the idea that dopamine is encoding "salience." Could this initial peak (also seen in the first few trials of quinine delivery, fig 1c color plot) be a "salience" response?

      Our response above to the potential for ‘mixed’ dopamine responses to aversive sucrose led to additional analyses that support a slow updating of both behavior and dopamine to the new, aversive value of sucrose. Quinine is innately aversive and thus the Reviewer rightly points out that even here we observe an increase in dopamine release evoked by quinine on the first few trials (as observed in the heat map). We’d like to note, though, that the order of stimulus exposure was counterbalanced across rats. In those rats first receiving a sucrose session, quinine initially caused a modest increase in dopamine release during the first 10 trials (which is more pronounced in the first 2 trials). In the subsequent 2 blocks of 10 trials, no such increase was observed. Interestingly, in rats for which quinine was their first stimulus, we did not see an increase in dopamine release on the first few trials (see Author response image 4). We speculate that the initial sucrose session required the value of intraoral infusions to be updated when quinine was delivered to these rats and that, once more, the updating process may be slow and akin to a model-free process. This analysis, at present, is underpowered but will direct future attention in follow-up work.

      Author response image 4.

      (7) Related to this, the color plots showing individual trials show a reduction in the increases to positive valence sucrose across conditioning day trials and a flip from infusion-onset increase to delayed increases across test day trials. This evolution across days makes it appear that the last few conditioning day trials would be impossible to discriminate from the first few test day trials in the CTA-paired. Presumably, from strength of CTA as a paradigm, the sucrose is already aversive to the animals at the first trial of test day. Why do the authors think the response evolves across this session?

      As the Reviewer noted, Points 3-7 are related. We have speculated that the evolving dopamine response in Paired rats across test day trials reflects a model-free process. Importantly, as in the manuscript, our additional analyses once again show a tight relationship between behavioral reactivity and the dopamine response across the test session trials. It is important to note, though, that these experiments were not designed to test if responses reflect model-free or model-based processes.

      (8) Given that most of the work is using a conditioned aversive stimulus, the comparison to a primary aversive tastant quinine is useful. However, the authors saw basically no dopamine response to a primary aversive tastant quinine (measured only with GRAB-DA) and saw less noticeable decreases following CTA for NAc recordings with GRAB-DA2h than with cell body GCaMP. Given that they are using the high-affinity version of the GRAB sensor, this calls into question whether this is a true difference in release vs. soma activity or issue of high affinity release sensor making decreases in dopamine levels more difficult to observe.

      We share the same speculation as the Reviewer. Using fast-scan cyclic voltammetry, albeit measuring dopamine concentration in the dorsomedial shell, we observed a clear decrease from baseline with intraoral infusions of quinine (Roitman et al., 2008). For the fiber photometry used here, we, like the Reviewer, note that GRAB_DA2h is a high-affinity (i.e., EC50: 7 nM) dopamine sensor with relatively long off-kinetics (i.e., t1/2 decay time: 7300 ms) (Labouesse et al., 2020). It may therefore be much more difficult to observe decreases (below baseline) using this sensor. The publication of new dopamine sensors - with lower affinity, faster kinetics, and greater dynamic range (Zhuo et al., 2024) – introduces opportunities for comparison and the greater potential for capturing decreases below baseline. Due to the poorer kinetics associated with GRAB_DA2h, we would not assert that direct comparisons between the GCaMP- and GRAB-based signals observed here represent true differences between somatic and terminal activity.
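      A rough sense of why slow off-kinetics can mask a dip comes from a back-of-the-envelope calculation. The sketch below assumes, as a simplification, that the sensor signal decays as a single exponential with the 7300 ms half-life quoted above; real sensor behavior in vivo is more complex, and the function name and the elapsed times are illustrative choices.

```python
def fraction_remaining(elapsed_s, half_life_s=7.3):
    """Fraction of the sensor signal still present elapsed_s seconds after
    dopamine falls, assuming single-exponential (first-order) decay."""
    return 0.5 ** (elapsed_s / half_life_s)

# Illustrative elapsed times only; these are not infusion parameters from the study.
for t in (1.0, 5.0, 7.3, 15.0):
    print(f"{t:>4.1f} s: {fraction_remaining(t):.2f} of the signal remains")
```

      Under this assumption a drop in dopamine lasting only a few seconds would be reported as a modest decline in fluorescence, consistent with the caution expressed above about comparing the GRAB- and GCaMP-based signals.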

      References

      Choi JY, Jang HJ, Ornelas S, Fleming WT, Fürth D, Au J, Bandi A, Engel EA, Witten IB. 2020. A Comparison of Dopaminergic and Cholinergic Populations Reveals Unique Contributions of VTA Dopamine Neurons to Short-Term Memory. Cell Rep 33. doi:10.1016/j.celrep.2020.108492

      Cone JJ, Fortin SM, McHenry JA, Stuber GD, McCutcheon JE, Roitman MF. 2016. Physiological state gates acquisition and expression of mesolimbic reward prediction signals. Proc Natl Acad Sci U S A 113. doi:10.1073/pnas.1519643113

      de Jong JW, Afjei SA, Pollak Dorocic I, Peck JR, Liu C, Kim CK, Tian L, Deisseroth K, Lammel S. 2019. A Neural Circuit Mechanism for Encoding Aversive Stimuli in the Mesolimbic Dopamine System. Neuron 101. doi:10.1016/j.neuron.2018.11.005

      Deserno L, Moran R, Michely J, Lee Y, Dayan P, Dolan RJ. 2021. Dopamine enhances model-free credit assignment through boosting of retrospective model-based inference. Elife 10. doi:10.7554/eLife.67778

      Hsu TM, Bazzino P, Hurh SJ, Konanur VR, Roitman JD, Roitman MF. 2020. Thirst recruits phasic dopamine signaling through subfornical organ neurons. Proc Natl Acad Sci U S A 117:30744–30754. doi:10.1073/pnas.2009233117

      Jung K, Krüssel S, Yoo S, An M, Burke B, Schappaugh N, Choi Y, Gu Z, Blackshaw S, Costa RM, Kwon HB. 2024. Dopamine-mediated formation of a memory module in the nucleus accumbens for goal-directed navigation. Nat Neurosci. doi:10.1038/s41593-024-01770-9

      Labouesse MA, Cola RB, Patriarchi T. 2020. GPCR-based dopamine sensors—A detailed guide to inform sensor choice for in vivo imaging. Int J Mol Sci. doi:10.3390/ijms21218048

      Lammel S, Hetzel A, Häckel O, Jones I, Liss B, Roeper J. 2008. Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57. doi:10.1016/j.neuron.2008.01.022

      McCutcheon JE, Ebner SR, Loriaux AL, Roitman MF, Tobler PN. 2012. Encoding of aversion by dopamine and the nucleus accumbens. Front Neurosci 6. doi:10.3389/fnins.2012.00137

      Morales I, Berridge KC. 2020. ‘Liking’ and ‘wanting’ in eating and food reward: Brain mechanisms and clinical implications. Physiol Behav. doi:10.1016/j.physbeh.2020.113152

      Roitman MF, Wheeler RA, Wightman RM, Carelli RM. 2008. Real-time chemical responses in the nucleus accumbens differentiate rewarding and aversive stimuli. Nat Neurosci 11:1376–1377. doi:10.1038/nn.2219

      Schafe GE, Bernstein IL. 1996. Forebrain contribution to the induction of a brainstem correlate of conditioned taste aversion: I. The amygdala. Brain Res 741. doi:10.1016/S0006-8993(96)00906-7

      Schafe GE, Thiele TE, Bernstein IL. 1998. Conditioning method dramatically alters the role of amygdala in taste aversion learning. Learning and Memory 5. doi:10.1101/lm.5.6.481

      Wilkins EE, Bernstein IL. 2006. Conditioning method determines patterns of c-fos expression following novel taste-illness pairing. Behavioural Brain Research 169. doi:10.1016/j.bbr.2005.12.006

      Yuan L, Dou YN, Sun YG. 2021. Topography of reward and aversion encoding in the mesolimbic dopaminergic system. Journal of Neuroscience 39. doi:10.1523/JNEUROSCI.0271-19.2019

      Zhuo Y, Luo B, Yi X, Dong H, Miao X, Wan J, Williams JT, Campbell MG, Cai R, Qian T, Li F, Weber SJ, Wang L, Li B, Wei Y, Li G, Wang H, Zheng Y, Zhao Y, Wolf ME, Zhu Y, Watabe-Uchida M, Li Y. 2024. Improved green and red GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 21. doi:10.1038/s41592-023-02100-w

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the roxCre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination on some gene alleles for genetic manipulation. The expression of the tdT protein is aligned with the expression of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figure 2 and Figure 5), which ensures the accuracy of the tracing experiments. We believe more functional data can be shown in future articles that use the mouse lines described in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for raising this point. In the R26-loxCre-tdT knock-in strategy, the WPRE sequence is added after the loxCre-P2A-tdT sequence.

      (3) The most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      According to the reviewer’s suggestion, we will compare iSuRe-Cre with R26-loxCre-tdT by using Alb-CreER and target R26-Confetti in the revised manuscript.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After identifying four robust versions of mCre by screening, we generated these four roxCre knock-in mice. We could not predict in advance which mCre would be the most robust in vivo; one or two versions might work efficiently. For example, if Alb-mCre1 was competitive with Cdh5-mCre10, we could use them for targeting genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      Thank you for your careful suggestions.

      We will provide schematic figures as well as nucleotide sequences for mice generation in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      We are grateful for this suggestion. We will compare iSuRe-Cre with R26-loxCre-tdT by using Alb-CreER and target R26-Confetti in the revised manuscript.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis using sparse labeling and efficient gene deletion would be our future direction using roxCre and loxCre strategies. We will include some discussion of using such strategy in the revised manuscript.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. We used R26-Confetti to demonstrate the robustness of mCre for recombination. Some multicolor mouse lines that do not flip have been published, for example, R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassette.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive Cre expression and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. This study does not solve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and carry similar toxicity. To solve this issue, it would be possible to construct a new strategy that enables the removal of Cre after its expression.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In this review (https://doi.org/10.1016/j.jbc.2021.100509), the comments on iSuRe-Cre were written from a reader's perspective, and all summary statements are based on the originally published paper (10.1038/s41467-019-10239-4). We have now tested iSuRe-Cre in our own hands. We did detect some leakiness in the heart and muscle, but hardly any in other tissues, as shown in the following figure.

      Author response image 1.

      Leakiness in Alb CreER;iSuRe-Cre mouse line. Pictures are representative results for 5 mice. Scale bars, white 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We gratefully appreciate your valuable comment. The roxCre and loxCre mice mentioned in this study provide more effective methods for inducible genetic manipulation in studying gene function. We hope that the application of our new genetic tools could help address some major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We are sorry that we mistakenly wrote R26-loxCre-tdT as R26-roxCre-tdT in our manuscript. We have not generated an R26-roxCre-tdT mouse line. We also thank the reviewer for raising concerns about the toxicity of high Cre expression. The toxicity of constitutive Cre expression and the toxicity of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are known to the field. The present study cannot resolve the toxicity of constitutive Cre expression. Many mouse lines with constitutive Cre driven by different promoters are used across various fields and show similar toxicity. To address this issue, a new strategy could be designed that removes Cre after it has been expressed.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we developed new mouse tool lines, including Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in supplementary figure 1, supplementary figure 2, and figure 4D, Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT are not leaky. Therefore, any leakiness observed after an inducible DreER or CreER allele is introduced derives from the DreER or CreER allele itself. We will supplement relevant experimental data in the revision.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We understand the reviewer’s concern. We can generate a dose-response curve in the revision.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      Because the submission website limits file size, image compression left some signals unclear. The following are the zoomed-out images. The staining in Figure 4F will be optimized and high-resolution images will be provided in the revision.

      Author response image 2.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We checked these signals carefully and did not find a “much lower” tdT signal. Because the submission website limits file size, image compression left some signals unclear. We have attached clear high-resolution images here. The following figure shows how we split the tdT signal and compared it with YFP/mCFP.

      Author response image 3.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in #251 and #256. Furthermore, in the passage from line #278 to #301. In the lines #297 and #300 it should probably read "Alb-CreER; R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

    1. Author response:

      Reviewer #1 (Public review):

      Wang et al., recorded concurrent EEG-fMRI in 107 participants during nocturnal NREM sleep to investigate brain activity and connectivity related to slow oscillations (SO), sleep spindles, and in particular their co-occurrence. The authors found SO-spindle coupling to be correlated with increased thalamic and hippocampal activity, and with increased functional connectivity from the hippocampus to the thalamus and from the thalamus to the neocortex, especially the medial prefrontal cortex (mPFC). They concluded the brain-wide activation pattern to resemble episodic memory processing, but to be dissociated from task-related processing and suggest that the thalamus plays a crucial role in coordinating the hippocampal-cortical dialogue during sleep.

      The paper offers an impressively large and highly valuable dataset that provides the opportunity for gaining important new insights into the network substrate involved in SOs, spindles, and their coupling. However, the paper does unfortunately not exploit the full potential of this dataset with the analyses currently provided, and the interpretation of the results is often not backed up by the results presented. I have the following specific comments.

      Thank you for your thoughtful and constructive feedback. We greatly appreciate your recognition of the strengths of our dataset and findings. Below, we address your specific comments and respond to each point you raised, so that our methods and results are as transparent and comprehensible as possible. We hope these revisions address your comments and further strengthen our manuscript.

      (1) The introduction is lacking sufficient review of the already existing literature on EEG-fMRI during sleep and the BOLD-correlates of slow oscillations and spindles in particular (Laufs et al., 2007; Schabus et al., 2007; Horovitz et al., 2008; Laufs, 2008; Czisch et al., 2009; Picchioni et al., 2010; Spoormaker et al., 2010; Caporro et al., 2011; Bergmann et al., 2012; Hale et al., 2016; Fogel et al., 2017; Moehlman et al., 2018; Ilhan-Bayrakci et al., 2022). The few studies mentioned are not discussed in terms of the methods used or insights gained.

      We acknowledge the need for a more comprehensive review of prior EEG-fMRI studies investigating BOLD correlates of slow oscillations and spindles. However, not all of these articles concern sleep SOs or spindles. Several (Hale et al., 2016; Horovitz et al., 2008; Laufs, 2008; Laufs, Walker, & Lund, 2007; Spoormaker et al., 2010) mainly focus on methodology for EEG-fMRI, sleep stages, or brain networks, which are not the focus of our study. Thank you again for your attention to the comprehensiveness of our literature review. We will expand the introduction to include a more detailed discussion of the existing literature, ensuring that the contributions of previous EEG-fMRI sleep studies are adequately acknowledged.

      Introduction, Page 4 Lines 62-76

      “Investigating these sleep-related neural processes in humans is challenging because it requires tracking transient sleep rhythms while simultaneously assessing their widespread brain activation. Recent advances in simultaneous EEG-fMRI techniques provide a unique opportunity to explore these processes. EEG allows for precise event-based detection of neural signal, while fMRI provides insight into the broader spatial patterns of brain activation and functional connectivity (Horovitz et al., 2008; Huang et al., 2024; Laufs, 2008; Laufs, Walker, & Lund, 2007; Schabus et al., 2007; Spoormaker et al., 2010). Previous EEG-fMRI studies on sleep have focused on classifying sleep stages or examining the neural correlates of specific waves (Bergmann et al., 2012; Caporro et al., 2012; Czisch et al., 2009; Fogel et al., 2017; Hale et al., 2016; Ilhan-Bayrakcı et al., 2022; Moehlman et al., 2019; Picchioni et al., 2011). These studies have generally reported that slow oscillations are associated with widespread cortical and subcortical BOLD changes, whereas spindles elicit activation in the thalamus, as well as in several cortical and paralimbic regions. Although these findings provide valuable insights into the BOLD correlates of sleep rhythms, they often do not employ sophisticated temporal modeling (Huang et al., 2024), to capture the dynamic interactions between different oscillatory events, e.g., the coupling between SOs and spindles.”

      (2) The paper falls short in discussing the specific insights gained into the neurobiological substrate of the investigated slow oscillations, spindles, and their interactions. The validity of the inverse inference approach ("Open ended cognitive state decoding"), assuming certain cognitive functions to be related to these oscillations because of the brain regions/networks activated in temporal association with these events, is debatable at best. It is also unclear why eventually only episodic memory processing-like brain-wide activation is discussed further, although 16 of 50 feature terms from the NeuroSynth v3 dataset were significant (episodic memory, declarative memory, working memory, task representation, language, learning, faces, visuospatial processing, category recognition, cognitive control, reading, cued attention, inhibition, and action).

      Thank you for pointing this out, particularly regarding the use of inverse inference approaches such as “open-ended cognitive state decoding.” Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7. We will refocus the main text on direct neurobiological insights gained from our EEG-fMRI analyses, particularly emphasizing the hippocampal-thalamocortical network dynamics underlying SO-spindle coupling, and we will acknowledge the exploratory nature of these findings and highlight their limitations.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) Hippocampal activation during SO-spindles is stated as a main hypothesis of the paper - for good reasons - however, other regions (e.g., several cortical as well as thalamic) would be equally expected given the known origin of both oscillations and the existing sleep-EEG-fMRI literature. However, this focus on the hippocampus contrasts with the focus on investigating the key role of the thalamus instead in the Results section.

      We appreciate your insight regarding the relative emphasis on hippocampal and thalamic activation in our study. We recognize that the manuscript may currently present an inconsistency between our initial hypothesis and the main focus of the results. To address this concern, we will ensure that our Introduction and Discussion sections explicitly discuss both regions, highlighting the complementary roles of the hippocampus (memory processing and reactivation) and the thalamus (spindle generation and cortico-hippocampal coordination) in SO-spindle dynamics.

      Introduction, Page 5 Lines 87-103

      “To address this gap, our study investigates brain-wide activation and functional connectivity patterns associated with SO-spindle coupling, and employs a cognitive state decoding approach (Margulies et al., 2016; Yarkoni et al., 2011)—albeit indirectly—to infer potential cognitive functions. In the current study, we used simultaneous EEG-fMRI recordings during nocturnal naps (detailed sleep staging results are provided in the Methods and Table S1) in 107 participants. Although directly detecting hippocampal ripples using scalp EEG or fMRI is challenging, we expected that hippocampal activation in fMRI would coincide with SO-spindle coupling detected by EEG, given that SOs, spindles, and ripples frequently co-occur during NREM sleep. We also anticipated a critical role of the thalamus, particularly thalamic spindles, in coordinating hippocampal-cortical communication.

      We found significant coupling between SOs and spindles during NREM sleep (N2/3), with spindle peaks occurring slightly before the SO peak. This coupling was associated with increased activation in both the thalamus and hippocampus, with functional connectivity patterns suggesting thalamic coordination of hippocampal-cortical communication. These findings highlight the key role of the thalamus in coordinating hippocampal-cortical interactions during human sleep and provide new insights into the neural mechanisms underlying sleep-dependent brain communication. A deeper understanding of these mechanisms may contribute to future neuromodulation approaches aimed at enhancing sleep-dependent cognitive function and treating sleep-related disorders.”

      Discussion, Page 16-17 Lines 292-307

      “When modeling the timing of these sleep rhythms in the fMRI, we observed hippocampal activation selectively during SO-spindle events. This suggests the possibility of triple coupling (SOs–spindles–ripples), even though our scalp EEG was not sufficiently sensitive to detect hippocampal ripples—key markers of memory replay (Buzsáki, 2015). Recent iEEG evidence indicates that ripples often co-occur with both spindles (Ngo, Fell, & Staresina, 2020) and SOs (Staresina et al., 2015; Staresina et al., 2023). Therefore, the hippocampal involvement during SO-spindle events in our study may reflect memory replay from the hippocampus, propagated via thalamic spindles to distributed cortical regions.

      The thalamus, known to generate spindles (Halassa et al., 2011), plays a key role in producing and coordinating sleep rhythms (Coulon, Budde, & Pape, 2012; Crunelli et al., 2018), while the hippocampus is found essential for memory consolidation (Buzsáki, 2015; Diba & Buzsáki, 2007; Singh, Norman, & Schapiro, 2022). The increased hippocampal and thalamic activity, along with strengthened connectivity between these regions and the mPFC during SO-spindle events, underscores a hippocampal-thalamic-neocortical information flow. This aligns with recent findings suggesting the thalamus orchestrates neocortical oscillations during sleep (Schreiner et al., 2022). The thalamus and hippocampus thus appear central to memory consolidation during sleep, guiding information transfer to the neocortex, e.g., mPFC.”

      (4) The study included an impressive number of 107 subjects. It is surprising though that only 31 subjects had to be excluded under these difficult recording conditions, especially since no adaptation night was performed. Since only subjects were excluded who slept less than 10 min (or had excessive head movements) there are likely several datasets included with comparably short durations and only a small number of SOs and spindles and even less combined SO-spindle events. A comprehensive table should be provided (supplement) including for each subject (included and excluded) the duration of included NREM sleep, number of SOs, spindles, and SO+spindle events. Also, some descriptive statistics (mean/SD/range) would be helpful.

      We appreciate your recognition of our sample size and the challenges associated with simultaneous EEG-fMRI sleep recordings. We acknowledge the importance of transparently reporting individual subject data, particularly regarding sleep duration and the number of detected SOs, spindles, and SO-spindle events. To address this, we will provide comprehensive tables in the supplementary materials. These contain descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) duration of each sleep stage; (2) number of detected SOs; (3) number of detected spindles; (4) number of detected SO-spindle coupling events; (5) density of detected SOs; (6) density of detected spindles; (7) density of detected SO-spindle coupling events.

      However, most of the excluded participants were unable to fall asleep or slept too briefly, so they had essentially no NREM sleep. It was therefore not possible to report NREM sleep duration or SO, spindle, and coupling counts for them.
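      The per-subject quantities listed above (counts, durations, densities) and the requested descriptive statistics can be illustrated with a minimal sketch. It assumes that density means events per minute of scored sleep; the subject records, field names, and helper functions are hypothetical and are not taken from the authors' pipeline. A subject with no scored sleep gets no density, which mirrors the point about excluded participants.

```python
from statistics import mean, stdev
from typing import Optional


def event_density(n_events: int, duration_min: float) -> Optional[float]:
    """Events per minute of scored sleep; None when no sleep was scored."""
    if duration_min <= 0:
        return None
    return n_events / duration_min


def summarize(values):
    """Mean, SD and range over the subjects that have a defined value."""
    vals = [v for v in values if v is not None]
    return {
        "n": len(vals),
        "mean": mean(vals),
        "sd": stdev(vals) if len(vals) > 1 else 0.0,
        "min": min(vals),
        "max": max(vals),
    }


# Hypothetical records, for illustration only (not data from the study).
subjects = [
    {"id": "s01", "n23_min": 40.0, "n_spindles": 120},
    {"id": "s02", "n23_min": 10.0, "n_spindles": 25},
    {"id": "s03", "n23_min": 0.0, "n_spindles": 0},  # never reached N2/N3
]

densities = [event_density(s["n_spindles"], s["n23_min"]) for s in subjects]
summary = summarize(densities)
print(densities, summary)
```

      Returning None instead of zero for a subject without scored sleep keeps such subjects out of the mean and SD instead of biasing them downward.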

      Supplementary Materials, Page 42-54, Table S1-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)

      (5) Was the 20-channel head coil dedicated for EEG-fMRI measurements? How were the electrode cables guided through/out of the head coil? Usually, the 64-channel head coil is used for EEG-fMRI measurements in a Siemens PRISMA 3T scanner, which has a cable duct at the back that allows to guide the cables straight out of the head coil (to minimize MR-related artifacts). The choice for the 20-channel head coil should be motivated. Photos of the recording setup would also be helpful.

      Thank you for your comment regarding our choice of the 20-channel head coil for EEG-fMRI measurements. We acknowledge that the 64-channel head coil is commonly used in Siemens PRISMA 3T scanners; however, the 20-channel coil was selected due to specific practical and technical considerations in our study. In particular, the 20-channel head coil was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil allowed us to maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.

      We have made this clearer in the revised manuscript.

      Methods, Page 20 Lines 385-392

      “All MRI data were acquired using a 20-channel head coil on a research-dedicated 3-Tesla Siemens Magnetom Prisma MRI scanner. Earplugs and cushions were provided for noise protection and head motion restriction. We chose the 20-channel head coil because it was compatible with our EEG system and ensured sufficient signal-to-noise ratio (SNR) for both EEG and fMRI acquisition. The EEG electrode cables were guided through the lateral and posterior openings of the head coil, secured with foam padding to reduce motion and minimize MR-related artifacts. Moreover, given the extended nature of nocturnal sleep recordings, the 20-channel coil helped maintain participant comfort while still achieving high-quality simultaneous EEG-fMRI data.”

      (6) Was the EEG sampling synchronized to the MR scanner (gradient system) clock (the 10 MHz signal; not referring to the volume TTL triggers here)? This is a requirement for stable gradient artifact shape over time and thus accurate gradient noise removal.

      Thank you for raising this important point. We confirm that the EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This synchronization was achieved using the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift. As a result, the gradient artifact waveform remained stable across volumes, allowing for more effective artifact correction during preprocessing. We appreciate your attention to this critical aspect of EEG-fMRI data acquisition.

      We have made this clearer in the revised manuscript.

      Methods, Page 19-20 Lines 371-383

      “EEG was recorded simultaneously with fMRI data using an MR-compatible EEG amplifier system (BrainAmps MR-Plus, Brain Products, Germany), along with a specialized electrode cap. The recording was done using 64 channels in the international 10/20 system, with the reference channel positioned at FCz. In order to adhere to polysomnography (PSG) recording standards, six electrodes were removed from the EEG cap: one for electrocardiogram (ECG) recording, two for electrooculogram (EOG) recording, and three for electromyogram (EMG) recording. EEG data was recorded at a sample rate of 5000 Hz, the resistance of the reference and ground channels was kept below 10 kΩ, and the resistance of the other channels was kept below 20 kΩ. To synchronize the EEG and fMRI recordings, the BrainVision recording software (BrainProducts, Germany) was utilized to capture triggers from the MRI scanner. The EEG sampling was synchronized to the MR scanner’s 10 MHz gradient system clock, ensuring a stable gradient artifact shape over time and enabling accurate artifact removal. This was achieved via the standard clock synchronization interface of the EEG amplifier, minimizing timing jitter and drift.”
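      The reason clock synchronization matters can be shown with simple arithmetic on the reported parameters (5000 Hz EEG sampling, a 10 MHz scanner clock, and the TR of 2000 ms given in the sequence specification). The sketch below is illustrative only; the 10 ppm clock offset used for the unsynchronized case is a hypothetical value, not a measurement from this study.

```python
from fractions import Fraction

EEG_RATE_HZ = 5000               # EEG sampling rate reported in the Methods
SCANNER_CLOCK_HZ = 10_000_000    # 10 MHz gradient system clock
TR_MS = 2000                     # repetition time from the sequence specification

# With synchronized clocks, both ratios are exact integers, so every volume
# starts on the same sample position and the gradient artifact keeps its shape.
clock_ticks_per_sample = Fraction(SCANNER_CLOCK_HZ, EEG_RATE_HZ)
samples_per_tr = Fraction(EEG_RATE_HZ * TR_MS, 1000)


def drift_samples_per_tr(offset_ppm: int) -> Fraction:
    """Sample drift per TR if the EEG clock ran free with a given offset.

    offset_ppm is a hypothetical clock mismatch in parts per million.
    """
    return samples_per_tr * Fraction(offset_ppm, 1_000_000)


print(clock_ticks_per_sample, samples_per_tr, drift_samples_per_tr(10))
```

      Under the assumed 10 ppm offset, the artifact would shift by a tenth of a sample per volume, which accumulates over a long sleep recording and degrades template-based artifact subtraction.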

      (7) The TR is quite long and the voxel size is quite large in comparison to state-of-the-art EPI sequences. What was the rationale behind choosing a sequence with relatively low temporal and spatial resolution?

      We acknowledge that our chosen TR and voxel size are relatively long and large compared to state-of-the-art EPI sequences. This decision was made to optimize the signal-to-noise ratio (SNR) and reduce susceptibility-related distortions, which are particularly critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. A longer TR allowed us to sample whole-brain activity with sufficient coverage, while a larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures such as the thalamus and hippocampus, which are key regions of interest in our study. We appreciate your concern and hope this clarification provides sufficient rationale for our sequence parameters.

      We have made this clearer in the revised manuscript.

      Methods, Page 20-21 Lines 398-408

      “Then, the “sleep” session began after the participants were instructed to try and fall asleep. For the functional scans, whole-brain images were acquired using k-space and steady-state T2*-weighted gradient echo-planar imaging (EPI) sequence that is sensitive to the BOLD contrast. This measures local magnetic changes caused by changes in blood oxygenation that accompany neural activity (sequence specification: 33 slices in interleaved ascending order, TR = 2000 ms, TE = 30 ms, voxel size = 3.5 × 3.5 × 4.2 mm<sup>3</sup>, FA = 90°, matrix = 64 × 64, gap = 0.7 mm). A relatively long TR and larger voxel size were chosen to optimize SNR and reduce susceptibility-related distortions, which are critical in EEG-fMRI sleep studies where head motion and physiological noise can be substantial. The longer TR allowed whole-brain coverage with sufficient temporal resolution, while the larger voxel size helped enhance BOLD sensitivity and minimize partial volume effects in deep brain structures (e.g., the thalamus and hippocampus), which are key regions of interest in this study.”

      (8) The anatomically defined ROIs are quite large. It should be elaborated on how this might reduce sensitivity to sleep rhythm-specific activity within sub-regions, especially for the thalamus, which has distinct nuclei involved in sleep functions.

      We appreciate your insight regarding the use of anatomically defined ROIs and their potential limitations in detecting sleep rhythm-specific activity within sub-regions, particularly in the thalamus. Given the distinct functional roles of thalamic nuclei in sleep processes, we acknowledge that using a single, large thalamic ROI may reduce sensitivity to localized activity patterns. To address this, we will discuss this limitation in the revised manuscript, acknowledging that our approach prioritizes whole-structure effects but may not fully capture nucleus-specific contributions.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (9) The study reports SO & spindle amplitudes & densities, as well as SO+spindle coupling, to be larger during N2/3 sleep compared to N1 and REM sleep, which is trivial but can be seen as a sanity check of the data. However, the amount of SOs and spindles reported for N1 and REM sleep is concerning, as per definition there should be hardly any (if SOs or spindles occur in N1 it becomes by definition N2, and the interval between spindles has to be considerably large in REM to still be scored as such). Thus, on the one hand, the report of these comparisons takes too much space in the main manuscript as it is trivial, but on the other hand, it raises concerns about the validity of the scoring.

      We appreciate your concern regarding the reported presence of SOs and spindles in N1 and REM sleep and its potential implications. Our methods for detecting SOs, spindles, and their coupling were originally designed only for N2 and N3 sleep data, based on the characteristics of those data, and they are widely recognized and used in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because SO and spindle detection is percentile-based, the methods will always detect a certain number of events when applied to other stages (N1 and REM), and how these events differ from those detected in N2/N3 remains unclear. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only as sanity checks.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
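      Why a percentile-based criterion returns events in every stage can be seen in a small simulation. The sketch below assumes a 75th-percentile amplitude threshold and draws candidate amplitudes from two made-up distributions; the threshold, the amplitude scales, and the function names are illustrative assumptions, not the authors' detection parameters.

```python
import random


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def detect(amplitudes, q=75):
    """Keep candidates whose amplitude exceeds the stage's own percentile."""
    thr = percentile(amplitudes, q)
    return [a for a in amplitudes if a > thr], thr


rng = random.Random(0)
# Hypothetical candidate amplitudes (arbitrary units): large in N2/N3, small in REM.
n23 = [abs(rng.gauss(0, 75)) for _ in range(1000)]
rem = [abs(rng.gauss(0, 10)) for _ in range(1000)]

events_n23, thr_n23 = detect(n23)
events_rem, thr_rem = detect(rem)
print(len(events_n23), len(events_rem), thr_n23 > thr_rem)
```

      Both stages yield the same fraction of detected events even though the REM threshold is far lower, so the count alone says nothing about whether those events resemble genuine N2/N3 oscillations. A fixed absolute threshold would behave differently, at the cost of ignoring between-subject amplitude differences.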

      (10) Why was electrode F3 used to quantify the occurrence of SOs and spindles? Why not a midline frontal electrode like Fz (or a number of frontal electrodes for SOs) and Cz (or a number of centroparietal electrodes) for spindles to be closer to their maximum topography?

      We appreciate your suggestion regarding electrode selection for SO and spindle quantification. Our choice of F3 was primarily based on previous studies (Massimini et al., 2004; Molle et al., 2011), where bilateral frontal electrodes are commonly used for detecting SOs and spindles. Additionally, we considered the impact of MRI-related noise and, after a comprehensive evaluation, determined that F3 provided an optimal balance between signal quality and artifact minimization. We also acknowledge that alternative electrode choices, such as Fz for SOs and Cz for spindles, could provide additional insights into their topographical distributions.

      (11) Functional connectivity (hippocampus -> thalamus -> cortex (mPFC)) is reported to be increased during SO-spindle coupling and interpreted as evidence for coordination of hippocampo-neocortical communication likely by thalamic spindles. However, functional connectivity was only analysed during coupled SO+spindle events, not during isolated SOs or isolated spindles. Without the direct comparison of the connectivity patterns between these three events, it remains unclear whether this is specific for coupled SO+spindle events or rather associated with one or both of the other isolated events. The PPIs need to be conducted for those isolated events as well and compared statistically to the coupled events.

      We appreciate your critical perspective on our functional connectivity analysis and the interpretation of hippocampus-thalamus-cortex (mPFC) interactions during SO-spindle coupling. We acknowledge that, in the original analysis, functional connectivity was only examined during coupled SO-spindle events, without direct comparison to isolated SOs or isolated spindles. To address this concern, we have conducted PPI analyses for all three ROIs (hippocampus, thalamus, mPFC) and all three event types (SO-spindle couplings, isolated SOs, and isolated spindles). Our results indicate that neither isolated SOs nor isolated spindles yielded significant connectivity changes in any of the three ROIs, as none survived multiple comparison correction. This suggests that the observed connectivity increase is specific to SO-spindle coupling, rather than being independently driven by either SOs or spindles alone.

      Results, Page 14 Lines 248-255

      “Crucially, the interaction between FC and SO-spindle coupling revealed that only the functional connectivity of hippocampus -> thalamus (ROI analysis, t<sub>(106)</sub> = 1.86, p = 0.0328) and thalamus -> mPFC (ROI analysis, t<sub>(106)</sub> = 1.98, p = 0.0251) significantly increased during SO-spindle coupling, with no significant changes in all other pathways (Fig. 4e). We also conducted PPI analyses for the other two events (SOs and spindles), and neither yielded significant connectivity changes in the three ROIs, as all failed to survive whole-brain FWE correction at the cluster level (p < 0.05). Together, these findings suggest that the thalamus, likely via spindles, coordinates hippocampal-cortical communication selectively during SO-spindle coupling, but not isolated SOs or spindle events alone.”
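      For readers unfamiliar with PPI, the interaction term tested here can be sketched in its simplest form: the product of a mean-centered seed time course and a mean-centered event regressor, entered alongside both main effects. This is a simplified construction without HRF convolution or deconvolution, chosen for illustration; the response does not state which software or PPI variant was used, and the toy time series are hypothetical.

```python
def demean(x):
    """Subtract the mean so the interaction is not confounded by offsets."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def ppi_design(seed_ts, event_regressor):
    """Return the three PPI columns: psychological, physiological, interaction."""
    if len(seed_ts) != len(event_regressor):
        raise ValueError("seed and event regressors must have equal length")
    physio = demean(seed_ts)          # e.g. mean BOLD signal of the seed ROI
    psych = demean(event_regressor)   # e.g. SO-spindle coupling events per volume
    interaction = [p * s for p, s in zip(psych, physio)]
    return {"psych": psych, "physio": physio, "ppi": interaction}


# Toy example: six volumes, coupling events in volumes 2 and 3 (hypothetical).
seed = [1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
events = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
design = ppi_design(seed, events)
print(design["ppi"])
```

      A positive weight on the interaction column in a target region means that the seed-target coupling is stronger during the events than outside them, which is the quantity compared across coupled and isolated events.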

      (12) The limited temporal resolution of fMRI does indeed not allow for easily distinguishing between fMRI activation patterns related to SO-up- vs. SO-down-states. For this, one could try to extract the amplitudes of SO-up- and SO-down-states separately for each SO event and model them as two separate parametric modulators (with the risk of collinearity as they are likely correlated).

      We appreciate your insightful comment regarding the challenge of distinguishing fMRI activation patterns related to SO-up vs. SO-down states due to the limited temporal resolution of fMRI. While our current analysis does not differentiate between these two phases, we acknowledge that separately modeling SO-up and SO-down states using parametric modulators could provide a more refined understanding of their distinct neural correlates. However, as you note, this approach carries the risk of collinearity, and there is indeed a high correlation between the two amplitudes across all subjects in our results (r=0.98). Future studies could explore this further by leveraging high-temporal-resolution techniques. While implementing this in the current study is beyond our scope, we will acknowledge this limitation in the Discussion section.
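      The practical cost of this collinearity can be quantified with the variance inflation factor, which for two regressors is 1 / (1 - r^2). The sketch below plugs in the reported r = 0.98 under the assumption that the two parametric modulators would be correlated at a similar level within a design matrix; that assumption, and the function name, are illustrative.

```python
import math


def vif_two_regressors(r: float) -> float:
    """Variance inflation factor for a pair of regressors with correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


vif = vif_two_regressors(0.98)   # reported correlation between the two amplitudes
se_inflation = math.sqrt(vif)    # factor by which standard errors grow
print(round(vif, 2), round(se_inflation, 2))
```

      A variance inflation of roughly 25 means the standard error of each modulator's estimate would be about five times larger than with uncorrelated regressors, so separate UP- and DOWN-state effects would be very hard to estimate from these data.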

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (13) L327: "It is likely that our findings of diminished DMN activity reflect brain activity during the SO DOWN-state, as this state consistently shows higher amplitude compared to the UP-state within subjects, which is why we modelled the SO trough as its onset in the fMRI analysis." This conclusion is not justified as the fact that SO down-states are larger in amplitude does not mean their impact on the BOLD response is larger.

      We appreciate your concern regarding our interpretation of diminished DMN activity as reflecting the SO DOWN-state. We acknowledge that the original wording was somewhat misleading. Our revised interpretation is that the effect could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SO events was obtained when modelling at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from the DOWN-state. We will make this clear in the Discussion section.

      Discussion, Page 17 Lines 308-322

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.”

      (14) Line 77: "In the current study, while directly capturing hippocampal ripples with scalp EEG or fMRI is difficult, we expect to observe hippocampal activation in fMRI whenever SOs-spindles coupling is detected by EEG, if SOs- spindles-ripples triple coupling occurs during human NREM sleep". Not all SO-spindle events are associated with ripples (Staresina et al., 2015), but hippocampal activation may also be expected based on the occurrence of spindles alone (Bergmann et al., 2012).

      We appreciate your clarification regarding the relationship between SO-spindle coupling and hippocampal ripples. We acknowledge that not all SO-spindle events are necessarily accompanied by ripples (Staresina et al., 2015). However, previous research indicates that hippocampal ripples are significantly more likely to occur during SO-spindle coupling events. This suggests that while ripple occurrence is not guaranteed, SO-spindle coupling creates a favorable network state for ripple generation and potential hippocampal activation. To ensure accuracy, we will delete the misleading sentence from the Introduction and acknowledge in the Discussion that our results cannot conclusively demonstrate the triple coupling of SOs, spindles, and hippocampal ripples.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      Reviewer #2 (Public review):

      In this study, Wang and colleagues aimed to explore brain-wide activation patterns associated with NREM sleep oscillations, including slow oscillations (SOs), spindles, and SO-spindle coupling events. Their findings reveal that SO-spindle events corresponded with increased activation in both the thalamus and hippocampus. Additionally, they observed that SO-spindle coupling was linked to heightened functional connectivity from the hippocampus to the thalamus, and from the thalamus to the medial prefrontal cortex-three key regions involved in memory consolidation and episodic memory processes.

      This study's findings are timely and highly relevant to the field. The authors' extensive data collection, involving 107 participants sleeping in an fMRI while undergoing simultaneous EEG recording, deserves special recognition. If shared, this unique dataset could lead to further valuable insights. While the conclusions of the data seem overall well supported by the data, some aspects with regard to the detection of sleep oscillations need clarification.

      The authors report that coupled SO-spindle events were most frequent during NREM sleep (2.46 ± 0.06 events/min), but they also observed a surprisingly high occurrence of these events during N1 and REM sleep (2.23 ± 0.09 and 2.32 ± 0.09 events/min, respectively), where SO-spindle coupling would not typically be expected. Combined with the relatively modest SO amplitudes reported (~25 µV, whereas >75 µV would be expected when using mastoids as reference electrodes), this raises the possibility that the parameters used for event detection may not have been conservative enough - or that sleep staging was inaccurately performed. This issue could present a significant challenge, as the fMRI findings are largely dependent on the reliability of these detected events.

      Thank you very much for your thorough and encouraging review. We appreciate your recognition of the significance and relevance of our study and dataset, particularly in highlighting how simultaneous EEG-fMRI recordings can provide complementary insights into the temporal dynamics of neural oscillations and their associated spatial activation patterns during sleep. In the sections that follow, we address each of your comments in detail. We have revised the text and conducted additional analyses wherever possible to strengthen our argument and clarify our methodological choices. We believe these revisions improve the clarity and rigor of our work, and we thank you for helping us refine it.

      We appreciate your insightful comments regarding the detection of sleep oscillations. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM. We will acknowledge the reasons for these results in the Methods section and emphasize that they are used only for sanity checks.

      Regarding the reported SO amplitudes (~25 µV), during preprocessing, we applied the Signal Space Projection (SSP) method to more effectively remove MRI gradient artifacts and cardiac pulse noise. While this approach enhances data quality, it also reduces overall signal power, leading to systematically lower reported amplitudes. Despite this, the SOs we detected in NREM sleep (especially N2/N3) remain physiologically meaningful and are consistent with previous fMRI studies using similar artifact removal techniques. We appreciate your careful evaluation and valuable suggestions.

      In addition, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics (Table S1), as well as detailed information about sleep waves at each sleep stage for all 107 subjects (Table S2-S4), listing for each subject: (1) Duration of each sleep stage; (2) Number of detected SOs; (3) Number of detected spindles; (4) Number of detected SO-spindle coupling events; (5) Density of detected SOs; (6) Density of detected spindles; (7) Density of detected SO-spindle coupling events.

      Methods, Page 25 Lines 515-524

      “We note that the above methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).”
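
      The point that a percentile-based threshold will always flag some events, whatever the sleep stage, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the 75th-percentile cutoff, the synthetic amplitude distributions, and the helper names are hypothetical and do not reproduce our detection code.

```python
import random


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def detect_by_percentile(amplitudes, q=75.0):
    """Keep candidate events whose amplitude exceeds the q-th percentile
    of the same recording (hypothetical cutoff, for illustration only)."""
    threshold = percentile(amplitudes, q)
    return [a for a in amplitudes if a > threshold]


rng = random.Random(0)
# Synthetic candidate amplitudes: large in an N2/N3-like stage, small in a REM-like stage.
n2_like = [abs(rng.gauss(0, 30)) for _ in range(1000)]
rem_like = [abs(rng.gauss(0, 5)) for _ in range(1000)]

frac_n2 = len(detect_by_percentile(n2_like)) / len(n2_like)
frac_rem = len(detect_by_percentile(rem_like)) / len(rem_like)
print(frac_n2, frac_rem)
```

      Both stages yield about the same fraction of detected candidates even though the amplitudes differ several-fold, because the threshold is relative to each stage's own distribution. This is why event counts in N1 and REM are reported only as a sanity check.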

      Supplementary Materials, Page 42-54, Table S1-S4

      (Given their length, we do not list all the tables here. Please refer to the revised manuscript.)

      Reviewer #3 (Public review):

      Summary:

      Wang et al., examined the brain activity patterns during sleep, especially when locked to those canonical sleep rhythms such as SO, spindle, and their coupling. Analyzing data from a large sample, the authors found significant coupling between spindles and SOs, particularly during the upstate of the SO. Moreover, the authors examined the patterns of whole-brain activity locked to these sleep rhythms. To understand the functional significance of these brain activities, the authors further conducted open-ended cognitive state decoding and found a variety of cognitive processing may be involved during SO-spindle coupling and during other sleep events. The authors next investigated the functional connectivity analyses and found enhanced connectivity between the hippocampus, the thalamus, and the medial PFC. These results reinforced the theoretical model of sleep-dependent memory consolidation, such that SO-spindle coupling is conducive to systems-level memory reactivation and consolidation.

      Strengths:

      There are obvious strengths in this work, including the large sample size, state-of-the-art neuroimaging and neural oscillation analyses, and the richness of results.

      Weaknesses:

      Despite these strengths and the insights gained, there are weaknesses in the design, the analyses, and inferences.

      Thank you for your detailed and thoughtful review of our manuscript. We are delighted that you recognize the advanced neuroimaging and neural oscillation analyses, the richness of the results, and the large sample size. In the following sections, we provide detailed responses to each of your comments. We have revised the text and conducted additional analyses to strengthen our arguments and clarify our methodological choices. We believe these revisions enhance the clarity and rigor of our work, and we sincerely appreciate your thoughtful feedback in helping us refine the manuscript.

      (1) A repeating statement in the manuscript is that brain activity could indicate memory reactivation and thus consolidation. This is indeed a highly relevant question that could be informed by the current data/results. However, an inherent weakness of the design is that there is no memory task before and after sleep. Thus, it is difficult (if not impossible) to make a strong argument linking SO/spindle/coupling-locked brain activity with memory reactivation or consolidation.

      We appreciate your suggestion regarding the lack of a pre- and post-sleep memory task in our study design. We acknowledge that, in the absence of behavioral measures, it is hard to directly link SO-spindle coupling to memory consolidation in an outcome-driven manner. Our interpretation is instead based on the well-established role of these oscillations in memory processes, as demonstrated in previous studies. We sincerely appreciate this feedback and will adjust our Discussion accordingly to reflect a more precise interpretation of our findings.

      Discussion, Page 18 Lines 333-341

      “Despite providing new insights, our study has several limitations. First, our scalp EEG did not directly capture hippocampal ripples, preventing us from conclusively demonstrating triple coupling. Second, the combination of EEG-fMRI and the lack of a memory task limit our ability to parse fine-grained BOLD responses at the DOWN- vs. UP-states of SOs and link observed activations to behavioral outcomes. Third, the use of large anatomical ROIs may mask subregional contributions of specific thalamic nuclei or hippocampal subfields. Finally, without a memory task, we cannot establish a direct behavioral link between sleep-rhythm-locked activation and memory consolidation. Future studies combining techniques such as ultra-high-field fMRI or iEEG with cognitive tasks may refine our understanding of subregional network dynamics and functional significance during sleep.”

      (2) Relatedly, to understand the functional implications of the sleep rhythm-locked brain activity, the authors employed the "open-ended cognitive state decoding" method. While this method is interesting, it is rather indirect given that there were no behavioral indices in the manuscript. Thus, discussions based on these analyses are speculative at best. Please either tone down the language or find additional evidence to support these claims.

      Moreover, the results from this method are difficult to understand. Figure 3e showed that for all three types of sleep events (SO, spindle, SO-spindle), the same mental states (e.g., working memory, episodic memory, declarative memory) showed opposite directions of activation (left and right panels showed negative and positive activation, respectively). How to interpret these conflicting results? This ambiguity is also reflected by the term used: declarative memory and episodic memories are both indexed in the results. Yet these two processes can be largely overlapped. So which specific memory processes do these brain activity patterns reflect? The Discussion shall discuss these results and the limitations of this method.

      We appreciate your critical assessment of the open-ended cognitive state decoding method and its interpretational challenges. Given the concerns about the indirectness of this approach, we decided to remove its related content and results from Figure 3 in the main text and include it in Supplementary Figure 7.

      Due to the complexity of memory-related processes, we acknowledge that distinguishing between episodic and declarative memory based solely on this approach is not straightforward. We will revise the Supplementary Materials to explicitly discuss these limitations and clarify that our findings do not isolate specific cognitive processes but rather suggest general associations with memory-related networks.

      Discussion, Page 17-18 Lines 323-332

      “To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      (3) The coupling strength is somehow inconsistent with prior results (Hahn et al., 2020, eLife, Helfrich et al., 2018, Neuron). Specifically, Helfrich et al. showed that among young adults, the spindle is coupled to the peak of the SO. Here, the authors reported that the spindles were coupled to down-to-up transitions of SO and before the SO peak. It is possible that participants' age may influence the coupling (see Helfrich et al., 2018). Please discuss the findings in the context of previous research on SO-spindle coupling.

      We appreciate your concern regarding the temporal characteristics of SO-spindle coupling. We acknowledge that the SO-spindle coupling phase results in our study are not identical to those reported by Hahn et al. (2020) and Helfrich et al. (2018). However, these differences may arise from slight variations in event detection parameters, which can influence the precise phase estimation of coupling. Notably, Hahn et al. (2020) also reported slight discrepancies in their group-level coupling phase results, highlighting that methodological differences can contribute to variability across studies. Furthermore, our findings are consistent with those of Schreiner et al. (2021), further supporting the robustness of our observations.

      That said, we acknowledge that our original description of SO-spindle coupling as occurring at the "transition from the lower state to the upper state" was not entirely precise. The -π/2 phase represents the true transition point, while our observed coupling phase is actually closer to the SO peak rather than strictly at the transition. We will revise this statement in the manuscript to ensure clarity and accuracy in describing the coupling phase.

      Discussion, Page 16 Lines 283-291

      “Our data provide insights into the neurobiological underpinnings of these sleep rhythms. SOs, originating mainly in neocortical areas such as the mPFC, alternate between DOWN- and UP-states. The thalamus generates sleep spindles, which in turn couple with SOs. Our finding that spindle peaks consistently occurred slightly before the UP-state peak of SOs (in 83 out of 107 participants), concurs with prior studies, including Schreiner et al. (2021). Yet it differs from some results suggesting spindles might peak right at the SO UP-state (Hahn et al., 2020; Helfrich et al., 2018). Such discrepancies could arise from differences in detection algorithms, participant age (Helfrich et al., 2018), or subtle variations in cortical-thalamic timing. Nonetheless, these results underscore the importance of coordinated SO-spindle interplay in supporting sleep-dependent processes.”

      (4) The discussion is rather superficial with only two pages, without delving into many important arguments regarding the possible functional significance of these results. For example, the author wrote, "This internal processing contrasts with the brain patterns associated with external tasks, such as working memory." Without any references to working memory, and without delineating why WM is considered as an external task even working memory operations can be internal. Similarly, for the interesting results on SO and reduced DMN activity, the authors wrote "The DMN is typically active during wakeful rest and is associated with self-referential processes like mind-wandering, daydreaming, and task representation (Yeshurun, Nguyen, & Hasson, 2021). Its reduced activity during SOs may signal a shift towards endogenous processes such as memory consolidation." This argument is flawed. DMN is active during self-referential processing and mind-wandering, i.e., when the brain shifts from external stimuli processing to internal mental processing. During sleep, endogenous memory reactivation and consolidation are also part of the internal mental processing given the lack of external environmental stimulation. So why during SO or during memory consolidation, the DMN activity would be reduced? Were there differences in DMN activity between SO and SO-spindle coupling events?

      We appreciate your concerns regarding the brevity of the discussion and the need for clearer theoretical arguments. We will expand this section to provide more in-depth interpretations of our findings in the context of prior literature. Regarding working memory (WM), we acknowledge that our phrasing was ambiguous. We will modify this statement in the Discussion section.

      For the SO-related reduction in DMN activity, we recognize the need for a more precise explanation. This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state.

      To address your final question, we have conducted an additional post hoc comparison of DMN activity between isolated SOs and SO-spindle coupling events. Our results indicate that DMN activation during SOs was significantly lower than during SO-spindle coupling (t<sub>(106)</sub> = -4.17, p < 1e-4). This suggests that SO-spindle coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. We appreciate your constructive feedback and will integrate these expanded analyses and discussions into our revised manuscript.
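
      For readers who want to see how a within-subject comparison of this kind yields 106 degrees of freedom with 107 participants, the sketch below computes a paired t statistic. It assumes a paired t-test; the response above reports only the statistic, and the input values and function name are hypothetical.

```python
import math
import statistics


def paired_t(cond_a, cond_b):
    """Paired t statistic and degrees of freedom for per-subject values."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-subject DMN estimates for four subjects (illustration only).
so_values = [1.0, 2.0, 3.0, 4.0]
coupling_values = [2.0, 2.0, 5.0, 5.0]
t_stat, dof = paired_t(so_values, coupling_values)
print(round(t_stat, 3), dof)
```

      A negative statistic indicates lower values in the first condition, matching the direction of the reported effect (SO lower than coupling).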

      Results, Page 11 Lines 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t<sub>(106)</sub> = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t<sub>(106)</sub> = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t<sub>(106)</sub> = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t<sub>(106)</sub> = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Discussion, Page 17-18 Lines 308-332

      “An intriguing aspect of our findings is the reduced DMN activity during SOs when modeled at the SO trough (DOWN-state). This reduced DMN activity may reflect large-scale neural inhibition characteristic of the SO trough. The DMN is typically active during internally oriented cognition (e.g., self-referential processing or mind-wandering) and is suppressed during external stimuli processing (Yeshurun, Nguyen, & Hasson, 2021). It is unlikely, however, that this suppression of DMN during SO events is related to a shift from internal cognition to external responses given it is during deep sleep time. Instead, it could be driven by the inherent rhythmic pattern of SOs, which makes it difficult to separate UP- from DOWN-states (the two temporal regressors were highly correlated, and similar brain activation during SOs events was obtained if modelled at the SO peak instead, Fig. S5). Since the amplitude at the SO trough is consistently larger than that at the SO peak, the neural activation we detected may primarily capture the large-scale inhibition from DOWN-state. Interestingly, no such DMN reduction was found during SO-spindle coupling, implying that coupling may involve distinct neural dynamics that partially re-engage DMN-related processes, possibly reflecting memory-related reactivation. Future research using high-temporal-resolution techniques like iEEG could clarify these possibilities.

      To explore functional relevance, we employed an open-ended cognitive state decoding approach using meta-analytic data (NeuroSynth: Yarkoni et al. (2011)). Although this method usefully generates hypotheses about potential cognitive processes, particularly in the absence of a pre- and post-sleep memory task, it is inherently indirect. Many cognitive terms showed significant associations (16 of 50), such as “episodic memory,” “declarative memory,” and “working memory.” We focused on episodic/declarative memory given the known link with hippocampal reactivation (Diekelmann & Born, 2010; Staresina et al., 2015; Staresina et al., 2023). Nonetheless, these inferences regarding memory reactivation should be interpreted cautiously without direct behavioral measures. Future research incorporating explicit tasks before and after sleep would more rigorously validate these potential functional claims.”

      Reviewing Editor Comment:

      The reviewers think that you are working on a relevant and important topic. They are praising the large sample size used in the study. The reviewers are not all in line regarding the overall significance of the findings, but they all agree the paper would strongly benefit from some extra work, as all reviewers raise various critical points that need serious consideration.

      We appreciate your recognition of the relevance and importance of our study, as well as your acknowledgment of the large sample size as a strength of our work. We understand that there are differing perspectives regarding the overall significance of our findings, and we value the constructive critiques provided. We are committed to addressing the key concerns raised by all reviewers, including refining our analyses, clarifying our interpretations, and incorporating additional discussions to strengthen the manuscript. Below, we address your specific recommendations and provide responses to each point you raised to ensure our methods and results are as transparent and comprehensible as possible. We believe that these revisions will significantly enhance the rigor and impact of our study, and we sincerely appreciate your thoughtful feedback in helping us improve our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "overnight sleep" suggests an entire night, while these were rather "nocturnal naps". Please rephrase.

      Thank you for pointing this out. We have revised the phrasing in our manuscript to "nocturnal naps" instead of "overnight sleep" to more accurately reflect the duration of the sleep recordings.

      (2) Sleep staging results (macroscopic sleep architecture) should be provided in more detail (at least min and % of the different sleep stages, sleep onset latency, total sleep duration, total recording duration), at least mean/SD/range.

      Thank you for this suggestion. We will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics. This information will help provide a clearer overview of the macroscopic sleep architecture in our dataset.

      Supplementary Materials, Page 42, Table S1

      Author response table 1.

      Descriptive results of demographic information and sleep characteristics. Note: The total recorded time is equal to the awake time plus the total sleep time. The sleep onset latency is the time taken to reach the first sleep epoch. The Sleep Efficiency is the ratio of actual sleep time to total recording time.
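
      The two derived quantities defined in the table note can be written out as a short sketch. The function name and the example durations below are hypothetical and are not taken from the study's data; only the two definitions come from the note.

```python
def sleep_summary(awake_min, sleep_min):
    """Return (total recorded time in minutes, sleep efficiency in 0..1).

    Follows the table note: total recorded time = awake time + total
    sleep time; sleep efficiency = sleep time / total recorded time.
    """
    total_recorded = awake_min + sleep_min
    if total_recorded <= 0:
        raise ValueError("total recorded time must be positive")
    efficiency = sleep_min / total_recorded
    return total_recorded, efficiency


# Hypothetical durations, for illustration only.
total, eff = sleep_summary(awake_min=30.0, sleep_min=90.0)
```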

      Reviewer #2 (Recommendations for the authors):

      In order to allow for a better estimation of the reliability of the detected sleep events, please:

      (1) Provide densities and absolute numbers of all detected SOs and spindles (N1, NREM, and REM sleep).

      Thank you for pointing this out. We will provide comprehensive tables in the supplementary materials containing detailed information about sleep waves at each sleep stage for all 107 subjects (Tables S2-S4), listing for each subject: 1) the duration of each sleep stage; 2) the number of detected SOs; 3) the number of detected spindles; 4) the number of detected SO-spindle coupling events; 5) the density of detected SOs; 6) the density of detected spindles; 7) the density of detected SO-spindle coupling events.

      Supplementary Materials, Page 43-54, Table S2-S4

      (Given their length, we do not reproduce all the tables here. Please refer to the revised manuscript.)

      (2) Show ERPs for all detected SOs and spindles (per sleep stage).

      Thank you for the suggestion. We will provide ERPs for all detected SOs and spindles, separated by sleep stage (N1, N2&N3, and REM) in supplementary Fig. S2-S4. These ERP waveforms will help illustrate the characteristic temporal profiles of SOs and spindles across different sleep stages.

      Methods, Page 25, Line 525-532

      “Event-related potentials (ERP) analysis. After completing the detection of each sleep rhythm event, we performed ERP analyses for SOs, spindles, and coupling events in different sleep stages. Specifically, for SO events, we took the trough of the DOWN-state of each SO as the zero-time point, then extracted data in a [-2 s to 2 s] window from the broadband (0.1–30 Hz) EEG and used [-2 s to -0.5 s] for baseline correction; the results were then averaged across 107 subjects (see Fig. S2a). For spindle events, we used the peak of each spindle as the zero-time point and applied the same data extraction window and baseline correction before averaging across 107 subjects (see Fig. S2b). Finally, for SO-spindle coupling events, we followed the same procedure used for SO events (see Fig. 2a, Figs. S3–S4).”
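
      The epoching and baseline-correction steps described in this passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function `erp`, the toy signal, and the 10 Hz sampling rate are hypothetical, and the sketch averages over events within one recording, whereas the manuscript additionally averages across the 107 subjects.

```python
def erp(signal, event_idx, fs, window=(-2.0, 2.0), baseline=(-2.0, -0.5)):
    """Average baseline-corrected epochs time-locked to event samples.

    signal    : list of floats (one EEG channel)
    event_idx : sample indices of the zero-time points (e.g. SO troughs)
    fs        : sampling rate in Hz
    window    : epoch limits in seconds relative to the event
    baseline  : baseline limits in seconds relative to the event
    """
    pre = int(round(-window[0] * fs))
    post = int(round(window[1] * fs))
    b0 = int(round((baseline[0] - window[0]) * fs))
    b1 = int(round((baseline[1] - window[0]) * fs))
    epochs = []
    for idx in event_idx:
        if idx - pre < 0 or idx + post >= len(signal):
            continue  # skip events too close to the recording edges
        epoch = signal[idx - pre: idx + post + 1]
        base = sum(epoch[b0:b1]) / (b1 - b0)  # mean of the baseline window
        epochs.append([v - base for v in epoch])
    if not epochs:
        raise ValueError("no complete epochs")
    n = len(epochs)
    return [sum(col) / n for col in zip(*epochs)]


# Toy data (hypothetical): constant offset of 5 with a dip to 2 at each event.
fs = 10
signal = [5.0] * 200
events = [50, 120]
for e in events:
    signal[e] = 2.0
avg = erp(signal, events, fs)
```

      In this toy case the constant offset is removed by the baseline window, so the averaged waveform is zero everywhere except at the zero-time sample, where it equals the dip depth.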

      Supplementary Materials, Page 36-38, Fig. S2-S4

      Author response image 1.

      ERPs of SOs and spindles coupling during different sleep stages across all 107 subjects. a. ERP of SOs in different sleep stages using the broadband (0.1–30 Hz) EEG data. We align the trough of the DOWN-state of each SO at time zero (see Methods for details). The orange line represents the SO ERP in the N1 stage, the black line represents the SO ERP in the N2&N3 stage, and the green line represents the SO ERP in the REM stage. b. ERP of spindles in different sleep stages using the broadband (0.1–30 Hz) EEG data. We align the peak of each spindle at time zero (see Methods for details). The color scheme is the same as in panel a.

      Author response image 2.

      ERP and time-frequency patterns of SO-spindle coupling in the N1 stage. The averaged temporal frequency pattern and ERP across all instances of SO-spindle coupling, computed over all subjects, following the same procedure as in Fig. 2a, but for N1 stage.

      Author response image 3.

      ERP and time-frequency patterns of SO-spindle coupling in the REM stage. The averaged temporal frequency pattern and ERP across all instances of SO-spindle coupling, computed over all subjects, again following the same procedure as in Fig. 2a, but for REM stage.

      (3) Provide detailed info concerning sleep characteristics (time spent in each sleep stage etc.).

      Thank you for this suggestion. As in our response above, we will provide comprehensive tables in the supplementary materials containing descriptive information about sleep-related characteristics.

      Supplementary Materials, Page 42, Table S1 (same as above)

      (4) What would happen if more stringent parameters were used for event detection? Would the authors still observe a significant number of SO spindles during N1 and REM? Would this affect the fMRI-related results?

      Thank you for this suggestion. Our methods for detecting SOs, spindles, and their couplings were originally developed for N2 and N3 sleep data, based on the specific characteristics of these stages. These methods are widely recognized in sleep research (Hahn et al., 2020; Helfrich et al., 2019; Helfrich et al., 2018; Ngo, Fell, & Staresina, 2020; Schreiner et al., 2022; Schreiner et al., 2021; Staresina et al., 2015; Staresina et al., 2023). However, because this percentile-based detection approach will inherently identify a certain number of events if applied to other stages (e.g., N1 and REM), the nature of these events in those stages remains unclear compared to N2/N3. We nevertheless identified and reported the detailed descriptive statistics of these sleep rhythms in all sleep stages, under the same operational definitions, both for completeness and as a sanity check. Within the same subject, there should be more SOs, spindles, and their couplings in N2/N3 than in N1 or REM (see also Figure S2-S4, Table S1-S4).

      Furthermore, in order to explore the impact of this on our fMRI results, we conducted an additional sensitivity analysis by applying different detection parameters for SOs. Specifically, we adjusted amplitude percentile thresholds for SO detection (the parameter that has the greatest impact on the results). We used the hippocampal activation value during N2&N3 stage SO-spindle coupling as an anchor value and found that when the parameters gradually became stricter, the results were similar to or even better than the current results. However, when we continued to increase the threshold, the results began to gradually decrease until the threshold was increased to 80%, and the results were no longer significant. This indicates that our results are robust within a specific range of parameters, but as the threshold increases, the number of trials decreases, ultimately weakening the statistical power of the fMRI analysis.
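
      The two properties of percentile-based detection discussed above (it always returns some events, and stricter thresholds leave fewer trials) can be illustrated with a small sketch. The helper names and the amplitude values are hypothetical; the actual detection pipeline involves filtering and duration criteria that are not reproduced here.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def detect_by_percentile(amplitudes, p):
    """Keep candidate events whose amplitude exceeds the p-th percentile."""
    cutoff = percentile(amplitudes, p)
    return [i for i, a in enumerate(amplitudes) if a > cutoff]


# Hypothetical candidate amplitudes 1..100 (arbitrary units).
amps = [float(a) for a in range(1, 101)]
counts = {p: len(detect_by_percentile(amps, p)) for p in (75, 80, 90)}
```

      Because the cutoff is defined relative to the data themselves, some events pass at any threshold below 100%, whatever the sleep stage; raising the threshold only shrinks the number of trials available to the fMRI model.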

      Thank you again for your suggestions on sleep rhythm event detection. We will add the results in Supplementary and revise our manuscript accordingly.

      Results, Page 11, Line 199-208

      “Spindles were correlated with positive activation in the thalamus (ROI analysis, t<sub>(106)</sub> = 15.39, p < 1e-4), the anterior cingulate cortex (ACC), and the putamen, alongside deactivation in the DMN (Fig. 3c). Notably, SO-spindle coupling was linked to significant activation in both the thalamus (ROI analysis, t<sub>(106)</sub> = 3.38, p = 0.0005) and the hippocampus (ROI analysis, t<sub>(106)</sub> = 2.50, p = 0.0070, Fig. 3d). However, no decrease in DMN activity was found during SO-spindle coupling, and DMN activity during SO was significantly lower than during coupling (ROI analysis, t<sub>(106)</sub> = -4.17, p < 1e-4). For more detailed activation patterns, see Table S5-S7. We also varied the threshold used to detect SO events to assess its effect on hippocampal activation during SO-spindle coupling and observed that hippocampal activation remained significant when the percentile thresholds for SO detection ranged between 71% and 80% (see Fig. S6).”

      Supplementary Materials, Page 40, Fig. S6

      Author response image 4.

      Influence of the percentile threshold for SO detection on hippocampal activation (ROI) during SO-spindle coupling. We changed the percentile threshold for SO event detection in the EEG data analysis and then reconstructed the GLM design matrix based on the SO events detected at each threshold. The brain-wide activation pattern of SO-spindle couplings in the N2/3 stage was extracted using the same method as shown in Fig. 3. The gray horizontal line represents the significant range (71%–80%). * p < 0.05.

      Finally, we sincerely thank you all again for your thoughtful and constructive feedback. Your insights have been invaluable in refining our analyses, strengthening our interpretations, and improving the clarity and rigor of our manuscript. We appreciate the time and effort you have dedicated to reviewing our work, and we are grateful for the opportunity to enhance our study based on your recommendations.

      References:

      Bergmann, T. O., Mölle, M., Diedrichs, J., Born, J., & Siebner, H. R. (2012). Sleep spindle-related reactivation of category-specific cortical regions after learning face-scene associations. NeuroImage, 59(3), 2733-2742.

      Buzsáki, G. (2015). Hippocampal sharp wave‐ripple: A cognitive biomarker for episodic memory and planning. Hippocampus, 25(10), 1073-1188.

      Caporro, M., Haneef, Z., Yeh, H. J., Lenartowicz, A., Buttinelli, C., Parvizi, J., & Stern, J. M. (2012). Functional MRI of sleep spindles and K-complexes. Clinical neurophysiology, 123(2), 303-309.

      Coulon, P., Budde, T., & Pape, H.-C. (2012). The sleep relay—the role of the thalamus in central and decentral sleep regulation. Pflügers Archiv-European Journal of Physiology, 463, 53-71.

      Crunelli, V., Lőrincz, M. L., Connelly, W. M., David, F., Hughes, S. W., Lambert, R. C., Leresche, N., & Errington, A. C. (2018). Dual function of thalamic low-vigilance state oscillations: rhythm-regulation and plasticity. Nature Reviews Neuroscience, 19(2), 107-118.

      Czisch, M., Wehrle, R., Stiegler, A., Peters, H., Andrade, K., Holsboer, F., & Sämann, P. G. (2009). Acoustic oddball during NREM sleep: a combined EEG/fMRI study. PloS one, 4(8), e6749.

      Diba, K., & Buzsáki, G. (2007). Forward and reverse hippocampal place-cell sequences during ripples. Nature Neuroscience, 10(10), 1241.

      Diekelmann, S., & Born, J. (2010). The memory function of sleep. Nature Reviews Neuroscience, 11(2), 114-126.

      Fogel, S., Albouy, G., King, B. R., Lungu, O., Vien, C., Bore, A., Pinsard, B., Benali, H., Carrier, J., & Doyon, J. (2017). Reactivation or transformation? Motor memory consolidation associated with cerebral activation time-locked to sleep spindles. PloS one, 12(4), e0174755.

      Hahn, M. A., Heib, D., Schabus, M., Hoedlmoser, K., & Helfrich, R. F. (2020). Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9, e53730.

      Halassa, M. M., Siegle, J. H., Ritt, J. T., Ting, J. T., Feng, G., & Moore, C. I. (2011). Selective optical drive of thalamic reticular nucleus generates thalamic bursts and cortical spindles. Nature Neuroscience, 14(9), 1118-1120.

      Hale, J. R., White, T. P., Mayhew, S. D., Wilson, R. S., Rollings, D. T., Khalsa, S., Arvanitis, T. N., & Bagshaw, A. P. (2016). Altered thalamocortical and intra-thalamic functional connectivity during light sleep compared with wake. NeuroImage, 125, 657-667.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J., & Knight, R. T. (2019). Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T., & Walker, M. P. (2018). Old brains come uncoupled in sleep: slow wave-spindle synchrony, brain atrophy, and forgetting. Neuron, 97(1), 221-230. e224.

      Horovitz, S. G., Fukunaga, M., de Zwart, J. A., van Gelderen, P., Fulton, S. C., Balkin, T. J., & Duyn, J. H. (2008). Low frequency BOLD fluctuations during resting wakefulness and light sleep: A simultaneous EEG‐fMRI study. Human brain mapping, 29(6), 671-682.

      Huang, Q., Xiao, Z., Yu, Q., Luo, Y., Xu, J., Qu, Y., Dolan, R., Behrens, T., & Liu, Y. (2024). Replay-triggered brain-wide activation in humans. Nature Communications, 15(1), 7185.

      Ilhan-Bayrakcı, M., Cabral-Calderin, Y., Bergmann, T. O., Tüscher, O., & Stroh, A. (2022). Individual slow wave events give rise to macroscopic fMRI signatures and drive the strength of the BOLD signal in human resting-state EEG-fMRI recordings. Cerebral Cortex, 32(21), 4782-4796.

      Laufs, H. (2008). Endogenous brain oscillations and related networks detected by surface EEG‐combined fMRI. Human brain mapping, 29(7), 762-769.

      Laufs, H., Walker, M. C., & Lund, T. E. (2007). ‘Brain activation and hypothalamic functional connectivity during human non-rapid eye movement sleep: an EEG/fMRI study’—its limitations and an alternative approach. Brain, 130(7), e75.

      Margulies, D. S., Ghosh, S. S., Goulas, A., Falkiewicz, M., Huntenburg, J. M., Langs, G., Bezgin, G., Eickhoff, S. B., Castellanos, F. X., & Petrides, M. (2016). Situating the default-mode network along a principal gradient of macroscale cortical organization. Proceedings of the National Academy of Sciences, 113(44), 12574-12579.

      Massimini, M., Huber, R., Ferrarelli, F., Hill, S., & Tononi, G. (2004). The sleep slow oscillation as a traveling wave. Journal of Neuroscience, 24(31), 6862-6870.

      Moehlman, T. M., de Zwart, J. A., Chappel-Farley, M. G., Liu, X., McClain, I. B., Chang, C., Mandelkow, H., Özbay, P. S., Johnson, N. L., & Bieber, R. E. (2019). All-night functional magnetic resonance imaging sleep studies. Journal of neuroscience methods, 316, 83-98.

      Molle, M., Bergmann, T. O., Marshall, L., & Born, J. (2011). Fast and slow spindles during the sleep slow oscillation: disparate coalescence and engagement in memory processing. Sleep, 34(10), 1411-1421.

      Ngo, H.-V., Fell, J., & Staresina, B. (2020). Sleep spindles mediate hippocampal-neocortical coupling during long-duration ripples. Elife, 9, e57011.

      Picchioni, D., Horovitz, S. G., Fukunaga, M., Carr, W. S., Meltzer, J. A., Balkin, T. J., Duyn, J. H., & Braun, A. R. (2011). Infraslow EEG oscillations organize large-scale cortical– subcortical interactions during sleep: a combined EEG/fMRI study. Brain research, 1374, 63-72.

      Schabus, M., Dang-Vu, T. T., Albouy, G., Balteau, E., Boly, M., Carrier, J., Darsaud, A., Degueldre, C., Desseilles, M., & Gais, S. (2007). Hemodynamic cerebral correlates of sleep spindles during human non-rapid eye movement sleep. Proceedings of the National Academy of Sciences, 104(32), 13164-13169.

      Schreiner, T., Kaufmann, E., Noachtar, S., Mehrkens, J.-H., & Staudigl, T. (2022). The human thalamus orchestrates neocortical oscillations during NREM sleep. Nature communications, 13(1), 5231.

      Schreiner, T., Petzka, M., Staudigl, T., & Staresina, B. P. (2021). Endogenous memory reactivation during sleep in humans is clocked by slow oscillation-spindle complexes. Nature Communications, 12(1), 3112.

      Singh, D., Norman, K. A., & Schapiro, A. C. (2022). A model of autonomous interactions between hippocampus and neocortex driving sleep-dependent memory consolidation. Proceedings of the National Academy of Sciences, 119(44), e2123432119.

      Spoormaker, V. I., Schröter, M. S., Gleiser, P. M., Andrade, K. C., Dresler, M., Wehrle, R., Sämann, P. G., & Czisch, M. (2010). Development of a large-scale functional brain network during human non-rapid eye movement sleep. Journal of Neuroscience, 30(34), 11379-11387.

      Staresina, B. P., Bergmann, T. O., Bonnefond, M., van der Meij, R., Jensen, O., Deuker, L., Elger, C. E., Axmacher, N., & Fell, J. (2015). Hierarchical nesting of slow oscillations, spindles and ripples in the human hippocampus during sleep. Nature Neuroscience, 18(11), 1679-1686.

      Staresina, B. P., Niediek, J., Borger, V., Surges, R., & Mormann, F. (2023). How coupled slow oscillations, spindles and ripples coordinate neuronal processing and communication during human sleep. Nature Neuroscience, 1-9.

      Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature methods, 8(8), 665-670.

      Yeshurun, Y., Nguyen, M., & Hasson, U. (2021). The default mode network: where the idiosyncratic self meets the shared social world. Nature Reviews Neuroscience, 1-12.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings. Moreover, it enables the visualization of actual cell locations, allowing for the examination of spatial properties (e.g., Figure 4G).

      We thank the reviewer for pointing out the technical novelty of this work.

      Weaknesses:

      There is a notable deviation from several observations obtained through conventional electrophysiological recordings. Particularly, as mentioned below in detail, the considerable differences in baseline firing rates and no observations of ripple-triggered firing patterns raise some concerns about potential artifacts from imaging and analysis, such as cell toxicity, abnormal excitability, and false detection of spikes. While these findings are intriguing if the validity of these methods is properly proven, accepting the current results as new insights is challenging.

      We appreciate the reviewer’s insightful comments regarding the intriguing aspect of our findings. Indeed, the emergence of a novel form of CA1 population synchrony presents exciting implications for hippocampal memory research and beyond.

      While we acknowledge the deviations from conventional electrophysiological recordings, we respectfully contend that these differences do not necessarily imply methodological flaws. All experiments and analyses were conducted with meticulous adherence to established standards in the field.

      Regarding the observed variations in average firing rates, it is important to note the well-documented heterogeneity in CA1 pyramidal neuron firing rates, spanning from 0.01 to 10 Hz, with a skewed distribution toward lower frequencies (Mizuseki et al., 2013). Our exclusion criteria for neurons with low estimated firing rates may have inadvertently biased the selection towards more active neurons. Moreover, prior research has indicated that average firing rates tend to increase during exposure to novel environments (Karlsson et al., 2008), and among deep-layer CA1 pyramidal neurons (Mizuseki et al., 2011). Given our recording setup in a highly novel environment and the predominance of deep CA1 pyramidal neurons in our sample, the higher average firing rates we observed could be influenced by these factors. Considering these points, our mean firing rate (3.2 Hz) is a reasonable estimate compared to previously reported values obtained from electrophysiological recordings (2.1 Hz in McHugh et al., 1996 and 2.4-2.6 Hz in Buzsaki et al., 2003).

      Regarding concerns about potential cell toxicity, previous studies have shown that Voltron expression and illumination do not significantly alter membrane resistance, membrane capacitance, resting membrane potentials, spike amplitudes, and spike width (see Abdelfattah 2019, Science, Supplementary Figure 11 and 12). In our recordings, imaged neurons exhibit preserved membrane and dendritic morphology during and after experiments (Author response image 1), supporting the absence of significant toxicity.

      Author response image 1.

      Voltron-expressing neurons exhibit preserved membrane and dendritic morphology. (A) Images of two-photon z-stack maximum intensity projection showing Voltron-expressing neurons taken after voltage image experiments in vivo. (B) Post-hoc histological images of neurons being voltage-imaged.

      Regarding spike detection, we use validated algorithms (Abdelfattah et al., 2019 and 2023) to ensure robust and reliable detection of spikes. Spiking activity was first separated from slower subthreshold potentials using high-pass filtering. This way, a slow fluorescence increase will not be detected as a spike, even if its amplitude is large. We benchmarked the detection algorithm in computer simulations. The sensitivity and specificity of the algorithm exceed 98% at the signal-to-noise ratio of our recordings. While we acknowledge that a small number of spikes, particularly those occurring later in a burst, might be missed due to their smaller amplitudes (as illustrated in Figures 1 and 2 of the manuscript), we anticipate that any missed spikes would lead to a decrease rather than an increase in synchrony between neurons. Overall, we are confident that spike detection is performed in a rigorous and robust manner.
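
      The point that high-pass filtering keeps a large but slow fluorescence change from being counted as a spike can be illustrated with a minimal sketch. It is not the published algorithm: subtracting a centered moving average stands in for the high-pass filter, spikes are assumed to be upward deflections, and the function names, window size, threshold, and toy trace are all hypothetical.

```python
def highpass_moving_average(trace, half_width):
    """Subtract a centered moving average to remove slow components."""
    n = len(trace)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(trace[i] - sum(trace[lo:hi]) / (hi - lo))
    return out


def detect_spikes(trace, half_width, threshold):
    """Return indices of local maxima of the filtered trace above threshold."""
    hp = highpass_moving_average(trace, half_width)
    return [i for i in range(1, len(hp) - 1)
            if hp[i] > threshold and hp[i] >= hp[i - 1] and hp[i] > hp[i + 1]]


# Toy trace (hypothetical): a slow ramp rising to about 10 units,
# with one brief 3-unit transient at sample 100.
trace = [0.05 * i for i in range(200)]
trace[100] += 3.0
spikes = detect_spikes(trace, half_width=10, threshold=1.0)
```

      Although the ramp reaches a larger absolute amplitude than the transient, only the transient survives filtering and crosses the threshold.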

      To further strengthen these points, we will include the following in the revision:

      (1) Histological images of recorded neurons during and after experiments.

      (2) Further details regarding the validation of spike detection algorithms.

      (3) Analysis of publicly available electrophysiological datasets.

      (4) Discussion regarding the reasons behind the novelty of some of our findings compared to previous observations.

      In conclusion, we assert that our experimental and analysis approach upholds rigorous standards. We remain committed to reconciling our findings with previous observations and welcome further scrutiny and engagement from the scientific community to explore the intriguing implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      We thank the reviewer for a thorough and thoughtful review of our paper.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single-cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population-level activity in CA1.

      We thank the reviewer for pointing out the technical strength and the novelty of our observations.

      Weaknesses:

      The evidence provided is weak, with the authors making surprising population-level claims based on a very sparse data set (5 data sets, each with less than 20 neurons simultaneously recorded) acquired with exciting, but less tested technology. Further, while the authors link these observations to the novelty of the context, both in the title and text, they do not include data from subsequent visits to support this. Detailed comments are below:

      We understand the reviewer’s concerns regarding the size of the dataset. Despite this limitation, it is important to note that synchronous ensembles beyond what could be expected from chance (jittering) were detected in all examined data. In the revision, we plan to add more data, including data from subsequent visits, to further strengthen our findings.

      (1) My first question for the authors, which is not addressed in the discussion, is why these events have not been observed in the countless extracellular recording experiments conducted in rodent CA1 during the exploration of novel environments. Those data sets often have 10x the neurons simultaneously recording compared to these present data, thus the highly synchronous firing should be very hard to miss. Ideally, the authors could confirm their claims via the analysis of publicly available electrophysiology data sets. Further, the claim of high extra-SWR synchrony is complicated by the observation that their recorded neurons fail to spike during the limited number of SWRs recorded during behavior- again, not agreeing with much of the previous electrophysiological recordings.

      We understand the reviewer’s concern. We will examine publicly available electrophysiology datasets to gain further insights into any similarities and differences to our findings. Based on these results, we will discuss why these events have not been previously observed/reported.

      (2) The authors posit that these events are linked to the novelty of the context, both in the text, as well as in the title and abstract. However, they do not include any imaging data from subsequent days to demonstrate the failure to see this synchrony in a familiar environment. If these data are available it would strengthen the proposed link to novelty if they were included.

      We thank the reviewer for the constructive suggestion. We will acquire more datasets from subsequent visits to gain further insights into these synchronous events.

      (3) In the discussion the authors begin by speculating the theta present during these synchronous events may be slower type II or attentional theta. This can be supported by demonstrating a frequency shift in the theta recording during these events/immobility versus the theta recording during movement.

      We thank the reviewer for the constructive suggestion. We did demonstrate that synchrony-associated theta shifts to a lower frequency during immobility compared with locomotion (see Fig. 4B, the red vs. blue curves). We will enlarge this panel and specifically refer to it in the corresponding discussion paragraph.

      (4) The authors mention in the discussion that they image deep-layer PCs in CA1, however, this is not mentioned in the text or methods. They should include data, such as imaging of a slice of a brain post-recording with immunohistochemistry for a layer-specific gene to support this.

      We thank the reviewer for the constructive suggestion. We do have images of brain slices post-recordings (Author response image 2). Imaged neurons are clearly located in the deep CA1 pyramidal layer. We will add these images and quantification in the revised manuscript.

      Author response image 2.

      Imaged neurons are located in the deep pyramidal layer of the dorsal hippocampal CA1 region.

      Reviewer #3 (Public Review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected in the other side of the brain, and the investigation is flawed due to multiple problems with the point process analyses. The synchrony terminology refers to dozens of milliseconds as opposed to the millisecond timescale referred to in prior work, and the interpretations do not take into account theta phase locking as a simple alternative explanation.

      We genuinely appreciate the reviewer’s feedback and acknowledge the concerns raised. However, we believe these concerns can be effectively addressed without undermining the validity of our conclusions. With this in mind, we respectfully disagree with the assessment that our experiments and investigation are flawed. Please allow us to address these concerns and offer additional context to support the validity of our study.

      Weaknesses:

      The two main messages of the manuscript indicated in the title are not supported by the data. The title gives two messages that relate to CA1 pyramidal neurons in behaving head-fixed mice: (1) synchronous ensembles are associated with theta (2) synchronous ensembles are not associated with ripples.

      There are two main methodological problems with the work:

      (1) Experimentally, the theta and ripple signals were recorded using electrophysiology from the opposite hemisphere to the one in which the spiking was monitored. However, both signals exhibit profound differences as a function of location: theta phase changes with the precise location along the proximo-distal and dorso-ventral axes, and importantly, even reverses with depth. And ripples are often a local phenomenon - independent ripples occur within a fraction of a millimeter within the same hemisphere, let alone different hemispheres. Ripples are very sensitive to the precise depth - 100 micrometers up or down, and only a positive deflection/sharp wave is evident.

      We appreciate the reviewer’s consideration regarding the collection of LFP from the contralateral hemisphere. While we acknowledge the limitation of this design, we believe that our findings still offer valuable insights into the dynamics of synchronous ensembles. Despite potential variations in theta phases with recording locations and depth, we find that the occurrence and amplitudes of theta oscillations are generally coordinated across hemispheres (Buzsaki et al., Neurosci., 2003). Therefore, the presence of prominent contralateral LFP theta around the times of synchronous ensembles in our study (see Figure 4A of the manuscript) strongly supports our conclusion regarding their association with theta oscillations, despite the collection of LFP from the opposite hemisphere.

      In addition, in our manuscript, we specifically mentioned that the “preferred phases” varied from session to session, likely due to the variability of recording locations (see Line 254-256). Therefore, we think that the reviewer’s concern regarding theta phase variability has already been addressed in the present manuscript.

      Regarding ripple oscillations, while we recognize that they can sometimes occur locally, the majority of ripples occur synchronously in both hemispheres (up to 70%, see Szabo et al., Neuron, 2022; Buzsaki et al., Neurosci., 2003). Therefore, using contralateral LFP to infer ripple occurrence on the ipsilateral side has been a common practice in the field, employed by many studies published in respectable journals (Szabo et al., Neuron, 2022; Terada et al., Nature, 2021; Dudok et al., Neuron, 2021; Geiller et al., Neuron, 2020). Furthermore, we observed that the 446 synchronous ensembles detected during immobility do not co-occur with contralateral ripples, and the remaining 313 ensembles, detected during locomotion, are not associated with ripples, as ripples rarely occur during locomotion. Therefore, our conclusion that synchronous ensembles are not associated with ripple oscillations is supported by the data.

      (2) The analysis of the point process data (spike trains) is entirely flawed. There are many technical issues: complex spikes ("bursts") are not accounted for; differences in spike counts between the various conditions ("locomotion" and "immobility") are not accounted for; the pooling of multiple CCGs assumes independence, whereas even conditional independence cannot be assumed; etc.

      We acknowledge the reviewer’s concern regarding spike train analysis. Indeed, complex bursts or different behavioral conditions can lead to differences in spike counts that could potentially affect the detection of synchronous ensembles. However, our jittering procedure (see Line 121-132) is designed to control for the variation of spike counts. Importantly, while the jittered spike trains also contain the same spike count variations, we found 7.8-fold more synchronous events in our data compared to jitter controls (see Figure 1G of the manuscript), indicating that these factors cannot account for the observed synchrony.
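
      The logic of a jitter control can be illustrated with a minimal sketch. It is not the procedure from the manuscript (Line 121-132): the bin width, jitter window, cell threshold, and all function names below are assumptions chosen for illustration only. Each spike is shifted by an independent random offset, which preserves every cell's spike count while destroying fine-timescale co-firing, and the number of synchronous bins is then compared between observed and jittered data.

```python
import math
import random


def jitter_spike_trains(spike_trains, jitter_ms, rng):
    """Shift every spike by an independent uniform offset in [-jitter_ms, +jitter_ms].

    Spike counts per cell are unchanged; only fine timing is perturbed.
    """
    return [
        sorted(t + rng.uniform(-jitter_ms, jitter_ms) for t in train)
        for train in spike_trains
    ]


def count_synchronous_events(spike_trains, duration_ms, bin_ms, min_cells):
    """Count time bins in which at least `min_cells` distinct cells fire."""
    n_bins = math.ceil(duration_ms / bin_ms)
    cells_per_bin = [0] * n_bins
    for train in spike_trains:
        # A cell counts once per bin, however many spikes it fires there.
        # Jittered spikes that leave the recording are clamped to the edge bins.
        active_bins = {min(max(int(t // bin_ms), 0), n_bins - 1) for t in train}
        for b in active_bins:
            cells_per_bin[b] += 1
    return sum(1 for n in cells_per_bin if n >= min_cells)


def observed_vs_jitter(spike_trains, duration_ms, bin_ms=25.0, min_cells=3,
                       jitter_ms=75.0, n_surrogates=200, seed=0):
    """Return (observed event count, mean event count over jittered surrogates).

    All default parameter values are hypothetical, not taken from the manuscript.
    """
    rng = random.Random(seed)
    observed = count_synchronous_events(spike_trains, duration_ms, bin_ms, min_cells)
    surrogate_counts = [
        count_synchronous_events(
            jitter_spike_trains(spike_trains, jitter_ms, rng),
            duration_ms, bin_ms, min_cells,
        )
        for _ in range(n_surrogates)
    ]
    return observed, sum(surrogate_counts) / n_surrogates
```

      Dividing the observed count by the surrogate mean gives a fold-enrichment of the kind reported above. Because the surrogates keep the same spike counts, differences in firing rate between conditions cannot by themselves produce a ratio above one.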

      To explicitly demonstrate that complex bursts cannot account for the observed synchrony, we performed an additional analysis in which we removed all later spikes within bursts and counted only single spikes and the first spike of each burst. Importantly, we found that this procedure did not change the rate and size of synchronous ensembles, nor did it significantly alter the grand-average CCG (see Author response image 3). The results of this analysis explicitly rule out a significant effect of complex spikes on the analysis of synchronous ensembles.

      Author response image 3.

      Population synchrony remains after the removal of spikes in bursts. (A) The grand-average cross correlogram (CCG) was calculated using spike trains without latter spikes in bursts. The gray line represents the mean grand average CCG between reference cells and randomly selected cells from different sessions. (B) Pairwise comparison of the event rates of population synchrony between spike trains containing all spikes and spike trains without latter spikes in bursts. Bar heights indicate group means (n=10 segments, p=0.036, Wilcoxon signed-rank test). (C) Histogram of the ensemble sizes as percentages of cells participating in the synchronous ensembles.
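
      The burst-removal step described above can be sketched as follows. The inter-spike-interval criterion used to define a burst (10 ms here) and the function name are assumptions for illustration; the response does not state the criterion that was actually applied.

```python
def keep_single_and_first_spikes(spike_times_ms, max_burst_isi_ms=10.0):
    """Drop every spike that follows the previous spike by <= max_burst_isi_ms.

    Isolated spikes and the first spike of each burst are kept; later spikes
    within a burst are removed. The 10 ms default is a hypothetical threshold.
    """
    kept = []
    previous = None
    for t in sorted(spike_times_ms):
        if previous is None or t - previous > max_burst_isi_ms:
            kept.append(t)
        # Track the last spike seen (kept or not) so that a long burst is
        # treated as one event rather than being split at every other spike.
        previous = t
    return kept
```

      For example, a train with a three-spike burst at 0, 4, and 8 ms, a single spike at 100 ms, and a doublet at 250 and 255 ms is reduced to spikes at 0, 100, and 250 ms.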

      Beyond those methodological issues, there are two main interpretational problems: (1) the "synchronous ensembles" may be completely consistent with phase locking to the intracellular theta (as even shown by the authors themselves in some of the supplementary figures).

      We agree with the reviewer that the synchronous ensembles are indeed consistent with theta phase locking. However, it is important to note that theta phase locking alone does not necessarily imply population synchrony. In fact, theta phase locking has been shown to “reduce” population synchrony in a previous study (Mizuseki et al., 2014, Phil. Trans. R. Soc. B.). Thus, the presence of theta phase locking cannot be taken as a simple alternative explanation of the synchronous ensembles.

      To directly assess the contribution of theta phase locking to synchronous ensembles, we have performed a new analysis to randomize the specific theta cycles in which neurons spike, while keeping the spike phases constant. This manipulation disrupts spike co-occurrence while preserving theta phase locking, allowing us to test whether theta phase locking alone can explain the population synchrony, or whether spike co-occurrence in specific cycles is required. The grand-average CCG shows a much smaller peak compared to the original peak (Author response image 4A). Moreover, synchronous event rates show a 4.5-fold decrease in the randomized data compared to the original event rates (Author response image 4B). Thus, the new analysis reveals theta phase locking alone cannot account for the population synchrony.

      Author response image 4.

      Drastic reduction of population synchrony by randomizing spikes to other theta cycles while preserving the phases. (A) The grand-average cross correlogram (CCG) was calculated using original spike trains (black) and randomized spike trains where theta phases of the spikes are kept the same but spike timings were randomly moved to other theta cycles (red). (B) Pairwise comparison of the event rates of population synchrony between the original spike trains and randomized spike trains (n=10 segments, p=0.002, Wilcoxon signed-rank test). Bar heights indicate group means. ** p<0.01
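
      The cycle-randomization control can be sketched as below, assuming theta cycles have already been delimited by a sorted list of boundary times (for example, successive troughs). The function names and the representation of phase as a fraction of the cycle are illustrative assumptions, not the authors' implementation. Each spike keeps its phase but is reassigned to a randomly chosen cycle, so phase locking is preserved while cycle-specific co-firing is destroyed.

```python
import bisect
import random


def spike_phases(spike_times, cycle_edges):
    """Return (cycle_index, phase_fraction) for spikes inside the detected cycles.

    `cycle_edges` is a sorted list of cycle boundary times; phase_fraction is in
    [0, 1) and equals theta phase divided by 2*pi. Spikes outside the first and
    last boundary are ignored.
    """
    out = []
    for t in spike_times:
        if cycle_edges[0] <= t < cycle_edges[-1]:
            c = bisect.bisect_right(cycle_edges, t) - 1
            start, end = cycle_edges[c], cycle_edges[c + 1]
            out.append((c, (t - start) / (end - start)))
    return out


def shuffle_theta_cycles(spike_times, cycle_edges, rng):
    """Move each spike to a random theta cycle while keeping its phase."""
    n_cycles = len(cycle_edges) - 1
    shuffled = []
    for _, phase in spike_phases(spike_times, cycle_edges):
        c = rng.randrange(n_cycles)
        start, end = cycle_edges[c], cycle_edges[c + 1]
        shuffled.append(start + phase * (end - start))
    return sorted(shuffled)
```

      Applying this independently to every cell and recomputing the CCG and the synchronous event rate gives the comparison between original and randomized data. Any synchrony that survives the shuffle is attributable to shared phase preference alone.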

      (2) The definition of "synchrony" in the present work is very loose and refers to timescales of 20-30 ms. In previous literature that relates to synchrony of point processes, the timescales discussed are 1-2 ms, and longer timescales are referred to as the "baseline" which is actually removed (using smoothing, jittering, etc.).

      Regarding the timescale of synchronous ensembles, we acknowledge that it varies considerably across studies and cell types. However, it is important to note that a timescale of dozens, or even hundreds of milliseconds is common for synchrony terminology in CA1 pyramidal neurons (see Csicsvari et al., Neuron, 2000; Harris et al., Science, 2003; Malvache et al., Science, 2016; Yagi et al., Cell Reports, 2023). In fact, a timescale of 20-30 ms is considered particularly important for information transmission and storage in CA1, as it matches the membrane time constant of pyramidal neurons, the period of hippocampal gamma oscillations, and the time window for synaptic plasticity. Therefore, we believe that this timescale is relevant and in line with established practices in the field.

    1. Author response:

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors focused on medaka retinal organoids to investigate the mechanism underlying eye cup morphogenesis. The authors succeeded in inducing lens formation in fish retinal organoids using 3D suspension culture with minimal growth factor-containing media containing HEPES. At day 1, Rx3:H2B-GFP+ cells appear in the surface region of organoids. At day 1.5, Prox1+ cells appear in the interface area between the organoid surface and the core of the central cell mass, which later develops into a spherical lens. Thus, Prox1+ cells cover the surface of the internal lens cell core. At day 2, foxe3:GFP+ cells appear in the Prox1+ area, where the early lens fiber marker, LFC, starts to be expressed. In addition, foxe3:GFP+ cells show EdU incorporation, indicating that foxe3:GFP+ cells have lens epithelial cell characteristics. At day 4, cry:EGFP+ cells differentiate inside the spherical lens core, whose surface area consists of LFC+ and Prox1+ cells. Furthermore, at day 4, the lens core moves towards the surface of retinal organoids to form an eye-cup-like structure, although this "inside-out" morphogenetic mechanism differs from the in vivo "outside-in" mechanism of eye cup formation. From these data, the authors conclude that optic cup formation, especially the positioning of the lens, is established in retinal organoids through a mechanism different from in vivo morphogenesis.

      Overall, manuscript presentation is nice. However, there are still obscure points to understand background mechanism. My comments are shown below.

      Major comments

      (1) At the initial stage of retinal organoid morphogenesis, a spherical lens is centrally positioned inside the retinal organoids, with the central lens core covered by an outer cell sheet of retinal precursor cells. I wonder if the formation of this structure may be understood by differential cell adhesive activity or mechanical tension between lens core cells and the retinal cell sheet, just like the previous study done by the Heisenberg lab on the spatial patterning of endoderm, mesoderm and ectoderm (Nat. Cell Biol. 10, 429 - 436 (2008)). Lens core cells may be integrated inside the retinal cell mass by cell sorting through the direct interaction between retinal cells and lens cells, or between lens cells and the culture media. After day 1, it is also possible that the lens core moves towards the surface of retinal organoids if the adhesive/tensile force states of lens core cells are changed by secretion of extracellular matrix. I wonder if the authors have measured the physical properties, adhesive activity and solidness, of retinal precursor cells and lens core cells. If retinal organoids at day 1 are dissociated and cultured again, do they show the same patterning of an internal lens core covered by the outer retinal cell sheet?

      The question of whether different adhesive activity is involved in cell sorting and lens formation is indeed very intriguing. To address this point, we will include an additional experiment (see Revision Plan, experiment 1). This experiment will be based on the dissociation and re-aggregation of lens-forming organoids as suggested by the reviewer. To monitor cell type-specific sorting, we will employ the lens progenitor reporter line Foxe3::GFP and the retina-specific Rx2::H2B-RFP. If different adhesive activities of lens and retinal progenitor cells are involved and drive the process of cell sorting, dissociation and re-aggregation will result in cell sorting based on their identity.

      (2) The optic cup is evaginated from the lateral wall of the neuroepithelium of the diencephalon. In zebrafish, cell movement occurs from the pigment epithelium to the neural retina during eye morphogenesis in an FGF-dependent manner. How is medaka optic cup morphogenesis coordinated? I also wonder if the authors could track cell migration during optic cup morphogenesis to reveal how cell migration and cell division are regulated in the lens of the medaka retinal organoids. It would also be interesting to examine how retinal cell movement is coordinated in medaka retinal organoids.

      Examining in detail how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. Our previous study showed that optic vesicles of medaka retinal organoids do not form optic cups (for details please see Zilova et al., 2021, eLife). We assume that the formation of the cup-like structure of the ocular organoids is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid, with retina on the surface and lens in the center (see Figure S2 d and Figure 3e, and Figure 4). Subsequent displacement of the centrally formed lens towards the organoid periphery through the retina layer places the lens at the periphery while retinal cells stay static. We assume that the “cup-like” shape is acquired by extrusion of the lens from the center of the organoid. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2) and follow lens- and retina-fated cells (by employing lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #2).

      (3) The authors showed that blockade of FGF signaling affects lens fiber differentiation in day 1-2, whereas lens formation seems to be intact in the presence of FGF receptor inhibitor in day 0-1. I suggest the authors to examine which tissue is a target of FGF signaling in retinal organoids, using markers such as pea3, which is a downstream target of ERK branch of FGF signaling. Since FGF signaling promotes cell proliferation, is the lens core size normal in SU5402-treated organoids from day 0 to day 1?

      Assessing the activity of FGF signaling (cross-reference to Reviewer #3) in the organoids is indeed an important point. To address which tissue is the target of FGF signaling, we will include additional experiments and assess the phosphorylation status of ERK (pERK) and expression of the ERK downstream target pea3, as suggested by the reviewer (see Revision Plan, experiment 3). This will allow us to identify the tissue within the organoid that responds to FGF signaling.

      Lens core size of organoids treated with SU5402 from day 0 to day 1 is fully comparable to the control (please see Figure 6b).

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What kind of cell-type is occupied in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells. We currently do not have data that would clarify the fate of those cells with certainty. We are following up on that question with scRNA sequencing; however, we will not be able to address this question in the current manuscript.

      (5) Fig. 5e indicates the depth of Rx3 expression at day 1. Is the depth the thickness of Rx3 expressing cell sheet, which covers the central lens core in the organoids? If so, I wonder if total cell number of Rx3 expressing cell sheet may be different in each seeded-cell number, because thickness is the same across each seeded-cell number, but the surface area size may be different depending on underneath the lens core size. Please clarify this point.

      Yes. Figure 5e indicates the thickness of the cell sheet expressing Rx3 that lies on the surface of the organoid. Indeed, the number of Rx3-expressing cells (and lens cells) scales with the size of the organoid as stated in the submitted manuscript.
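
      The scaling can be made concrete with a simple geometric sketch. It assumes an idealized spherical organoid of radius $R$ whose Rx3-expressing layer is a shell of constant thickness $d$ with uniform cell density; these idealizations and the symbols $R$, $d$, and $N_{\mathrm{Rx3}}$ are introduced here for illustration and are not measurements from the manuscript.

```latex
% Volume of a spherical shell of thickness d at the surface of a sphere of radius R
V_{\mathrm{shell}} = \frac{4}{3}\pi\left[R^{3} - (R-d)^{3}\right]
                   = \frac{4}{3}\pi\, d\left(3R^{2} - 3Rd + d^{2}\right)

% With uniform cell density, cell number is proportional to shell volume;
% for d much smaller than R the leading term is the surface area times d
N_{\mathrm{Rx3}} \propto V_{\mathrm{shell}} \approx 4\pi R^{2} d \qquad (d \ll R)
```

      Under these assumptions, a constant layer thickness is compatible with a total number of Rx3-expressing cells that grows with organoid size, roughly in proportion to the surface area.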

      (6) Noggin application inhibits lens formation at day 0-1. BMP signaling regulates formation of lens placode and olfactory placode at the early stage of development. It is interesting to examine whether Noggin-treated organoid expands olfactory placode area. Please check forebrain territory markers.

      Which tissue differentiates at the expense of the lens in BMP inhibitor-treated organoids is of course an intriguing question. To address the identity of the cells that differentiate under this condition, we will include an additional experiment as suggested by the reviewer (see Revision Plan, experiment 4). We will check for the expression of Lhx2, Otx2 and HuC/D to address this point.

      I have no minor comments

      Referees cross-commenting

      I agree that all reviewers have similar suggestions, which are reasonable and provided the same estimated time for revision.

      Reviewer #1 (Significance):

      Strength:

      This study is unique. The authors examined eye cup morphogenesis using fish retinal organoids. The eye cup normally consists of the lens, the neural retina, pigment epithelium and optic stalk. However, retinal organoids seem to be simple and consist of two cell types, lens and retina. Interestingly, a similar optic cup-like structure is achieved in both cases; however, the underlying mechanism is different. It is interesting to investigate how eye morphogenesis is regulated in retinal organoids, under the unconstrained embryo-free environment.

      Limitation:

      Description is OK, but the analysis is not very profound. It is necessary to apply a bit more molecular and cellular level analysis, such as tracking of cell movement and visualization of FGF signaling in organoid tissues.

      Advancement:

      The current study is descriptive. Need some conceptual advance, which impact cell biology field or medical science.

      Audience:

      The target audience of the current study is still within the ophthalmology and neuroscience communities, maybe translational/clinical rather than basic biology. To reach beyond these specific fields, the authors need to formulate a general principle for cell and developmental biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study from Stahl et al., the authors demonstrate that medaka pluripotent embryonic cells can self-organise into eye organoids containing both retina and lens tissues. While these organoids can self-organize into an eye structure that resembles the vertebrate eye, they are built from a fundamentally different morphogenetic process – an “inside-out” mechanism where the lens forms centrally and moves outward, rather than the normal “outside-in” embryonic process. This is a very interesting discovery, both for our understanding of developmental biology and the potential for tissue engineering applications. The study would benefit from some additional experiments and a few clarifications.

      The authors suggest that the lens cells are the ones that move from the central to a more superficial position. Is this an active movement of lens cells or just the passive consequence of the retina cells acquiring a cup shape? Are the retina cells migrating behind the lens or the lens cells pushing outwards? High-resolution imaging of organoid cup formation, tracking retina cells in combination with membrane labeling of all cells would help elucidate the morphogenetic processes occurring in the organoids. Membrane labeling would also be useful as Prox1 positive lens cells appear elongated in embryos while in the organoids, cell shapes seem less organised, less compact and not elongated (for example as shown in Fig 3f,g).

      Examining in detail how the optic cup-like tissue arrangement of ocular organoids is achieved at the cellular level is of course interesting. We assume that the formation of cup-like structures of the ocular organoids is mediated by the following processes: establishment of retina and lens domains at specific regions of the organoid, with retina on the surface and lens in the center (see Figure S2 d and Figure 3e, and Figure 4). Subsequent displacement of the centrally formed lens towards the organoid periphery through the retina layer places the lens at the periphery while retinal cells stay static. We assume that the “cup-like” shape is acquired by extrusion of the lens. To clarify this process with respect to tissue rearrangements and cell movements, we will include additional experiments (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (by employing lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion to dissect the individual contributions of retinal and lens cells to this process (cross-reference with Reviewer #1).

      The organoids could be a useful tool to address how cell fate is linked to cell shape acquisition. In the forming organoids, retinal tissue initially forms on the outside, while non-retinal tissue is located in the centre; this central tissue later expresses lens markers. Do the authors have any insights into why fate acquisition occurs in this pattern? Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      The question of how the retinal and lens domains are established in this specific manner is indeed intriguing and very interesting. We dedicated a part of the discussion to this topic. We discuss the role of the diffusion limit and the potential contribution of BMP and FGF signaling to this arrangement. Additional experiments (see Revision Plan, experiment 3) addressing the source and target tissues of FGF and BMP signaling in the organoid will bring more clarity to our understanding of the tissue arrangements in the organoid.

      Although an analysis of the cells at the surface and in the central region of the organoid might show some differences in proliferation rate between lens and retinal cells, we do not have any indication that the proliferation rate itself would be instructive for, or upstream of, the cell fate decisions.

      What happens in organoids that do not form lenses? Do these organoids still generate foxe3 positive cells that fail to develop into a proper lens structure? And in the absence of lens formation, does the retina still acquire a cup shape?

      Lens formation is primarily dependent on acquisition/specification of Foxe3-expressing lens placode progenitors. If those are not present, a lens does not develop. Once Foxe3-expressing progenitors are established, a lens is formed in unperturbed conditions (measured by the presence of expression of crystallin proteins). In such conditions, organoids that do not have a lens, do not carry Foxe3-expressing cells.

      In the absence of the lens, the organoid is composed of retinal neuroepithelium that does not form an optic cup (for details of such phenotypes please see Zilova et al., 2021, eLife).

      The authors suggest that lens formation occurs even in the absence of Matrigel. Is the process slower in these conditions? Are the resulting organoids smaller? While there are indeed some LFC-expressing cells by day 2, these cells are not very well organised and the pattern of expression seems dotty. Moreover, LFC staining seems to localise posterior to the LFC-negative, lens-like structure (e.g. Fig. S1, 3 o’clock).

      How do these organoids develop beyond day 4? Do they maintain their structural integrity at later stages?

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      We thank the reviewer for pointing this out. We were not clear in the wording and description of our observation. Indeed, Matrigel is not required for acquisition of lens fate, which can be demonstrated by the expression of lens-specific markers. However, the presence of Matrigel has a profound impact on the structural aspects of organoid formation. Matrigel is essential for the organization of retinal-committed cells into the retinal epithelium (Zilova et al., 2021, eLife). The absence of a structured retinal epithelium can indeed negatively impact the cellular organization and the overall lens structure. To clarify the contribution of Matrigel to the speed of organoid lens development and to the overall structure of the organoid lens, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of the organoids grown without Matrigel (cross-reference with Reviewer #3).

      The role of HEPES in lens formation is indeed very intriguing and currently under investigation. As HEPES is mainly used to regulate the pH of the culture media, and pH might have an impact on multiple cellular processes, it will require a significant time investment to dissect the molecular mechanism underlying the effect of HEPES on the process of lens formation (cross-reference with Reviewer #3). This therefore cannot be addressed in the current manuscript.

      Referees cross-commenting

      Pleased to see that all the other reviewers are positive about the study and raise similar concerns and comments

      Reviewer #2 (Significance):

      This is a very interesting paper, and it will be important to determine whether this alternative morphogenetic process is specific to medaka or if similar developmental routes can be recapitulated in organoid cultures from other vertebrate species.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The manuscript by Stahl and colleagues reports an approach to generate ocular organoids composed of retinal and lens structures, derived from Medaka blastula cells. The authors present a comprehensive characterisation of the timeline followed by lens and retinal progenitors, showing these have distinct origins, and that they recapitulate the expression of differentiation markers found in vivo. Despite this molecular recapitulation, morphogenesis is strikingly different, with lens progenitors arising at the centre of the organoid, and subsequently translocating to the outside.

      Comments:

      - The manuscript presents a beautiful set of high quality images showing expression of lens differentiation markers over time in the organoids. The set of experiments is very robust, with high numbers of organoids analysed and reproducible data. The mechanism by which lens specification is promoted in these organoids is, however, poorly analysed, and the reader does not get a clear understanding of what is different in these experiments, as compared to previous attempts, to support lens differentiation. There is a mention of HEPES supplementation, but no further analysis is provided, and the fact that the process is independent of ECM contradicts, as the authors point out, previous reports. The manuscript would benefit from a more detailed analysis of the mechanisms that lead to lens differentiation in this setting.

      The role of HEPES in lens formation is indeed very intriguing and currently under investigation. As HEPES is mainly used to regulate the pH of the culture media, and pH might have an impact on multiple cellular processes, it will require a significant time investment to dissect the molecular mechanism underlying the effect of HEPES on the process of lens formation (cross-reference with Reviewer #2). This therefore unfortunately cannot be addressed in the current manuscript.

      To clarify the contribution of Matrigel to organoid lens development, we will perform additional experiments (see Revision Plan, experiment 5). Using the Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of the organoids grown without Matrigel (cross-reference with Reviewer #2).

      - The markers analysed to show onset of lens differentiation in the organoids seem to start being expressed, in vivo, when the lens placode starts invaginating. An analysis of earlier stages is not presented. This would be very informative, allowing to determine whether progenitors differentiate as placode and neuroepithelium first, to subsequently continue differentiating into lens and retina, respectively. Could early placodal and anterior neural plate markers be analysed in the organoids? This would provide a more complete sequence of lens vs retina differentiation in this model.

      Yes. The figures show the expression of lens and retinal markers in the embryo at later developmental stages, and the timing of their expression can be documented with higher temporal resolution. In the revised version of the manuscript, we will provide information about the onset of expression of Rx3::H2B-GFP (retina) and Foxe3::GFP (lens) (see Author response image 1). Rx3 represents one of the earliest markers labeling the presumptive eye field within the region of the anterior neural plate (S16, late gastrula). Foxe3::GFP expression can be detected within the head surface ectoderm before the lens placode is formed, showing that Foxe3 is a suitable marker of placodal progenitors in medaka.

      We are convinced that the onset of Rx3 and Foxe3-driven reporters is early enough to make the claim about the separate origin of the lens (placodal) and retinal (anterior neuroectoderm) tissues within the ocular organoids.

      Author response image 1.

      - The analysis of BMP and Fgf requirement for lens formation and differentiation is suggestive, but the source of these signals is not resolved or mentioned in the manuscript. Are BMP4 and Fgf8 expressed by the organoids? Where are they coming from?

      Indeed, addressing the source of BMP and FGF activation would bring more clarity in understanding the mechanism of retina/lens specification within the ocular organoids (cross reference with Reviewer #1). To address this point, we will include additional experiments (see Revision Plan, experiment 3). We will analyze the expression of respective ligands (Bmp4 and Fgf8) and activation of downstream effectors of BMP and FGF signaling pathways within the ocular organoids as suggested by Reviewer #1 and Reviewer #3.

      - The fact that the lens becomes specified in the centre of the organoid is striking, but it is for me difficult to visualise how it ends up being extruded from the organoid. Did the authors try to follow this process in movies? I understand that this may be technically challenging, but it would certainly help to understand the process that leads to the final organisation of retinal and lens tissues in the organoid. There is no discussion of why the morphogenetic mechanism is so different from the in vivo situation. The manuscript would benefit from explicitly discussing this.

      Following the extruding lens in vivo is indeed a very relevant suggestion. To clarify the process of ocular organoid formation with respect to tissue rearrangements and cell movements, we will include an additional experiment (see Revision Plan, experiment 2). We will follow lens- and retina-fated cells (by employing lens-specific Foxe3::GFP and retina-specific Rx2::H2B-RFP reporter lines) through the process of lens extrusion (cross-reference with Reviewer #1 and Reviewer #2).

      Referees cross-commenting

      We all seem to have similar comments and concerns. I think overall the suggestions are feasible and realistic for the timeframe provided.

      Reviewer #3 (Significance):

      This study describes a reproducible approach to differentiate ocular organoids composed of lens and retinal tissues. The characterisation of lens differentiation in this model is very detailed, and despite the morphogenetic differences, the molecular mechanisms show many similarities to the in vivo situation. The manuscript however does not highlight, in my opinion, why this model may be relevant. Clearly articulating this relevance, particularly in the discussion, will enhance the study and provide more clarity to the readers regarding the significance of the study for the field of organoid research, ocular research and regenerative studies.

      Revision Plan:

      (1) To address whether differential adhesion properties of retinal and lens progenitors mediate cell sorting to establish retina and lens domains in the organoids (Reviewer #1, comment 1), we will dissociate the organoids on day 1 and subsequently re-aggregate them. This experiment will allow us to follow cell type-specific adhesion properties of lens and retinal progenitor cells. We will employ the lens progenitor reporter line Foxe3::GFP and the retina-specific Rx2::H2B-RFP to monitor cell type-specific sorting with fluorescence microscopy.

      (2) Multiple reviewers (Reviewer #1, Reviewer #2, Reviewer #3) asked for a detailed in vivo imaging experiment showing the individual contributions of retina- and lens-fated cells to the resulting tissue organization within the ocular organoid. We will perform an in vivo live imaging experiment to follow the movements of individual lens (Foxe3::GFP) and retinal (Rx2::H2B-GFP) cells from day 1 to day 2 of organoid development to address this point.

      (3) Reviewer #1 and Reviewer #3 raised questions concerning the role of FGF and BMP signaling and the sources of these signaling pathway activities in ocular organoid tissue arrangement. To address this point and shed more light on the molecular mechanisms regulating lens and retina tissue arrangement in the organoid, we will perform additional experiments. We will assess the expression of candidate FGF and BMP ligands (Fgf8, Bmp7 and Bmp4), the activation of downstream effectors (p-ERK, p-SMAD), and the expression of the direct transcriptional target of FGF signaling (Pea3) in the developing organoids. This will allow us to identify the tissue producing the ligand on the one hand and the tissue responding to the signaling on the other, and will help narrow down the molecular mechanism controlling tissue arrangements in the organoid.

      (4) We will analyze the expression of forebrain territory markers in organoids treated with the BMP inhibitor to identify the identity of the tissue differentiating at the expense of lens under the BMP inhibition (suggested by Reviewer #1). We will label Noggin-treated organoids with the antibodies against Lhx2, Otx2 and HuC/D to address this point.

      (5) We will provide a more comprehensive analysis of the organoids grown without Matrigel and compare them to the organoids grown in the presence of Matrigel (mentioned by Reviewer #2 and Reviewer #3). Using the lens progenitor-specific Foxe3::GFP reporter line, we will measure the onset of lens-specific gene expression. In addition, we will use immunohistochemistry to assess the gross morphology and size of the organoids grown without Matrigel.

      Description of analyses that authors prefer not to carry out

      Reviewer #1:

      (4) Fig. 3f and 3g indicate that there is some cell population located between foxe3:GFP+ cells and rx2:H2B-RFP+ cells. What kind of cell-type is occupied in the interface area between foxe3:GFP+ cells and rx2:H2B-RFP+ cells?

      That is certainly an interesting question. We are aware of this population of cells, but we currently do not have data that would clarify their fate with certainty. We are following up on this question using scRNA sequencing; however, we will not be able to address it in the current manuscript.

      Reviewer #2:

      The role of HEPES in promoting organoid formation is intriguing. Do the authors have any insights into why it is important in this context? Have the authors tried other culture conditions and does culture condition influence the morphogenetic pathways occurring within the organoids?

      The role of HEPES in lens formation is indeed very intriguing and is currently under investigation. Because HEPES is mainly used to regulate the pH of the culture medium, and pH might affect multiple cellular processes, dissecting the molecular mechanism underlying the effect of HEPES on lens formation will require a significant time investment (cross-reference with Reviewer #3) and cannot be addressed in the current manuscript.

      Is there a difference in proliferation rates between the centrally located cells and the external ones? Could it be that highly proliferative cells give rise to neural retina (NR), while lower proliferating cells become lens?

      Although an analysis of proliferation at the surface and in the central region of the organoid might reveal some differences in proliferation rates between lens and retinal cells, we have no indication that the proliferation rate itself would be instructive for, or act upstream of, the cell fate decisions.

    1. Author response:

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence.

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to the food and shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS-responsive units is lower than that reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question - whether the central amygdala signals salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and authors sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valence stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences in the order of Pavlovian paradigms (fear - shock vs. shock - fear), which is an interesting result, and convincingly presented given their counterbalanced experimental design.

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence", whose operational definitions differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and more balanced discussion on this topic would be nice to see, and I felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would likewise require the lack of a response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five head entries and the last head entries, and to the first five shocks and the last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analyses of valence and salience encoding for the different CSs are presented in Figure 5G, Figure 5 – Supplement 1C-D, and Figure 5 – Supplement 2N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 – Supplement 2J-O, and Figure 5 – Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 – Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells that respond to only one stimulus are difficult to define in terms of valence encoding, as opposed to being specific for the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Many of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023 for instance shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::Cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to that conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food- or shock-responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).
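
      For readers who want to see where a count of 8 response profiles can come from, the following is a minimal sketch, not the authors' code. It assumes that each cell is scored independently for each US as activated, inhibited, or non-responsive, and that cells responding to neither US are excluded; the state names and grouping labels are hypothetical.

```python
from itertools import product

# Hypothetical per-stimulus response states (an assumption, not taken from the manuscript).
STATES = ("activated", "inhibited", "non-responsive")


def response_profiles():
    """Return every (food, shock) state pair in which at least one US drives the cell."""
    silent = ("non-responsive", "non-responsive")
    return [pair for pair in product(STATES, repeat=2) if pair != silent]


def group(food, shock):
    """Coarse grouping of a profile by how the two US responses relate."""
    if "non-responsive" in (food, shock):
        return "single-US selective"
    return "same direction" if food == shock else "opposite direction"


profiles = response_profiles()
counts = {}
for food, shock in profiles:
    key = group(food, shock)
    counts[key] = counts.get(key, 0) + 1

print(len(profiles))  # 3 x 3 combinations minus the doubly non-responsive one
print(counts)
```

      Under these assumptions, the 8 profiles split into four single-US selective, two same-direction, and two opposite-direction combinations.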

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This likely reflects the fact that shock delivery is passive and easily resolved based on the time of US delivery, whereas the food responses are variable because they depend on the consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared, which is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about differences in kinetics. In our experimental design, we counterbalanced the animals that received fear conditioning first and then food conditioning, or food conditioning and then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we restricted our analysis to the last day of reward conditioning and the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.
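
      The decoding analysis suggested in comment (5) can be sketched as follows. This is an illustration under stated assumptions, not an analysis from the manuscript: the trials-by-neurons matrix `X`, the labels `y`, the nearest-centroid classifier, and the leave-one-out scheme are all hypothetical choices, and the data are synthetic.

```python
import numpy as np


def loo_nearest_centroid_accuracy(X, y):
    """Leave-one-trial-out accuracy of a nearest-centroid decoder.

    X: (n_trials, n_neurons) population responses; y: (n_trials,) trial labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        train = np.ones(len(y), dtype=bool)
        train[i] = False  # hold out trial i
        classes = np.unique(y[train])
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        distances = np.linalg.norm(centroids - X[i], axis=1)
        correct += int(classes[np.argmin(distances)] == y[i])
    return correct / len(y)


def shuffle_test(X, y, n_shuffles=200, seed=0):
    """Compare decoding accuracy with a null built by permuting the trial labels."""
    rng = np.random.default_rng(seed)
    observed = loo_nearest_centroid_accuracy(X, y)
    null = np.array([
        loo_nearest_centroid_accuracy(X, rng.permutation(y))
        for _ in range(n_shuffles)
    ])
    p_value = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
    return observed, null, p_value


# Synthetic example: 20 "food" and 20 "shock" trials, 30 neurons, 10 of them informative.
rng = np.random.default_rng(42)
y_demo = np.array(["food"] * 20 + ["shock"] * 20)
X_demo = rng.normal(0.0, 1.0, size=(40, 30))
X_demo[y_demo == "shock", :10] += 2.0
accuracy, null_accuracy, p = shuffle_test(X_demo, y_demo)
print(accuracy, null_accuracy.mean(), p)
```

      Restricting `X` to the pre-US cue window would limit any claim to decoding of food-predictive versus shock-predictive cues, as the reviewer proposes.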

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.
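
      For readers unfamiliar with circular-shift testing, the general idea can be sketched as below. This is a generic illustration and not the authors' implementation: the window length, number of shifts, significance threshold, and function names are all assumptions. The activity trace is rotated in time relative to the fixed event times, which preserves its autocorrelation while breaking its alignment with the events.

```python
import numpy as np


def event_triggered_mean(trace, event_idx, window):
    """Mean activity in the `window` samples that follow each event onset."""
    return float(np.mean([trace[i:i + window].mean() for i in event_idx]))


def circular_shift_test(trace, event_idx, window, n_shifts=1000, seed=0):
    """Compare the observed event-triggered mean with a circular-shift null."""
    rng = np.random.default_rng(seed)
    n = len(trace)
    observed = event_triggered_mean(trace, event_idx, window)
    # Shifts of at least one window, so the null never overlaps the true alignment.
    shifts = rng.integers(window, n - window, size=n_shifts)
    null = np.array([
        event_triggered_mean(np.roll(trace, s), event_idx, window) for s in shifts
    ])
    p_high = (np.sum(null >= observed) + 1) / (n_shifts + 1)
    p_low = (np.sum(null <= observed) + 1) / (n_shifts + 1)
    return observed, p_high, p_low


def classify(p_high, p_low, alpha=0.025):
    """Three-way label; alpha is a hypothetical per-tail threshold."""
    if p_high < alpha:
        return "activated"
    if p_low < alpha:
        return "inhibited"
    return "non-responsive"
```

      A stricter `alpha` or a larger number of shifts makes the classification more conservative, which is one way a stringent criterion could lower the count of cue-responsive units.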

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to Reviewer #2, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five head entries and the last head entries, and to the first five shocks and the last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author response:

      Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language. This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies, for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in amplitude of background EEG activity. This would require comparing two distinct groups, counterbalancing chunk size and dimension (linguistic versus non-linguistic). Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.
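
      To make the frequency argument concrete, the following minimal sketch computes the rate at which chunk boundaries would recur for duplets and triplets. It assumes the 4 Hz syllabic rate of the present study; the function name `chunk_rate_hz` is purely illustrative and not part of the analysis pipeline.

```python
from fractions import Fraction

def chunk_rate_hz(syllable_rate_hz, chunk_size):
    """Rate (Hz) at which boundaries recur for fixed-size chunks."""
    return Fraction(syllable_rate_hz) / chunk_size

SYLLABLE_RATE_HZ = 4  # syllabic rate of the streams (assumed constant)

duplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 2)   # 2 Hz
triplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 3)  # 4/3 Hz, about 1.33 Hz
```

      Under these assumptions the two dimensions would be tagged at 2 Hz and about 1.33 Hz, which is why their entrainment would be measured against different levels of background EEG activity.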

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz), and the word rate is 2 Hz (not 4 Hz).

      Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the authors' argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a triplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search— a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes. We will revise and tone down the corresponding part of the discussion to clarify that it is just a possible interpretation of the results.  

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.  

      We first quantified the transitional probabilities matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.  

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.66). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.
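
      The gender-level transition probabilities described above can be estimated from any label sequence by counting adjacent pairs. The sketch below is only an illustration: the function name and the short toy sequence are hypothetical and do not reproduce the actual List A or List B streams.

```python
from collections import Counter

def transition_probabilities(labels):
    """Return {(current, nxt): P(nxt | current)} estimated from one sequence."""
    pair_counts = Counter(zip(labels, labels[1:]))
    current_counts = Counter(labels[:-1])  # every item that has a successor
    return {pair: n / current_counts[pair[0]] for pair, n in pair_counts.items()}

# Toy gender sequence (F = female voice, M = male voice); not the real stream.
toy_stream = list("FMFMFF")
tps = transition_probabilities(toy_stream)
# tps[("F", "M")] is P(M|F), tps[("F", "F")] is P(F|F), and so on.
```

      Applied to the full stream of each list, the same counting would yield the four values P(M|F), P(F|M), P(M|M) and P(F|F) discussed here.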

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.  

      Author response image 2.

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W=Words; P=Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity with a source located at the electrical zero in a single-dipole model (i.e. approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e. the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      Author response image 4.

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al, Neuroimage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, voices being a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.  

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation). We will revise the discussion section to clarify this theoretical framework.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism but I do not think that the present results are showing it. This is a study on human neonates and adults, there are no other animal species involved, therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it. We will revise this section to tone down our claims.  

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was always followed by the “red” voice for example but uttered different syllables.

      We will revise the methods section to clarify these important points.

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We will rephrase this sentence in the manuscript to make it clearer.
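
      As a worked illustration of this scenario, the sketch below enumerates the 36 syllable-voice items and shows one set of assumptions under which the quoted TPs of 1/6 and 1/12 arise. The assumptions are ours for illustration only: syllable TPs of 1 within words and 1/2 between words (three duplet words, no immediate word repetition), and a voice drawn uniformly from all six voices. The sketch ignores the constraint that a voice is never repeated consecutively, which would change the voice term. All labels are placeholders.

```python
from fractions import Fraction
from itertools import product

syllables = ["s1", "s2", "s3", "s4", "s5", "s6"]  # placeholder syllable labels
voices = ["v1", "v2", "v3", "v4", "v5", "v6"]     # placeholder voice labels

# Each syllable-voice combination is treated as a distinct item.
items = list(product(syllables, voices))          # 6 x 6 = 36 items

p_voice = Fraction(1, len(voices))    # assumption: uniform, repeats allowed
tp_syllable_within = Fraction(1)      # second syllable of a word is fixed
tp_syllable_between = Fraction(1, 2)  # assumption: two possible next words

tp_item_within = tp_syllable_within * p_voice    # 1/6
tp_item_between = tp_syllable_between * p_voice  # 1/12
```

      The full TP matrix would therefore have 36 x 36 entries, with item-level TPs alternating between the two values computed above.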

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for his valuable input and careful assessment, which have significantly improved the clarity and rigor of our manuscript.

      Summary:

      Mazer & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      (1) The paper builds on biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest

      (2) The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      (3) The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      (4) The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      There are a few places in the paper that can be misunderstood or don't provide complete details. Here is a selection:

      (1) Line 61: '... studies have focused on movement algorithms while overlooking the sensory challenges involved' : This statement does not match the recent state of the literature. While the previous models may have had the assumption that all neighbours can be detected, there are models that specifically study the role of limited interaction arising from a potential inability to track all neighbours due to occlusion, and the effect of responding to only one/few neighbours at a time e.g. Bode et al. 2011 R. Soc. Interface, Rosenthal et al. 2015 PNAS, Jhawar et al. 2020 Nature Physics.

      We appreciate the reviewer's comment and the relevant references. We have revised the manuscript accordingly to clarify the distinction between studies that incorporate limited interactions and those that explicitly analyze sensory constraints and interference. We have refined our statement to acknowledge these contributions while maintaining our focus on sensory challenges beyond limited neighbor detection, such as signal degradation, occlusion effects, and multimodal sensory integration (see lines 61-64):

      While collective movement has been extensively studied in various species, including insect swarming, fish schooling, and bird murmuration (Pitcher, Partridge and Wardle, 1976; Partridge, 1982; Strandburg-Peshkin et al., 2013; Pearce et al., 2014; Rosenthal, Twomey, Hartnett, Wu, Couzin, et al., 2015; Bastien and Romanczuk, 2020; Davidson et al., 2021; Aidan, Bleichman and Ayali, 2024), as well as in swarm robotics agents performing tasks such as coordinated navigation and maze-solving (Faria Dias et al., 2021; Youssefi and Rouhani, 2021; Cheraghi, Shahzad and Graffi, 2022), most studies have focused on movement algorithms, often assuming full detection of neighbors (Parrish and Edelstein-Keshet, 1999; Couzin et al., 2002, 2005; Sumpter et al., 2008; Nagy et al., 2010; Bialek et al., 2012; Gautrais et al., 2012; Attanasi et al., 2014). Some models have incorporated limited interaction rules where individuals respond to one or a few neighbors due to sensory constraints (Bode, Franks and Wood, 2011; Jhawar et al., 2020). However, fewer studies explicitly examine how sensory interference, occlusion, and noise shape decision-making in collective systems (Rosenthal et al., 2015).

      (2) The word 'interference' is used loosely in places (Line 89: '...took all interference signals...', Line 319: 'spatial interference') - this is confusing as it is not clear whether the authors refer to interference in the physics/acoustics sense, or broadly speaking as a synonym for reflections and/or jamming.

      To improve clarity, we have revised the manuscript to distinguish between different types of interference:

      · Acoustic interference (jamming): Overlapping calls that completely obscure echo detection, preventing bats from perceiving necessary environmental cues.

      · Acoustic interference (masking): Partial reduction in signal clarity due to competing calls.

      · Spatial interference: Physical obstruction by conspecifics affecting movement and navigation.

      We have updated the manuscript to use these terms consistently and explicitly define them in relevant sections (see lines 87-94 and 329-330). This distinction ensures that the reader can differentiate between interference as an acoustic phenomenon and its broader implications in navigation.

      (3) The paper discusses original results without reference to how they were obtained or what was done. The lack of detail here must be considered while interpreting the Discussion e.g. Line 302 ('our model suggests...increasing the call-rate..' - no clear mention of how/where call-rate was varied) & Line 323 '..no benefit beyond a certain level..' - also no clear mention of how/where call-level was manipulated in the simulations.

      All tested parameters, including call rate dynamics and call intensity variations, are detailed in the Methods section and Tables 1 and 2. Specifically:

      · Call Rate Variation: The Inter-Pulse Interval (IPI) was modeled based on documented echolocation behavior, decreasing from 100 msec during the search phase to 35 msec (~28 calls per second) at the end of the approach phase, and to 5 msec (200 calls per second) during the final buzz (see Table 2). This natural variation in call rate was not manually manipulated in the model but emerged from the simulated bat behavior.

      · Call Intensity Variation: The tested call intensity levels (100, 110, 120, 130 dB SPL) are presented in Table 1 under the “Call Level” parameter. The effect of increasing call intensity was analyzed in relation to exit probability, jamming probability, and collision rate. This is now explicitly referenced in the Discussion.

      We have revised the manuscript to explicitly reference these aspects in the Results and Discussion sections.
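
      For reference, the relation between inter-pulse interval and call rate quoted above is simply rate = 1000 / IPI (with IPI in msec). A minimal sketch follows; the function and dictionary names are illustrative and not taken from the simulation code.

```python
def calls_per_second(ipi_ms):
    """Convert an inter-pulse interval in milliseconds to a call rate in Hz."""
    return 1000.0 / ipi_ms

# IPI values quoted in this response for each behavioral phase.
ipi_by_phase_ms = {"search": 100, "end of approach": 35, "final buzz": 5}

rates = {phase: calls_per_second(ipi) for phase, ipi in ipi_by_phase_ms.items()}
# search: 10 calls/s, end of approach: about 28.6 calls/s, final buzz: 200 calls/s
```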

      Reviewer #2 (Public review):

      We are grateful for the reviewer’s insightful feedback, which has helped us clarify key aspects of our research and strengthen our conclusions.

      This manuscript describes a detailed model of bats flying together through a fixed geometry. The model considers elements that are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in the air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      In terms of its strengths, the work relies on a thoughtful and detailed model that faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract away features that would complicate the model without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      The most notable weakness I found in this work was that some aspects of the model were not entirely clear to me.

      For example, the directionality of the bat's sonar call in relation to its velocity. Are these the same?

      For simplicity, in our model, the head is aligned with the body, therefore the direction of the echolocation beam is the same as the direction of the flight.

      Moreover, call directionality (directivity) is not directly influenced by velocity. Instead, directionality is estimated using the piston model, as described in the Methods section. The directionality is based on the emission frequency and is thus primarily linked to the behavioral phases of the bat, with frequency shifts occurring as the bat transitions from search to approach to buzz phases. During the approach phase, the bat emits calls with higher frequencies, resulting in increased directionality. This is supported by the literature (Jakobsen and Surlykke, 2010; Jakobsen, Brinkløv and Surlykke, 2013). This phase is also associated with a natural reduction in flight speed, which is a well-documented behavioral adaptation in echolocating bats (Jakobsen et al., 2024).

      To clarify this in the manuscript, we have updated the text to explicitly state that directionality follows phase-dependent frequency changes rather than being a direct function of velocity, see lines 460-465.
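
      To illustrate why higher emission frequencies give a narrower beam, the sketch below evaluates the textbook circular-piston directivity, D(θ) = |2·J1(k·a·sin θ) / (k·a·sin θ)| with k = 2πf/c. This is a generic form of the piston model, not necessarily the exact parametrization used in our Methods; the aperture radius, frequencies, and speed of sound are assumed values chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air (assumed)

def bessel_j1(x, steps=200):
    """J1(x) via (1/pi) * integral_0^pi cos(t - x*sin(t)) dt, trapezoid rule."""
    h = math.pi / steps
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for i in range(1, steps):
        t = i * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def piston_directivity(theta_rad, freq_hz, radius_m):
    """Relative on-axis-normalized amplitude of a circular piston at angle theta."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    x = k * radius_m * math.sin(theta_rad)
    if abs(x) < 1e-9:
        return 1.0  # limit of 2*J1(x)/x as x -> 0
    return abs(2.0 * bessel_j1(x) / x)

# Hypothetical example: 5 mm aperture radius, 30 degrees off-axis.
low_f = piston_directivity(math.radians(30), 30e3, 0.005)
high_f = piston_directivity(math.radians(30), 60e3, 0.005)
```

      With these assumed values the off-axis amplitude is lower at the higher frequency, i.e. the beam is more directional, which is the qualitative behavior described above.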

      If so, what is the difference between phi_target and phi_tx in the model equations?

      represents the angle between the bat and the reflected object (target).

      the angle [rad], between the masking bat and target (from the transmitter’s perspective)

      refers to the angle between the transmitting conspecific and the receiving focal bat, from the transmitter’s point of view.

      represents the angle between the receiving bat and the transmitting bat, from the receiver’s point of view.

      These definitions have been explicitly stated in the revised manuscript to prevent any ambiguity (lines 467-468). Additionally, a Supplementary figure demonstrating the geometrical relations has been added to the manuscript.

      Author response image 1.

      What is a bat's response to colliding with a conspecific (rather than a wall)?

      In nature, minor collisions between bats are common and typically do not result in significant disruptions to flight (Boerma et al., 2019; Roy et al., 2019; Goldstein et al., 2024). Given this, our model does not explicitly simulate the physical impact of a collision event. Instead, during the collision event the bat keeps decreasing its velocity and changing its flight direction until the distance between bats is above the threshold (0.4 m). We assume that the primary cost of such interactions arises from the effort required to avoid collisions, rather than from the collision itself. This assumption aligns with observations of bat behavior in dense flight environments, where individuals prioritize collision avoidance; we therefore did not model post-collision dynamics.

      From the statistical side, it was not clear if replicate simulations were performed. If they were, which I believe is the right way due to stochasticity in the model, how many replicates were used, and are the standard errors referred to throughout the paper between individuals in the same simulation or between independent simulations, or both?

      The number of repetitions for each scenario is detailed in Table 1, but we included it in a more prominent location in the text for clarity. Specifically, we now state (Lines 274-275):

      "The number of repetitions for each scenario was as follows: 1 bat: 240; 2 bats: 120; 5 bats: 48; 10 bats: 24; 20 bats: 12; 40 bats: 12; 100 bats: 6."

      Regarding the reported standard errors, they are calculated across all individuals within each scenario, without distinguishing between different simulation trials.

      We clarified this in the revised text (lines 534-535 in Statistical Analysis).
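
      As a rough illustration of what pooling across individuals means for the sample size, the sketch below multiplies the number of bats by the number of repetitions for each scenario and shows a generic standard-error computation over a pooled list of values. The helper name `pooled_standard_error` is illustrative only and does not refer to our analysis code.

```python
import math
import statistics

# Repetitions per scenario, keyed by number of bats (values quoted above).
repetitions = {1: 240, 2: 120, 5: 48, 10: 24, 20: 12, 40: 12, 100: 6}

# Number of simulated individuals pooled in each scenario.
individuals = {bats: bats * reps for bats, reps in repetitions.items()}
# 240 individuals for 1 to 20 bats, 480 for 40 bats, 600 for 100 bats

def pooled_standard_error(values):
    """Standard error of the mean, treating every individual as one observation."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

      Note that pooling in this way treats individuals from the same simulation run as independent observations.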

      Overall, I found these weaknesses to be superficial and easily remedied by the authors. The authors presented well-reasoned arguments that were supported by their results, and which were used to demonstrate how call interference impacts the collective's roost exit as measured by several variables. As the authors highlight, I think this work is valuable to individuals interested in bat biology and behavior, as well as to applications in engineered multi-agent systems like robotic swarms.

      Reviewer #3 (Public review):

      We sincerely appreciate the reviewer’s thoughtful comments and the time invested in evaluating our work, which have greatly contributed to refining our study.

      We would like to note that, in general, our model simplifies some of the bats’ abilities, under the assumption that if the simulated bats manage to perform this difficult task with simpler mechanisms, real, better-adapted bats will probably perform even better. This line of reasoning recurs in several of the answers below.

      Summary:

      The authors describe a model to mimic bat echolocation behavior and flight under high-density conditions and conclude that the problem of acoustic jamming is less severe than previously thought, conflating the success of their simulations (as described in the manuscript) with hard evidence for what real bats are actually doing. The authors base their model on two species of bats that fly at "high densities" (defined by the authors as colony sizes from tens to tens of thousands of individuals and densities of up to 33.3 bats/m2), Pipistrellus kuhli and Rhinopoma microphyllum. This work fits into the broader discussion of bat sensorimotor strategies during collective flight, and simulations are important to try to understand bat behavior, especially given a lack of empirical data. However, I have major concerns about the assumptions of the parameters used for the simulation, which significantly impact both the results of the simulation and the conclusions that can be made from the data. These details are elaborated upon below, along with key recommendations the authors should consider to guide the refinement of the model.

      Strengths:

      This paper carries out a simulation of bat behavior in dense swarms as a way to explain how jamming does not pose a problem in dense groups. Simulations are important when we lack empirical data. The simulation aims to model two different species with different echolocation signals, which is very important when trying to model echolocation behavior. The analyses are fairly systematic in testing all ranges of parameters used and discussing the differential results.

      Weaknesses:

      The justification for how the different foraging phase call types were chosen for different object detection distances in the simulation is unclear. Do these distances match those recorded from empirical studies, and if so, are they identical for both species used in the simulation?

      The distances at which bats transition between echolocation phases are identical for both species in our model (see Table 2). These distances are based on well-documented empirical studies of bat hunting and obstacle avoidance behavior (Griffin, Webster and Michael, 1958; Simmons and Kick, 1983; Schnitzler et al., 1987; Kalko, 1995; Hiryu et al., 2008; Vanderelst and Peremans, 2018). These references provide extensive evidence that insectivorous bats systematically adjust their echolocation calls in response to object proximity, following the characteristic phases of search, approach, and buzz.

      To improve clarity, we have updated the text to explicitly state that the phase transition distances are empirically grounded and apply equally to both modeled species (lines 430-447).

      What reasoning do the authors have for a bat using the same call characteristics to detect a cave wall as they would for detecting a small insect?

      In echolocating bats, call parameters are primarily shaped by the target distance and echo strength. Accordingly, there is little difference in call structure between prey capture and obstacle-related maneuvers, aside from intensity adjustments based on target strength (Hagino et al., 2007; Hiryu et al., 2008; Surlykke, Ghose and Moss, 2009; Kothari et al., 2014). In our study, due to the dense cave environment, the bats are found to operate in the approach phase nearly all the time, which is consistent with natural cave emergence, where they are navigating through a cluttered environment rather than engaging in open-space search. For one of the species (Rhinopoma microphyllum), we also have empirical recordings of individuals flying under similar conditions (Goldstein et al., 2024). Our model was designed to remain as simple as possible while relying on conservative assumptions that may underestimate bat performance. If, in reality, bats fine-tune their echolocation calls even earlier or more precisely during navigation than assumed, our model would still conservatively reflect their actual capabilities.

      We actually used logarithmically frequency-modulated (FM) chirps, generated using the MATLAB built-in function chirp(t, f0, t1, f1, 'logarithmic'). This method aligns with the nonlinear FM characteristics of Pipistrellus kuhlii (PK) and Rhinopoma microphyllum (RM) and provides a realistic approximation of their echolocation signals. We acknowledge that this was not sufficiently emphasized in the original text, and we have now explicitly highlighted this in the revised version to ensure clarity (see Lines 447-449 in Methods).
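
      For readers without MATLAB, the shape of such a sweep can be sketched in plain Python. This is a minimal sketch, assuming the standard logarithmic-sweep definition in which the instantaneous frequency is f0 * (f1 / f0) ** (t / t1); it is not the simulation code. The start and end frequencies, duration and sampling rate below are hypothetical values chosen for illustration, not the call parameters used in the model.

```python
import math


def instantaneous_frequency(t, f0, t1, f1):
    """Frequency (Hz) at time t of a sweep going from f0 at t=0 to f1 at t=t1."""
    return f0 * (f1 / f0) ** (t / t1)


def log_chirp(t, f0, t1, f1):
    """One sample of a logarithmic FM sweep (cosine phase, requires f0 != f1).

    The phase is the integral of 2*pi*instantaneous_frequency from 0 to t.
    """
    ratio = f1 / f0
    phase = 2.0 * math.pi * f0 * t1 / math.log(ratio) * (ratio ** (t / t1) - 1.0)
    return math.cos(phase)


if __name__ == "__main__":
    # Hypothetical downward sweep: 80 kHz to 40 kHz over 3 ms, sampled at 500 kHz.
    f0, f1, duration, fs = 80_000.0, 40_000.0, 0.003, 500_000.0
    n_samples = int(duration * fs)
    call = [log_chirp(n / fs, f0, duration, f1) for n in range(n_samples)]
    print(len(call), "samples; mid-call frequency:",
          round(instantaneous_frequency(duration / 2, f0, duration, f1)), "Hz")
```

      Unlike a linear sweep, the frequency here changes by a constant ratio per unit time, so a downward sweep drops fastest at its start.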

      The two species modeled have different calls. In particular, the bandwidth varies by a factor of 10, meaning the species' sonars will have different spatial resolutions. Range resolution is about 10x better for PK compared to RM, but the authors appear to use the same thresholds for "correct detection" for both, which doesn't seem appropriate.

      The detection process in our model is based on Saillant’s method using a filter bank, as detailed in the paper (Saillant et al., 1993; Neretti et al., 2003; Sanderson et al., 2003). This approach inherently incorporates the advantages of a wider bandwidth, meaning that the differences in range resolution between the species are already accounted for within the signal-processing framework. Thus, there is no need to explicitly adjust the model parameters for bandwidth variations, as these effects emerge from the applied method.

      Also, the authors did not mention incorporating/correcting for/exploiting Doppler, which leads me to assume they did not model it.

      The reviewer is correct. To maintain model simplicity, we did not incorporate the Doppler effect or its impact on echolocation. The exclusion of Doppler effects was based on the assumption that while Doppler shifts can influence frequency perception, their impact on jamming and overall navigation performance is minor within the modelled context.

      The maximal Doppler shifts expected for the bats in this scenario are ~1 kHz. These shifts would be applied variably across signals due to the semi-random relative velocities between bats, leading to a mixed effect on frequency changes. This variability would likely result in an overall reduction in jamming rather than exacerbating it, aligning with our previous statement that our model may overestimate the severity of acoustic interference. Such Doppler shifts would result in errors of 2-4 cm in localization (i.e., 200-400 microseconds) (Boonman, Parsons and Jones, 2003).

      We have now explicitly highlighted this in the revised version (see Lines 468-470).
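
      The order of magnitude of such shifts can be checked with the textbook Doppler formula for a moving source and a moving receiver. The sketch below is illustrative only: the call frequency, the flight speeds and the speed of sound are assumed values, not parameters taken from the model, and the function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def received_frequency(f_emitted, v_source, v_receiver, c=SPEED_OF_SOUND):
    """Frequency heard by a receiver for a call emitted by a moving source.

    Speeds are in m/s along the line joining the two bats and are positive
    when the bat moves toward the other one.
    """
    return f_emitted * (c + v_receiver) / (c - v_source)


def doppler_shift(f_emitted, v_source, v_receiver, c=SPEED_OF_SOUND):
    """Difference between the received and the emitted frequency, in Hz."""
    return received_frequency(f_emitted, v_source, v_receiver, c) - f_emitted


if __name__ == "__main__":
    # Hypothetical head-on encounter: 40 kHz call, both bats flying at 4 m/s.
    print(f"shift: {doppler_shift(40_000.0, 4.0, 4.0):.0f} Hz")
```

      For these assumed values the shift is slightly under 1 kHz, and it changes sign when the bats move apart, which is why the shifts across many conspecific calls are mixed rather than systematic.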

      The success of the simulation may very well be due to variation in the calls of the bats, which ironically enough demonstrates the importance of a jamming avoidance response in dense flight. This explains why the performance of the simulation falls when bats are not able to distinguish their own echoes from other signals. For example, in Figure C2, there are calls that are labeled as conspecific calls and have markedly shorter durations and wider bandwidths than others. These three phases for call types used by the authors may be responsible for some (or most) of the performance of the model since the correlation between different call types is unlikely to exceed the detection threshold. But it turns out this variation in and of itself is what a jamming avoidance response may consist of. So, in essence, the authors are incorporating a jamming avoidance response into their simulation.

      We fully agree that the natural variations in call design between the phases contribute significantly to interference reduction (see our discussion in a previous paper in Mazar & Yovel, 2020). However, we emphasize that this cannot be classified as a Jamming Avoidance Response (JAR). In our model, bats respond only to the physical presence of objects and not to the acoustic environment or interference itself. There is no active or adaptive adjustment of call design to minimize jamming beyond the natural phase-dependent variations in call structure. Therefore, while variation in call types does inherently reduce interference, this effect emerges passively from the modeled behavior rather than as an intentional strategy to avoid jamming.

      The authors claim that integration over multiple pings (though I was not able to determine the specifics of this integration algorithm) reduces the masking problem. Indeed, it should: if you have two chances at detection, you've effectively increased your SNR by 3dB.

      The reviewer is correct. Indeed, integration over multiple calls improves signal-to-noise ratio (SNR), effectively increasing it by approximately 3 dB per doubling of observations. The specifics of the integration algorithm are detailed in the Methods section, where we describe how sensory information is aggregated across multiple time steps to enhance detection reliability.
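
      The 3 dB figure follows from the idealized case in which N observations carry independent noise, so the noise power of their average falls by a factor of N. A minimal sketch of that arithmetic (the function name is hypothetical, and real echoes in a dense group will not satisfy the independence assumption exactly):

```python
import math


def integration_gain_db(n_calls):
    """Idealized SNR gain (dB) from averaging n_calls independent observations."""
    if n_calls < 1:
        raise ValueError("n_calls must be at least 1")
    return 10.0 * math.log10(n_calls)


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 10):
        print(f"{n:>2} calls -> {integration_gain_db(n):.2f} dB")
```

      Under this assumption, each doubling of the number of integrated calls adds about 3 dB, and a 10-call window gives 10 dB.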

      They also claim - although it is almost an afterthought - that integration dramatically reduces the degradation caused by false echoes. This also makes sense: from one ping to the next, the bat's own echo delays will correlate extremely well with the bat's flight path. Echo delays due to conspecifics will jump around kind of randomly. However, the main concern is regarding the time interval and number of pings of the integration, especially in the context of the bat's flight speed. The authors say that a 1s integration interval (5-10 pings) dramatically reduces jamming probability and echo confusion. This number of pings isn't very high, and it occurs over a time interval during which the bat has moved 5-10m. This distance is large compared to the 0.4m distance-to-obstacle that triggers an evasive maneuver from the bat, so integration should produce a latency in navigation that significantly hinders the ability to avoid obstacles. Can the authors provide statistics that describe this latency, and discussion about why it doesn't seem to be a problem?

      As described in the Methods section, the bat’s collision avoidance response does not rely solely on the integration process. Instead, the model incorporates real-time echoes from the last calls, which are used independently of the integration process for immediate obstacle avoidance maneuvers. This ensures that bats can react to nearby obstacles without being hindered by the integration latency. The slower integration, on the other hand, is used for clustering, outlier removal and estimation of wall directions to support the pathfinding process, as illustrated in Supplementary Figure 1.

      Additionally, our model assumes that bats store the physical positions of echoes in an allocentric coordinate system (x-y). The integration occurs after transforming these detections from a local relative reference frame to a global spatial representation. This allows for stable environmental mapping while maintaining responsiveness to immediate changes in the bat’s surroundings.

      See lines 518-523 in the revised version.
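
      To make the two steps concrete, the sketch below converts detections from the bat's own frame (distance and bearing) into global x-y coordinates and then discards isolated points. It is a simplified stand-in under stated assumptions, not the algorithm in the manuscript: the neighbor-count rule, the radius and all function names are hypothetical, chosen only to show why detections stored in a fixed frame can be pooled across calls while the bat moves.

```python
import math


def to_allocentric(bat_x, bat_y, heading, distance, bearing):
    """Convert one detection from the bat's frame to global x-y coordinates.

    heading: flight direction in radians in the global frame.
    bearing: direction of the echo relative to the heading, in radians.
    """
    angle = heading + bearing
    return (bat_x + distance * math.cos(angle),
            bat_y + distance * math.sin(angle))


def reject_outliers(points, radius, min_neighbors):
    """Keep points with at least min_neighbors other points within radius.

    Echoes from a fixed wall pile up at the same global position over
    successive calls; spurious detections tend to stay isolated.
    """
    kept = []
    for i, (xi, yi) in enumerate(points):
        neighbors = sum(
            1
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append((xi, yi))
    return kept


if __name__ == "__main__":
    # Three calls from a bat flying along +x toward a wall at x = 5 m,
    # plus one spurious detection on the third call.
    detections = [
        to_allocentric(0.0, 0.0, 0.0, 5.0, 0.0),
        to_allocentric(1.0, 0.0, 0.0, 4.0, 0.0),
        to_allocentric(2.0, 0.0, 0.0, 3.0, 0.0),
        to_allocentric(2.0, 0.0, 0.0, 1.2, math.pi / 2),
    ]
    print(reject_outliers(detections, radius=0.3, min_neighbors=1))
```

      In this toy case the three wall detections coincide in the global frame even though their measured distances differ, so only the spurious point is dropped.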

      The authors are using a 2D simulation, but this very much simplifies the challenge of a 3D navigation task, and there is no explanation as to why this is appropriate. Bat densities and bat behavior are discussed per unit area when realistically it should be per unit volume. In fact, the authors reference studies to justify the densities used in the simulation, but these studies were done in a 3D world. If the authors have justification for why it is realistic to model a 3D world in a 2D simulation, I encourage them to provide references justifying this approach.

      We acknowledge that this is a simplification; however, from an echolocation perspective, a 2D framework represents a worst-case scenario in terms of bat densities and maneuverability:

      · Higher Effective Density: A 2D model forces all bats into a single plane rather than distributing them through a 3D volume, increasing the likelihood of overlap in calls and echoes and making jamming more severe. As described in the text, the average distance to the nearest bat in our simulation is 0.27 m (with 100 bats), whereas reported distances in very dense colonies are 0.5 m, as observed in Myotis grisescens and Tadarida brasiliensis (Fujioka et al., 2021; Sabol and Hudson, 1995; Betke et al., 2008; Gillam et al., 2010).

      · Reduced Maneuverability: In 3D space, bats can use vertical movement to avoid obstacles and conspecifics. A 2D constraint eliminates this degree of freedom, increasing collision risk and limiting escape options.

      Thus, our 2D model provides a conservative, difficult test case, ensuring that our findings are valid under conditions where jamming and collision risks are maximized. Additionally, the 2D framework is computationally efficient, allowing us to perform multiple simulation runs to explore a broad parameter space and systematically test the impact of different variables.

      To address the reviewer’s concern, we have clarified this justification in the revised text and will provide supporting references where applicable (see Methods, lines 407-412).

      The focus on "masking" (which appears to be just in-band noise), especially relative to the problem of misassigned echoes, is concerning. If the bat calls are all the same waveform (downsweep linear FM of some duration, I assume - it's not clear from the text), false echoes would be a major problem. Masking, as the authors define it, just reduces SNR. This reduction is something like sqrt(N), where N is the number of conspecifics whose echoes are audible to the bat, so this allows the detection threshold to be set lower, increasing the probability that a bat's echo will exceed a detection threshold. False echoes present a very different problem. They do not reduce SNR per se, but rather they cause spurious threshold excursions (N of them!) that the bat cannot help but interpret as obstacle detection. I would argue that in dense groups the mis-assignment problem is much more important than the SNR problem.

      There is substantial literature supporting the assumption that bats can recognize their own echoes and distinguish them from conspecific signals (Schnitzler and Kalko, 2001; Kazial, Burnett and Masters, 2001; Burnett and Masters, 2002; Kazial, Kenny and Burnett, 2008; Chili, Xian and Moss, 2009; Yovel et al., 2009; Beetz and Hechavarría, 2022). However, we acknowledge that false echoes may present a major challenge in dense groups. To address this, we explicitly tested the impact of the self-echo identification assumption in our study (see Results, Figure 4: The impact of confusion on performance, and lines 345-355 in the Discussion).

      Furthermore, we examined a full confusion scenario, where all reflected echoes from conspecifics were misinterpreted as obstacle reflections (i.e., 100% confusion). Our results show that this significantly degrades navigation performance, supporting the argument that echo misassignment is a critical issue. However, we also explored a simple mitigation strategy based on temporal integration with outlier rejection, which provided some improvement in performance. This suggests that real bats may possess additional mechanisms to enhance self-echo identification and reduce false detections. See lines XX in the manuscript for further discussion.

      The criteria set for flight behavior (lines 393-406) are not justified with any empirical evidence of the flight behavior of wild bats in collective flight. How did the authors determine the avoidance distances? Also, what is the justification for the time limit of 15 seconds to emerge from the opening? Instead of an exit probability, why not instead use a time criterion, similar to "How long does it take X% of bats to exit?"

      While we acknowledge that wild bats may employ more complex behaviors for collision avoidance, we chose to implement a simplified decision-making rule in our model to maintain computational tractability.

      The avoidance distances (1.5 m from walls and 0.4 m from other bats) were selected as internal parameters to ensure coherent flight trajectories while maintaining a reasonable collision rate. These distances provide a balance between maneuverability and stability, preventing erratic flight patterns while still enabling effective obstacle avoidance. In the revised paper, we have added supplementary figures illustrating the effect of model parameters on performance, specifically focusing on the avoidance distance.

      The 15-second exit limit was determined as described in the text (Lines 403-404): “A 15-second window was chosen because it is approximately twice the average exit time for 40 bats and allows for a second corrective maneuver if needed.” In other words, it allowed each bat to circle the ‘cave’ twice to exit even in the most crowded environment. This threshold was set to keep simulation time reasonable while allowing sufficient time for most bats to exit successfully.

      We acknowledge that the alternative approach suggested by the reviewer, measuring the time taken for a certain percentage of bats to exit, is also valid. However, in our model, some outlier bats fail to exit and continue flying for many minutes. Such simulations would lead to excessive simulation times, making it difficult to generate repetitions, and would teach us little: they usually resulted from the bat slightly missing the opening (see Video S1). Our chosen approach ensures practical runtime constraints while still capturing relevant performance metrics.
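
      The difference between the two metrics can be illustrated with a short sketch. The 15-second limit is the one stated above; the function names, the use of None for a bat that never exits, and the example exit times are hypothetical and are not simulation output.

```python
import math


def exit_probability(exit_times, time_limit=15.0):
    """Fraction of bats that exit within time_limit seconds.

    exit_times holds one entry per bat; None marks a bat that never exited.
    """
    exited = sum(1 for t in exit_times if t is not None and t <= time_limit)
    return exited / len(exit_times)


def time_for_fraction(exit_times, fraction):
    """Time by which the given fraction of all bats has exited.

    Returns None when too few bats ever exit, which is the case that would
    keep a simulation running indefinitely under a time-based criterion.
    """
    finished = sorted(t for t in exit_times if t is not None)
    needed = math.ceil(fraction * len(exit_times))
    if needed == 0:
        return 0.0
    if needed > len(finished):
        return None
    return finished[needed - 1]


if __name__ == "__main__":
    times = [4.0, 6.5, 9.0, 20.0, None]  # made-up exit times in seconds
    print("exit probability within 15 s:", exit_probability(times))
    print("time for half the bats to exit:", time_for_fraction(times, 0.5))
    print("time for all bats to exit:", time_for_fraction(times, 1.0))
```

      The exit probability is always defined once the time limit has elapsed, whereas the time-based metric is undefined whenever the required fraction is never reached.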

      What is the empirical justification for the 1-10 calls used for integration?

      The "average exit time for 40 bats" is also confusing and not well explained. Was this determined empirically? From the simulation? If the latter, what are the conditions? Does it include masking, no masking, or which species?

      Previous studies have demonstrated that bats integrate acoustic information received sequentially over several echolocation calls (2-15), effectively constructing an auditory scene in complex environments (Ulanovsky and Moss, 2008; Chili, Xian and Moss, 2009; Moss and Surlykke, 2010; Yovel and Ulanovsky, 2017; Salles, Diebold and Moss, 2020). Additionally, bats are known to produce echolocation sound groups when spatiotemporal localization demands are high (Kothari et al., 2014). Studies have documented call sequences ranging from 2 to 15 grouped calls (Moss et al., 2010), and it has been hypothesized that grouping facilitates echo segregation.

      We did not use a single integration window; we tested integration sizes between 1 and 10 calls and presented the results in Figure 3A. This range was chosen based on prior empirical findings and to explore how different levels of temporal aggregation impact navigation performance. Indeed, the results showed that performance levels off for integration windows of 5-10 calls (Figure 3A).

      Regarding the average exit time for 40 bats, this value was determined from our simulations, where it represents the mean time for successful exits under standard conditions with masking.

      We have revised the text to clarify these details (see line 466).

      References:

      Aidan, Y., Bleichman, I. and Ayali, A. (2024) ‘Pausing to swarm: locust intermittent motion is instrumental for swarming-related visual processing’, Biology letters, 20(2), p. 20230468. Available at: https://doi.org/10.1098/rsbl.2023.0468.

      Attanasi, A. et al. (2014) ‘Collective Behaviour without Collective Order in Wild Swarms of Midges’, PLoS Computational Biology, 10(7). Edited by T. Vicsek. Available at: https://doi.org/10.1371/journal.pcbi.1003697.

      Bastien, R. and Romanczuk, P. (2020) ‘A model of collective behavior based purely on vision’, Science Advances, 6(6). Available at: https://doi.org/10.1126/sciadv.aay0792.

      Beetz, M.J. and Hechavarría, J.C. (2022) ‘Neural Processing of Naturalistic Echolocation Signals in Bats’, Frontiers in Neural Circuits, 16, p. 899370. Available at: https://doi.org/10.3389/FNCIR.2022.899370/BIBTEX.

      Betke, M. et al. (2008) ‘Thermal Imaging Reveals Significantly Smaller Brazilian Free-Tailed Bat Colonies Than Previously Estimated’, Journal of Mammalogy, 89(1), pp. 18–24. Available at: https://doi.org/10.1644/07-MAMM-A-011.1.

      Bialek, W. et al. (2012) ‘Statistical mechanics for natural flocks of birds’, Proceedings of the National Academy of Sciences, 109(13), pp. 4786–4791. Available at: https://doi.org/10.1073/PNAS.1118633109.

      Bode, N.W.F., Franks, D.W. and Wood, A.J. (2011) ‘Limited interactions in flocks: Relating model simulations to empirical data’, Journal of the Royal Society Interface, 8(55), pp. 301–304. Available at: https://doi.org/10.1098/RSIF.2010.0397.

      Boerma, D.B. et al. (2019) ‘Wings as inertial appendages: How bats recover from aerial stumbles’, Journal of Experimental Biology, 222(20). Available at: https://doi.org/10.1242/JEB.204255/VIDEO-3.

      Boonman, A.M., Parsons, S. and Jones, G. (2003) ‘The influence of flight speed on the ranging performance of bats using frequency modulated echolocation pulses’, The Journal of the Acoustical Society of America, 113(1), p. 617. Available at: https://doi.org/10.1121/1.1528175.

      Burnett, S.C. and Masters, W.M. (2002) ‘Identifying Bats Using Computerized Analysis and Artificial Neural Networks’, North American Symposium on Bat Research, 9.

      Cheraghi, A.R., Shahzad, S. and Graffi, K. (2022) ‘Past, Present, and Future of Swarm Robotics’, in Lecture Notes in Networks and Systems. Available at: https://doi.org/10.1007/978-3-030-82199-9_13.

      Chili, C., Xian, W. and Moss, C.F. (2009) ‘Adaptive echolocation behavior in bats for the analysis of auditory scenes’, Journal of Experimental Biology, 212(9), pp. 1392–1404. Available at: https://doi.org/10.1242/jeb.027045.

      Couzin, I.D. et al. (2002) ‘Collective Memory and Spatial Sorting in Animal Groups’, Journal of Theoretical Biology, 218(1), pp. 1–11. Available at: https://doi.org/10.1006/jtbi.2002.3065.

      Couzin, I.D. et al. (2005) ‘Effective leadership and decision-making in animal groups on the move’, Nature, 433(7025), pp. 513–516. Available at: https://doi.org/10.1038/nature03236.

      Davidson, J.D. et al. (2021) ‘Collective detection based on visual information in animal groups’, Journal of the Royal Society Interface, 18(180). Available at: https://doi.org/10.1098/rsif.2021.0142.

      Faria Dias, P.G. et al. (2021) ‘Swarm robotics: A perspective on the latest reviewed concepts and applications’, Sensors. Available at: https://doi.org/10.3390/s21062062.

      Fujioka, E. et al. (2021) ‘Three-Dimensional Trajectory Construction and Observation of Group Behavior of Wild Bats During Cave Emergence’, Journal of Robotics and Mechatronics, 33(3), pp. 556–563. Available at: https://doi.org/10.20965/jrm.2021.p0556.

      Gautrais, J. et al. (2012) ‘Deciphering Interactions in Moving Animal Groups’, PLOS Computational Biology, 8(9), p. e1002678. Available at: https://doi.org/10.1371/JOURNAL.PCBI.1002678.

      Gillam, E.H. et al. (2010) ‘Echolocation behavior of Brazilian free-tailed bats during dense emergence flights’, Journal of Mammalogy, 91(4), pp. 967–975. Available at: https://doi.org/10.1644/09-MAMM-A-302.1.

      Goldstein, A. et al. (2024) ‘Collective Sensing – On-Board Recordings Reveal How Bats Maneuver Under Severe Acoustic Interference’, Under Review, pp. 1–25.

      Griffin, D.R., Webster, F.A. and Michael, C.R. (1958) ‘The echolocation of flying insects by bats’, Animal Behaviour, 8(3-4).

      Hagino, T. et al. (2007) ‘Adaptive SONAR sounds by echolocating bats’, International Symposium on Underwater Technology, UT 2007 - International Workshop on Scientific Use of Submarine Cables and Related Technologies 2007, pp. 647–651. Available at: https://doi.org/10.1109/UT.2007.370829.

      Hiryu, S. et al. (2008) ‘Adaptive echolocation sounds of insectivorous bats, Pipistrellus abramus, during foraging flights in the field’, The Journal of the Acoustical Society of America, 124(2), pp. EL51–EL56. Available at: https://doi.org/10.1121/1.2947629.

      Jakobsen, L. et al. (2024) ‘Velocity as an overlooked driver in the echolocation behavior of aerial hawking vespertilionid bats’. Available at: https://doi.org/10.1016/j.cub.2024.12.042.

      Jakobsen, L., Brinkløv, S. and Surlykke, A. (2013) ‘Intensity and directionality of bat echolocation signals’, Frontiers in Physiology, 4 APR(April), pp. 1–9. Available at: https://doi.org/10.3389/fphys.2013.00089.

      Jakobsen, L. and Surlykke, A. (2010) ‘Vespertilionid bats control the width of their biosonar sound beam dynamically during prey pursuit’, 107(31). Available at: https://doi.org/10.1073/pnas.1006630107.

      Jhawar, J. et al. (2020) ‘Noise-induced schooling of fish’, Nature Physics 2020 16:4, 16(4), pp. 488–493. Available at: https://doi.org/10.1038/s41567-020-0787-y.

      Kalko, E.K.V. (1995) ‘Insect pursuit, prey capture and echolocation in pipistrelle bats (Microchiroptera)’, Animal Behaviour, 50(4), pp. 861–880.

      Kazial, K.A., Burnett, S.C. and Masters, W.M. (2001) ‘Individual and Group Variation in Echolocation Calls of Big Brown Bats, Eptesicus Fuscus (Chiroptera: Vespertilionidae)’, Journal of Mammalogy, 82(2), pp. 339–351. Available at: https://doi.org/10.1644/1545-1542(2001)082<0339:iagvie>2.0.co;2.

      Kazial, K.A., Kenny, T.L. and Burnett, S.C. (2008) ‘Little brown bats (Myotis lucifugus) recognize individual identity of conspecifics using sonar calls’, Ethology, 114(5), pp. 469–478. Available at: https://doi.org/10.1111/j.1439-0310.2008.01483.x.

      Kothari, N.B. et al. (2014) ‘Timing matters: Sonar call groups facilitate target localization in bats’, Frontiers in Physiology, 5 MAY. Available at: https://doi.org/10.3389/fphys.2014.00168.

      Moss, C.F. and Surlykke, A. (2010) ‘Probing the natural scene by echolocation in bats’, Frontiers in Behavioral Neuroscience. Available at: https://doi.org/10.3389/fnbeh.2010.00033.

      Nagy, M. et al. (2010) ‘Hierarchical group dynamics in pigeon flocks’, Nature 2010 464:7290, 464(7290), pp. 890–893. Available at: https://doi.org/10.1038/nature08891.

      Neretti, N. et al. (2003) ‘Time-frequency model for echo-delay resolution in wideband biosonar’, The Journal of the Acoustical Society of America, 113(4), pp. 2137–2145. Available at: https://doi.org/10.1121/1.1554693.

      Parrish, J.K. and Edelstein-Keshet, L. (1999) ‘Complexity, Pattern, and Evolutionary Trade-Offs in Animal Aggregation’, Science, 284(5411), pp. 99–101. Available at: https://doi.org/10.1126/SCIENCE.284.5411.99.

      Partridge, B.L. (1982) ‘The Structure and Function of Fish Schools’, 246(6), pp. 114–123. Available at: https://doi.org/10.2307/24966618.

      Pearce, D.J.G. et al. (2014) ‘Role of projection in the control of bird flocks’, Proceedings of the National Academy of Sciences of the United States of America, 111(29), pp. 10422–10426. Available at: https://doi.org/10.1073/pnas.1402202111.

      Pitcher, T.J., Partridge, B.L. and Wardle, C.S. (1976) ‘A blind fish can school’, Science, 194(4268), pp. 963–965. Available at: https://doi.org/10.1126/science.982056.

      Rosenthal, S.B., Twomey, C.R., Hartnett, A.T., Wu, H.S., Couzin, I.D., et al. (2015) ‘Revealing the hidden networks of interaction in mobile animal groups allows prediction of complex behavioral contagion’, Proceedings of the National Academy of Sciences of the United States of America, 112(15), pp. 4690–4695. Available at: https://doi.org/10.1073/pnas.1420068112.

      Roy, S. et al. (2019) ‘Extracting interactions between flying bat pairs using model-free methods’, Entropy, 21(1). Available at: https://doi.org/10.3390/e21010042.

      Sabol, B.M. and Hudson, M.K. (1995) ‘Technique using thermal infrared-imaging for estimating populations of gray bats’, Journal of Mammalogy, 76(4). Available at: https://doi.org/10.2307/1382618.

      Saillant, P.A. et al. (1993) ‘A computational model of echo processing and acoustic imaging in frequency- modulated echolocating bats: The spectrogram correlation and transformation receiver’, The Journal of the Acoustical Society of America, 94(5). Available at: https://doi.org/10.1121/1.407353.

      Salles, A., Diebold, C.A. and Moss, C.F. (2020) ‘Echolocating bats accumulate information from acoustic snapshots to predict auditory object motion’, Proceedings of the National Academy of Sciences of the United States of America, 117(46), pp. 29229–29238. Available at: https://doi.org/10.1073/PNAS.2011719117/SUPPL_FILE/PNAS.2011719117.SAPP.PDF.

      Sanderson, M.I. et al. (2003) ‘Evaluation of an auditory model for echo delay accuracy in wideband biosonar’, The Journal of the Acoustical Society of America, 114(3), pp. 1648–1659. Available at: https://doi.org/10.1121/1.1598195.

      Schnitzler, H.-U. and Kalko, E.K.V. (2001) ‘Echolocation by insect-eating bats’, BioScience, 51(7), p. 557. Available at: https://academic.oup.com/bioscience/article-abstract/51/7/557/268230 (Accessed: 17 March 2025).

      Schnitzler, H.-U. et al. (1987) ‘The echolocation and hunting behavior of the bat, Pipistrellus kuhli’, Journal of Comparative Physiology A, 161(2), pp. 267–274. Available at: https://doi.org/10.1007/BF00615246.

      Simmons, J.A. and Kick, S.A. (1983) ‘Interception of Flying Insects by Bats’, Neuroethology and Behavioral Physiology, pp. 267–279. Available at: https://doi.org/10.1007/978-3-642-69271-0_20.

      Strandburg-Peshkin, A. et al. (2013) ‘Visual sensory networks and effective information transfer in animal groups’, Current Biology. Cell Press. Available at: https://doi.org/10.1016/j.cub.2013.07.059.

      Sumpter, D.J.T. et al. (2008) ‘Consensus Decision Making by Fish’, Current Biology, 18(22), pp. 1773–1777. Available at: https://doi.org/10.1016/J.CUB.2008.09.064.

      Surlykke, A., Ghose, K. and Moss, C.F. (2009) ‘Acoustic scanning of natural scenes by echolocation in the big brown bat, Eptesicus fuscus’, Journal of Experimental Biology, 212(7), pp. 1011–1020. Available at: https://doi.org/10.1242/JEB.024620.

      Theriault, D.H. et al. (no date) ‘Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight’, cs-web.bu.edu [Preprint]. Available at: https://cs-web.bu.edu/faculty/betke/papers/2010-027-3d-bat-trajectories.pdf (Accessed: 4 May 2023).

      Ulanovsky, N. and Moss, C.F. (2008) ‘What the bat’s voice tells the bat’s brain’, Proceedings of the National Academy of Sciences of the United States of America, 105(25), pp. 8491–8498. Available at: https://doi.org/10.1073/pnas.0703550105.

      Vanderelst, D. and Peremans, H. (2018) ‘Modeling bat prey capture in echolocating bats: The feasibility of reactive pursuit’, Journal of Theoretical Biology, 456, pp. 305–314.

      Youssefi, K.A.R. and Rouhani, M. (2021) ‘Swarm intelligence based robotic search in unknown maze-like environments’, Expert Systems with Applications, 178. Available at: https://doi.org/10.1016/j.eswa.2021.114907.

      Yovel, Y. et al. (2009) ‘The voice of bats: How greater mouse-eared bats recognize individuals based on their echolocation calls’, PLoS Computational Biology, 5(6). Available at: https://doi.org/10.1371/journal.pcbi.1000400.

      Yovel, Y. and Ulanovsky, N. (2017) ‘Bat Navigation’, The Curated Reference Collection in Neuroscience and Biobehavioral Psychology, pp. 333–345. Available at: https://doi.org/10.1016/B978-0-12-809324-5.21031-6.

    1. Author Response

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Given knowledge of the amino acid sequence and of some version of the 3D structure of two monomers that are expected to form a complex, the authors investigate whether it is possible to accurately predict which residues will be in contact in the 3D structure of the expected complex. To this effect, they train a deep learning model that takes as inputs the geometric structures of the individual monomers, per-residue features (PSSMs) extracted from MSAs for each monomer, and rich representations of the amino acid sequences computed with the pre-trained protein language models ESM-1b, MSA Transformer, and ESM-IF. Predicting inter-protein contacts in complexes is an important problem. Multimer variants of AlphaFold, such as AlphaFold-Multimer, are the current state of the art for full protein complex structure prediction, and if the three-dimensional structure of a complex can be accurately predicted then the inter-protein contacts can also be accurately determined. By contrast, the method presented here seeks state-of-the-art performance among models that have been trained end-to-end for inter-protein contact prediction.

      Strengths:

      The paper is carefully written and the method is very well detailed. The model works both for homodimers and heterodimers. The ablation studies convincingly demonstrate that the chosen model architecture is appropriate for the task. Various comparisons suggest that PLMGraph-Inter performs substantially better, given the same input, than DeepHomo, GLINTER, CDPred, DeepHomo2, and DRN-1D2D_Inter. As a byproduct of the analysis, the authors find a potentially useful heuristic criterion for acceptable contact prediction quality: namely, to have at least 50% precision in the prediction of the top 50 contacts.

      We thank the reviewer for recognizing the strengths of our work!
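
      The heuristic criterion mentioned above (at least 50% precision among the top 50 predicted contacts) can be illustrated with a minimal sketch. The function names, the dictionary-based score format, and the toy data are assumptions made for illustration only; they are not taken from the PLMGraph-Inter code.

```python
def top_k_precision(scores, true_contacts, k=50):
    """Fraction of the k highest-scoring residue pairs that are true contacts."""
    # scores: dict mapping (i, j) residue-index pairs to predicted contact scores
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    return hits / len(ranked)


def is_acceptable(scores, true_contacts, k=50, threshold=0.5):
    # Heuristic from the review: at least 50% precision among the top 50 contacts.
    return top_k_precision(scores, true_contacts, k) >= threshold


# Toy example with hypothetical scores: 2 of the 3 top-ranked pairs are true contacts.
toy_scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 3): 0.4, (4, 9): 0.1}
toy_truth = {(0, 5), (2, 3)}
toy_precision = top_k_precision(toy_scores, toy_truth, k=3)
```

      In this sketch the denominator is the number of pairs actually ranked, so a target with fewer than k candidate pairs is not penalized; dividing by k instead would be an equally defensible convention.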

      Weaknesses:

      My biggest issue with this work is the evaluations made using bound monomer structures as inputs, coming from the very complexes to be predicted. Conformational changes in protein-protein association are the key element of the binding mechanism and are challenging to predict. While the GLINTER paper (Xie & Xu, 2022) is guilty of the same sin, the authors of CDPred (Guo et al., 2022) correctly only report test results obtained using predicted unbound tertiary structures as inputs to their model. Test results using experimental monomer structures in bound states can hide important limitations in the model, and thus say very little about the realistic use cases in which only the unbound structures (experimental or predicted) are available. I therefore strongly suggest reducing the importance given to the results obtained using bound structures and emphasizing instead those obtained using predicted monomer structures as inputs.

      We thank the reviewer for the suggestion! We evaluated PLMGraph-Inter with the predicted monomers and analyzed the results in detail (see the “Impact of the monomeric structure quality on contact prediction” section and Figure 3). To mimic real cases, we even deliberately reduced the performance of AF2 by using reduced MSAs (see the 2nd paragraph in the “Impact of the monomeric structure quality on contact prediction” section). Some of these results are currently in the supplementary material of the manuscript (Table S2). We will move them to the main text in the revision to emphasize the performance of PLMGraph-Inter with the predicted monomers.

      In particular, the most relevant comparison with AlphaFold-Multimer (AFM) is given in Figure S2, not Figure 6. Unfortunately, it substantially shrinks the proportion of structures for which AFM fails while PLMGraph-Inter performs decently. Still, it would be interesting to investigate why this occurs. One possibility would be that the predicted monomer structures are of bad quality there, and PLMGraph-Inter may be able to rely on a signal from its language model features instead. Finally, AFM multimer confidence values ("iptm + ptm") should be provided, especially in the cases in which AFM struggles.

      We thank the reviewer for the suggestion! Yes, the performance of PLMGraph-Inter drops when the predicted monomers are used in the prediction. However, it is difficult to say which is the fairer comparison, Figure 6 or Figure S2, since AFM also searched monomer templates in its predictions (see the third paragraph of Supplementary Information section 7.1, “Data”, in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full). When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native templates. We will provide the confidence values of the AFM predictions in the revision.

      Besides, in cases where any experimental structures - bound or unbound - are available and given to PLMGraph-Inter as inputs, they should also be provided to AlphaFold-Multimer (AFM) as templates. Withholding these from AFM only makes the comparison artificially unfair. Hence, a new test should be run using AFM templates, and a new version of Figure 6 should be produced. Additionally, AFM's mean precision, at least for top-50 contact prediction, should be reported so it can be compared with PLMGraph-Inter's.

      We thank the reviewer for the suggestion! We would like to note that AFM also searched monomer templates in its predictions (see the third paragraph of Supplementary Information section 7.1, “Data”, in the AlphaFold-Multimer preprint: https://www.biorxiv.org/content/10.1101/2021.10.04.463034v2.full). When we checked our AFM runs, we found that 99% of the targets in our study (including all the targets in the four datasets: HomoPDB, HeteroPDB, DHTest and DB5.5) employed at least 20 templates in their predictions, and 87.8% of the targets employed the native template.

      It's a shame that many of the structures used in the comparison with AFM are actually in the AFM v2 training set. If there are any outside the AFM v2 training set and, ideally, not sequence- or structure-homologous to anything in the AFM v2 training set, they should be discussed and reported on separately. In addition, why not test on structures from the "Benchmark 2" or "Recent-PDB-Multimers" datasets used in the AFM paper?

      We thank the reviewer for the suggestion! The biggest challenge in evaluating AFM objectively is that, as far as we know, the PDB IDs of the AFM training set and of the “Recent-PDB-Multimers” dataset have not been released. “Benchmark 2” only includes 17 heterodimer proteins, and the number would decrease further after removing targets redundant to our training set. We think it is difficult to draw conclusions from such a small number of targets. In the revision, we will analyze the performance of AFM on targets released after the date cutoff of the AFM training set, although for these targets we cannot completely remove the redundancy between the training and test sets of AFM.

      It is also worth noting that the AFM v2 weights have now been outdated for a while, and better v3 weights now exist, with a training cutoff of 2021-09-30.

      We thank the reviewer for reminding us of the new version of AFM. The only difference between AFM V3 and V2 is the cutoff date of the training set. Our test set would overlap more with the training set of AFM V3, which is one reason why we think AFM V2 is more appropriate for the comparison.

      Another weakness in the evaluation framework: because PLMGraph-Inter uses structural inputs, it is not sufficient to make its test set non-redundant in sequence to its training set. It must also be non-redundant in structure. The Benchmark 2 dataset mentioned above is an example of a test set constructed by removing structures with homologous templates in the AF2 training set. Something similar should be done here.

      We agree with the reviewer that testing whether the model can keep its performance on targets with no templates (i.e. non-redundant in structure) is important. We will perform the analysis in the revision.

      Finally, the performance of DRN-1D2D for top-50 precision reported in Table 1 suggests to me that, in an ablation study, language model features alone would yield better performance than geometric features alone. So, I am puzzled why model "a" in the ablation is a "geometry-only" model and not a "LM-only" one.

      Using the protein geometric graph to integrate multiple protein language models is the main idea of PLMGraph-Inter. Compared with our previous work (DRN-1D2D_Inter), we consider the construction of the geometric graph to be one major contribution of this work. To emphasize the efficacy of this geometric graph, we chose to use the “geometry-only” model as the base model. We will further clarify this in the revision.

      Reviewer #2 (Public Review):

      This work introduces PLMGraph-Inter, a new deep-learning approach for predicting inter-protein contacts, which is crucial for understanding protein-protein interactions. Despite advancements in this field, especially driven by AlphaFold, prediction accuracy and efficiency (in terms of computational cost) still remain areas for improvement. PLMGraph-Inter utilizes invariant geometric graphs to integrate the features from multiple protein language models into the structural information of each subunit. When compared against other inter-protein contact prediction methods, PLMGraph-Inter shows better performance, which indicates that utilizing both sequence embeddings and structural embeddings is important to achieve high-accuracy predictions with relatively small computational costs for model training.

      The conclusions of this paper are mostly well supported by data, but test examples should be revisited with a more strict sequence identity cutoff to avoid any potential information leakage from the training data. The main figures should be improved to make them easier to understand.

      We thank the reviewer for recognizing the significance of our work! We will revise the manuscript carefully to address the reviewer’s concerns.

      1. The sequence identity cutoff to remove redundancies between training and test set was set to 40%, which is a bit high to remove test examples having homology to training examples. For example, CDPred uses a sequence identity cutoff of 30% to strictly remove redundancies between training and test set examples. To make their results more solid, the authors should have curated test examples with lower sequence identity cutoffs, or have provided the performance changes against sequence identities to the closest training examples.

      We thank the reviewer for the valuable suggestion! Using different thresholds to reduce the redundancy between the test set and the training set is a very good suggestion, and we will perform the analysis in the revision. In the current version of the manuscript, 40% sequence identity is used as the cutoff because many previous studies used this cutoff (e.g. the Recent-PDB-Multimers dataset used in AlphaFold-Multimer (see: 7.8 Datasets in the AlphaFold-Multimer paper); the work of DSCRIPT: https://www.cell.com/action/showPdf?pii=S2405-4712%2821%2900333-1 (see: the PPI dataset paragraph in the METHODS DETAILS section of the STAR METHODS)). One reason for using a relatively higher threshold in PPI studies is that PPIs are generally not as conserved as protein monomers.

      We performed a preliminary analysis using different thresholds to remove redundancy when preparing this provisional response letter:

      Author response table 1.

      Table 1. The performance of PLMGraph-Inter on the HomoPDB and HeteroPDB test sets using native structures (AlphaFold2 predicted structures).

      Method:

      To remove redundancy, we clustered 11096 sequences from the training set and test sets (HomoPDB, HeteroPDB) using MMseqs2 with different sequence identity thresholds (40%, 30%, 20%, 10%) (the lowest cutoff for CD-HIT is 40%, so we switched to MMseqs2). Each sequence is then uniquely labeled by the cluster (e.g. cluster 0, cluster 1, …) to which it belongs, from which each PPI can be marked with a pair of clusters (e.g. cluster 0-cluster 1). PPIs belonging to the same cluster pair (note: cluster n-cluster m and cluster m-cluster n were considered to be the same pair) were considered redundant. For each PPI in the test set, if the cluster pair it belongs to contains a PPI from the training set, we remove that PPI from the test set.
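
      The cluster-pair filtering described above can be sketched as follows. This is a minimal illustration, not our actual pipeline: it assumes the clustering step has already been run and parsed into a hypothetical `seq_to_cluster` mapping, and all sequence names in the toy data are invented.

```python
def cluster_pair(ppi, seq_to_cluster):
    """Return an order-independent cluster pair for a PPI given as (seq_a, seq_b)."""
    seq_a, seq_b = ppi
    # Sorting makes (n, m) and (m, n) map to the same key.
    return tuple(sorted((seq_to_cluster[seq_a], seq_to_cluster[seq_b])))


def filter_test_set(train_ppis, test_ppis, seq_to_cluster):
    """Keep only test PPIs whose cluster pair does not occur in the training set."""
    train_pairs = {cluster_pair(ppi, seq_to_cluster) for ppi in train_ppis}
    return [ppi for ppi in test_ppis
            if cluster_pair(ppi, seq_to_cluster) not in train_pairs]


# Toy data (hypothetical): sequences A and C share cluster 0, B and D share cluster 1.
seq_to_cluster = {"A": 0, "B": 1, "C": 0, "D": 1, "E": 2}
train_ppis = [("A", "B")]                 # cluster pair (0, 1)
test_ppis = [("D", "C"), ("C", "E")]      # cluster pairs (0, 1) and (0, 2)
kept = filter_test_set(train_ppis, test_ppis, seq_to_cluster)
```

      In the toy example, the test PPI ("D", "C") is dropped because its cluster pair matches that of the training PPI ("A", "B") regardless of chain order, while ("C", "E") is kept.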

      We will perform more detailed analyses in the revised manuscript.

      2. Figures with head-to-head comparison scatter plots are hard to understand as scatter plots because too many different methods are abstracted into a single plot with multiple colors. It would be better to provide individual head-to-head scatter plots as supplementary figures, not in the main figure.

      We thank the reviewer for the suggestion! We will include the individual head-to-head scatter plots as supplementary figures in the revision.

      3) The authors claim that PLMGraph-Inter is complementary to AlphaFold-multimer as it shows better precision for the cases where AlphaFold-multimer fails. To strengthen the point, the qualities of predicted complex structures via protein-protein docking with predicted contacts as restraints should have been compared to those of AlphaFold-multimer structures.

      We thank the reviewer for the suggestion! We will add this comparison in the revision.

      4) It would be interesting to further analyze whether there is a difference in prediction performance depending on the depth of multiple sequence alignment or the type of complex (antigen-antibody, enzyme-substrates, single species PPI, multiple species PPI, etc).

      We thank the reviewer for the suggestion! We will perform such analysis in the revision.

    1. Author response:

      eLife Assessment 

      This valuable study investigates how the neural representation of individual finger movements changes during the early period of sequence learning. By combining a new method for extracting features from human magnetoencephalography data and decoding analyses, the authors provide incomplete evidence of an early, swift change in the brain regions correlated with sequence learning, including a set of previously unreported frontal cortical regions. The addition of more control analyses to rule out that head movement artefacts influence the findings, and to further explain the proposal of offline contextualization during short rest periods as the basis for performance improvement, would strengthen the manuscript.

      We appreciate the Editorial assessment on our paper’s strengths and novelty.  We have implemented additional control analyses to show that neither task-related eye movements nor increasing overlap of finger movements during learning account for our findings, which are that contextualized neural representations in a network of bilateral frontoparietal brain regions actively contribute to skill learning.  Importantly, we carried out additional analyses showing that contextualization develops predominantly during rest intervals.

      Public Reviews:

      We thank the Reviewers for their comments and suggestions, prompting new analyses and additions that strengthened our report.

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the issue of rapid skill learning and whether individual sequence elements (here: finger presses) are differentially represented in human MEG data. The authors use a decoding approach to classify individual finger elements and accomplish an accuracy of around 94%. A relevant finding is that the neural representations of individual finger elements dynamically change over the course of learning. This would be highly relevant for any attempts to develop better brain machine interfaces - one now can decode individual elements within a sequence with high precision, but these representations are not static but develop over the course of learning. 

      Strengths: The work follows a large body of work from the same group on the behavioural and neural foundations of sequence learning. The behavioural task is well established and neatly designed to allow for tracking learning and how individual sequence elements contribute. The inclusion of short offline rest periods between learning epochs has been influential because it has revealed that a lot, if not most of the gains in behaviour (ie speed of finger movements) occur in these so-called micro-offline rest periods. The authors use a range of new decoding techniques, and exhaustively interrogate their data in different ways, using different decoding approaches. Regardless of the approach, impressively high decoding accuracies are observed, but when using a hybrid approach that combines the MEG data in different ways, the authors observe decoding accuracies of individual sequence elements from the MEG data of up to 94%. 

      We have previously shown that neural replay of MEG activity representing the practiced skill correlated with micro-offline gains during rest intervals of early learning,1 consistent with the recent report that hippocampal ripples during these offline periods predict human motor sequence learning2. However, decoding accuracy in our earlier work1 needed improvement. Here, we report a strategy to improve decoding accuracy that could benefit future studies of neural replay or BCI using MEG.

      Weaknesses: 

      There are a few concerns which the authors may well be able to resolve. These are not weaknesses as such, but factors that would be helpful to address as these concern potential contributions to the results that one would like to rule out. Regarding the decoding results shown in Figure 2 etc, a concern is that within individual frequency bands, the highest accuracy seems to be within frequencies that match the rate of keypresses. This is a general concern when relating movement to brain activity, so is not specific to decoding as done here. As far as reported, there was no specific restraint to the arm or shoulder, and even then it is conceivable that small head movements would correlate highly with the vigor of individual finger movements. This concern is supported by the highest contribution in decoding accuracy being in middle frontal regions - midline structures that would be specifically sensitive to movement artefacts and don't seem to come to mind as key structures for very simple sequential keypress tasks such as this - and the overall pattern is remarkably symmetrical (despite being a unimanual finger task) and spatially broad. This issue may well be matching the time course of learning, as the vigor and speed of finger presses will also influence the degree to which the arm/shoulder and head move. This is not to say that useful information is contained within either of the frequencies or broadband data. But it raises the question of whether a lot is dominated by movement "artefacts" and one may get a more specific answer if removing any such contributions. 

      Reviewer #1 expresses concern that the combination of the low-frequency narrow-band decoder results, and the bilateral middle frontal regions displaying the highest average intra-parcel decoding performance across subjects is suggestive that the decoding results could be driven by head movement or other artefacts.

      Head movement artefacts are highly unlikely to contribute meaningfully to our results for the following reasons. First, in addition to ICA denoising, all “recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements” (see Methods). Second, the response pad was positioned in a manner that minimized wrist, arm or more proximal body movements during the task. Third, while head position was not monitored online for this study, the head was restrained using an inflatable air bladder, and head position was assessed at the beginning and at the end of each recording. Head movement did not exceed 5 mm between the beginning and end of each scan for all participants included in the study. Fourth, we agree that despite the steps taken above, it is possible that minor head movements could still contribute to some remaining variance in the MEG data in our study. The Reviewer states a concern that “it is conceivable that small head movements would correlate highly with the vigor of individual finger movements”. However, in order for any such correlations to meaningfully impact decoding performance, such head movements would need to: (A) be consistent and pervasive throughout the recording (which might not be the case if the head movements were related to movement vigor and vigor changed over time); and (B) systematically vary between different finger movements, and also between the same finger movement performed at different sequence locations (see 5-class decoding performance in Figure 4B). It is extremely unlikely that any head movement artefacts would meet all of these conditions.

      Given the task design, a much more likely confound in our estimation would be the contribution of eye movement artefacts to the decoder performance (an issue appropriately raised by Reviewer #3 in the comments below). Recall from Figure 1A in the manuscript that an asterisk marks the current position in the sequence and is updated at each keypress. Since participants make very few performance errors, the position of the asterisk on the display is highly correlated with the keypress being made in the sequence. Thus, if participants attend to the visual feedback provided on the display, they may move their eyes in a way that is systematically related to the task. Since we recorded eye movements simultaneously with the MEG recordings (EyeLink 1000 Plus; Fs = 600 Hz), we were able to perform a control analysis to address this question. For each keypress event during trials in which no errors occurred (which is the same time-point at which the asterisk position is updated), we extracted three features related to eye movements: 1) the gaze position at the time of the asterisk position update (or keyDown event), 2) the gaze position 150 ms later, and 3) the peak velocity of the eye movement between the two positions. We then constructed a classifier from these features with the aim of predicting the location of the asterisk (ordinal positions 1-5) on the display. As shown in the confusion matrix below (Author response image 1), the classifier failed to perform above chance levels (Overall cross-validated accuracy = 0.21817):

      Author response image 1.

      Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.21817). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e., behavioral artefacts).
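
      To make the logic of this control analysis concrete, the sketch below runs a 5-fold cross-validation on features that carry no information about a 5-class label and compares the accuracy with the chance level of 1/5 = 0.2. It is only an illustration under stated assumptions: the data are synthetic, the three scalar features are stand-ins for the gaze features, and the nearest-centroid classifier is a simple substitute chosen for self-containment, not the classifier used in our analysis.

```python
import random


def nearest_centroid_cv(features, labels, n_folds=5, seed=0):
    """Cross-validated accuracy of a nearest-centroid classifier."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        # Fit: average the training feature vectors of each class.
        sums, counts = {}, {}
        for i in order:
            if i in held_out:
                continue
            acc = sums.setdefault(labels[i], [0.0] * len(features[i]))
            for d, value in enumerate(features[i]):
                acc[d] += value
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        # Predict: assign each held-out sample to the closest class centroid.
        for i in test_idx:
            predicted = min(
                centroids,
                key=lambda c: sum((x - m) ** 2
                                  for x, m in zip(features[i], centroids[c])),
            )
            correct += int(predicted == labels[i])
    return correct / len(labels)


# Synthetic stand-in data: three features unrelated to the label.
data_rng = random.Random(1)
labels = [data_rng.randrange(1, 6) for _ in range(2000)]   # ordinal positions 1-5
features = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in labels]
accuracy = nearest_centroid_cv(features, labels)
chance = 1 / 5
```

      With uninformative features the cross-validated accuracy is expected to stay close to the chance level, which is the pattern reported for the eye movement features above.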

      In fact, inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. A similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. The minimal participant engagement with the visual task display observed in this study highlights another important point – that the behavior in explicit sequence learning motor tasks is highly generative in nature rather than reactive to stimulus cues as in the serial reaction time task (SRTT).  This is a crucial difference that must be carefully considered when designing investigations and comparing findings across studies.

      We observed that initial keypress decoding accuracy was predominantly driven by contralateral primary sensorimotor cortex in the initial practice trials before transitioning to bilateral frontoparietal regions by trials 11 or 12 as performance gains plateaued. The contribution of contralateral primary sensorimotor areas to early skill learning has been extensively reported in humans and non-human animals.1,3-5 Similarly, the increased contribution of bilateral frontal and parietal regions to decoding during early skill learning in the non-dominant hand is well known. Enhanced bilateral activation in both frontal and parietal cortex during skill learning has been extensively reported6-11, and appears to be even more prominent during early fine motor skill learning in the non-dominant hand12,13. The frontal regions identified in these studies are known to play crucial roles in executive control14, motor planning15, and working memory6,8,16-18 processes, while the same parietal regions are known to integrate multimodal sensory feedback and support visuomotor transformations6,8,16-18, in addition to working memory19. Thus, it is not surprising that these regions increasingly contribute to decoding as subjects internalize the sequential task. We now include a statement reflecting these considerations in the revised Discussion.

      A somewhat related point is this: when combining voxel and parcel space, a concern is whether a degree of circularity may have contributed to the improved accuracy of the combined data, because it seems to use the same MEG signals twice - the voxels most contributing are also those contributing most to a parcel being identified as relevant, as parcels reflect the average of voxels within a boundary. In this context, I struggled to understand the explanation given, ie that the improved accuracy of the hybrid model may be due to "lower spatially resolved whole-brain and higher spatially resolved regional activity patterns".

      We strongly disagree with the Reviewer’s assertion that the construction of the hybrid-space decoder is circular. To clarify, the base feature set for the hybrid-space decoder constructed for all participants includes whole-brain spatial patterns of MEG source activity averaged within parcels. As stated in the manuscript, these 148 inter-parcel features reflect “lower spatially resolved whole-brain activity patterns” or global brain dynamics. We then independently test how well spatial patterns of MEG source activity for all voxels distributed within individual parcels can decode keypress actions. Again, these intra-parcel spatial patterns, intended to capture “higher spatially resolved regional brain activity patterns”, are tested independently of one another and independently of the weighting of individual inter-parcel features. These intra-parcel features could, for example, provide additional information about muscle activation patterns or the task environment. These approximately 1150 intra-parcel voxels (on average, with the total number varying between subjects) are then combined with the 148 inter-parcel features to construct the final hybrid-space decoder. In fact, this varied spatial filter approach shares some similarities with the construction of convolutional neural networks (CNNs) used to perform object recognition in image classification applications. One could also view this hybrid-space decoding approach as a spatial analogue to common time-frequency based analyses such as theta-gamma phase amplitude coupling (PAC), which combine information from two or more narrow-band spectral features derived from the same time-series data.

      We directly tested this hypothesis – that spatially overlapping intra- and inter-parcel features convey different information – by constructing an alternative hybrid-space decoder (HybridAlt) that excluded average inter-parcel features which spatially overlapped with intra-parcel voxel features, and comparing its performance to that of the decoder used in the manuscript (HybridOrig). The prediction was that if the overlapping parcels contained similar information to the more spatially resolved voxel patterns, then removing those parcel features (n=8) from the decoding analysis should not impact performance. In fact, despite making up less than 1% of the overall input feature space, removing those parcels resulted in a significant drop in overall performance of greater than 2% (78.15% ± SD 7.03% for HybridOrig vs. 75.49% ± SD 7.17% for HybridAlt; Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04) (Author response image 2).

      Author response image 2.

      Comparison of decoding performance for two different hybrid approaches. HybridAlt: Intra-parcel voxel-space features of top ranked parcels and inter-parcel features of the remaining parcels. HybridOrig: Voxel-space features of top ranked parcels and whole-brain parcel-space features (i.e., the version used in the manuscript). Dots represent decoding accuracy for individual subjects. Dashed lines indicate the trend in performance change across participants. Note that HybridOrig (the approach used in our manuscript) significantly outperforms the HybridAlt approach, indicating that the excluded parcel features provide unique information compared to the spatially overlapping intra-parcel voxel patterns.
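
      The figures quoted for this comparison can be checked with a few lines of arithmetic. The sketch below assumes that the reported p-value is the two-sided normal approximation of the signed rank statistic, and it uses the approximate average of 1150 intra-parcel voxels; both are assumptions made for illustration.

```python
import math

# Two-sided p-value from the reported z statistic (normal approximation assumed).
z = 3.7410
p_two_sided = math.erfc(z / math.sqrt(2))

# Share of the hybrid feature space taken up by the removed parcel features.
n_removed = 8                 # overlapping parcel features excluded in HybridAlt
n_total = 148 + 1150          # inter-parcel features + approx. intra-parcel voxels
share_removed = n_removed / n_total

# Drop in mean decoding accuracy, in percentage points.
accuracy_drop = 78.15 - 75.49
```

      Under these assumptions the computed p-value is on the order of 1.8e-04, the removed features make up roughly 0.6% of the input space, and the mean accuracy drop is about 2.66 percentage points, in line with the values reported above.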

      Firstly, there will be a relatively high degree of spatial contiguity among voxels because of the nature of the signal measured, i.e. nearby individual voxels are unlikely to be independent. Secondly, the voxel data gives a somewhat misleading sense of precision; the inversion can be set up to give an estimate for each voxel, but there will not just be dependence among adjacent voxels, but also substantial variation in the sensitivity and confidence with which activity can be projected to different parts of the brain. Midline and deeper structures come to mind, where the inversion will be more problematic than for regions along the dorsal convexity of the brain, and a concern is that in those midline structures, the highest decoding accuracy is seen. 

      We definitely agree with the Reviewer that some inter-parcel features representing neighboring (or spatially contiguous) voxels are likely to be correlated. This has been well documented in the MEG literature20,21 and is a particularly important confound to address in functional or effective connectivity analyses (not performed in the present study). In the present analysis, any correlation between adjacent voxels presents a multi-collinearity problem, which effectively reduces the dimensionality of the input feature space. However, as long as there are multiple groups of correlated voxels within each parcel (i.e. - the effective dimensionality is still greater than 1), the intra-parcel spatial patterns could still meaningfully contribute to the decoder performance. Two specific results support this assertion.

      First, we obtained higher decoding accuracy with voxel-space features [74.51% (± SD 7.34%)] compared to parcel-space features [68.77% (± SD 7.6%)] (Figure 3B), indicating that individual voxels carry more information for decoding the keypresses than the averaged voxel-space features or parcel-space features. Second, individual voxels within a parcel showed varying feature importance scores in decoding keypresses (Author response image 3). This finding supports the Reviewer’s assertion that neighboring voxels express similar information, but also shows that the correlated voxels form mini subclusters that are much smaller spatially than the parcel they reside in.

      Author response image 3.

      Feature importance score of individual voxels in decoding keypresses: MRMR was used to rank the individual voxel-space features in decoding keypresses and the min-max normalized MRMR score was mapped to a structural brain surface. Note that individual voxels within a parcel showed different contributions to decoding.

       

      Some of these concerns could be addressed by recording head movement (with enough precision) to regress out these contributions. The authors state that head movement was monitored with 3 fiducials, and their time courses ought to provide a way to deal with this issue. The ICA procedure may not have sufficiently dealt with removing movement-related problems, but one could eg relate individual components that were identified to the keypresses as another means for checking. An alternative could be to focus on frequency ranges above the movement frequencies. The accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment. 

      We have already addressed the issue of movement-related artefacts in the first response above. With respect to a focus on frequency ranges above movement frequencies, the Reviewer states the “accuracy for those still seems impressive and may provide a slightly more biologically plausible assessment”. First, it is important to note that cortical delta-band oscillations measured with local field potentials (LFPs) in macaques are known to contain important information related to end-effector kinematics22,23, muscle activation patterns24 and temporal sequencing25 during skilled reaching and grasping actions. Thus, there is a substantial body of evidence that low-frequency neural oscillatory activity in this range contains important information about the skill learning behavior investigated in the present study. Second, our own data show (as the Reviewer also points out) that significant information related to the skill learning behavior is also present in higher frequency bands (see Figure 2A and Figure 3—figure supplement 1). As we pointed out in our earlier response to questions about the hybrid space decoder architecture (see above), it is likely that different, yet complementary, information is encoded across different temporal frequencies (just as it is encoded across different spatial frequencies). Again, this interpretation is supported by our data, as the highest performing classifiers in all cases (when holding all parameters constant) were always constructed from broadband input MEG data (Figure 2A and Figure 3—figure supplement 1).

      One question concerns the interpretation of the results shown in Figure 4. They imply that during the course of learning, entirely different brain networks underpin the behaviour. Not only that, but they also include regions that would seem rather unexpected to be key nodes for learning and expressing relatively simple finger sequences, such as here. What then is the biological plausibility of these results? The authors seem to circumnavigate this issue by moving into a distance metric that captures the (neural network) changes over the course of learning, but the discussion seems detached from which regions are actually involved; or they offer a rather broad discussion of the anatomical regions identified here, eg in the context of LFOs, where they merely refer to "frontoparietal regions". 

      The Reviewer notes the shift in brain networks driving keypress decoding performance between trials 1, 11 and 36 as shown in Figure 4A. The Reviewer questions whether these substantial shifts in brain network states underpinning the skill are biologically plausible, as well as the likelihood that bilateral superior and middle frontal and parietal cortex are important nodes within these networks.

      First, previous fMRI work in humans performing a similar sequence learning task showed that flexibility in brain network composition (i.e. – changes in brain region members displaying coordinated activity) is up-regulated in novel learning environments and explains differences in learning rates across individuals26.  This work supports our interpretation of the present study data, that brain networks engaged in sequential motor skills rapidly reconfigure during early learning.

      Second, frontoparietal network activity is known to support motor memory encoding during early learning27,28. For example, reactivation events in the posterior parietal29 and medial prefrontal30,31 cortex (MPFC) have been temporally linked to hippocampal replay, and are posited to support memory consolidation across several memory domains32, including motor sequence learning1,33,34. Further, synchronized interactions between MPFC and hippocampus are more prominent during early learning as opposed to later stages27,35,36, perhaps reflecting “redistribution of hippocampal memories to MPFC” 27. MPFC contributes to very early memory formation by learning associations between contexts, locations, events and adaptive responses during rapid learning37. Consistently, coupling between hippocampus and MPFC has been shown during, and importantly immediately following (rest), initial memory encoding38,39. Importantly, MPFC activity during initial memory encoding predicts subsequent recall40. Thus, the spatial map required to encode a motor sequence memory may be “built under the supervision of the prefrontal cortex” 28, which is also engaged in the development of an abstract representation of the sequence41. In more abstract terms, the prefrontal, premotor and parietal cortices support novice performance “by deploying attentional and control processes” 42-44 required during early learning. The dorsolateral prefrontal cortex (DLPFC) specifically is thought to engage in goal selection and sequence monitoring during early skill practice45, all consistent with the schema model of declarative memory in which prefrontal cortices play an important role in encoding46,47. Thus, several prefrontal and frontoparietal regions contributing to long-term learning48 are also engaged in early stages of encoding. Altogether, there is strong biological support for the involvement of bilateral prefrontal and frontoparietal regions in decoding during early skill learning.
      We now address this issue in the revised manuscript.

      If I understand correctly, the offline neural representation analysis is in essence the comparison of the last keypress vs the first keypress of the next sequence. In that sense, the activity during offline rest periods is actually not considered. This makes the nomenclature somewhat confusing. While it matches the behavioural analysis, having only key presses one can't do it in any other way, but here the authors actually do have recordings of brain activity during offline rest. So at the very least calling it offline neural representation is misleading to this reviewer because what is compared is activity during the last and during the next keypress, not activity during offline periods. But it also seems a missed opportunity - the authors argue that most of the relevant learning occurs during offline rest periods, yet there is no attempt to actually test whether activity during this period can be useful for the questions at hand here. 

      We agree with the Reviewer that our previous “offline neural representation” nomenclature could be misinterpreted. In the revised manuscript we refer to this difference as the “offline neural representational change”. Please note that our previous work did link offline neural activity (i.e., 16-22 Hz beta power and neural replay density during inter-practice rest periods) to observed micro-offline gains49.

      Reviewer #2 (Public review): 

      Summary 

      Dash et al. asked whether and how the neural representation of individual finger movements is "contextualized" within a trained sequence during the very early period of sequential skill learning by using decoding of MEG signal. Specifically, they assessed whether/how the same finger presses (pressing index finger) embedded in the different ordinal positions of a practiced sequence (4-1-3-2-4; here, the numbers 1 through 4 correspond to the little through the index fingers of the non-dominant left hand) change their representation (MEG feature). They did this by computing either the decoding accuracy of the index finger at the ordinal positions 1 vs. 5 (index_OP1 vs index_OP5) or pattern distance between index_OP1 vs. index_OP5 at each training trial and found that both the decoding accuracy and the pattern distance progressively increase over the course of learning trials. More interestingly, they also computed the pattern distance for index_OP5 for the last execution of a practice trial vs. index_OP1 for the first execution in the next practice trial (i.e., across the rest period). This "off-line" distance was significantly larger than the "on-line" distance, which was computed within practice trials and predicted micro-offline skill gain. Based on these results, the authors conclude that the differentiation of representation for the identical movement embedded in different positions of a sequential skill ("contextualization") primarily occurs during early skill learning, especially during rest, consistent with the recent theory of the "micro-offline learning" proposed by the authors' group. I think this is an important and timely topic for the field of motor learning and beyond.

      Strengths

      The specific strengths of the current work are as follows. First, the use of temporally rich neural information (MEG signal) has a large advantage over previous studies testing sequential representations using fMRI. This allowed the authors to examine the earliest period (= the first few minutes of training) of skill learning with finer temporal resolution. Second, through the optimization of MEG feature extraction, the current study achieved extremely high decoding accuracy (approx. 94%) compared to previous works. As claimed by the authors, this is one of the strengths of the paper (but see my comments). Third, although some potential refinement might be needed, comparing "online" and "offline" pattern distance is a neat idea. 

      Weaknesses 

      Along with the strengths I raised above, the paper has some weaknesses. First, the pursuit of high decoding accuracy, especially the choice of time points and window length (i.e., 200 msec window starting from 0 msec from key press onset), casts a shadow on the interpretation of the main result. Currently, it is unclear whether the decoding results simply reflect behavioral change or true underlying neural change. As shown in the behavioral data, the key press speed reached 3~4 presses per second already at around the end of the early learning period (11th trial), which means inter-press intervals become as short as 250-330 msec. Thus, in almost more than 60% of training period data, the time window for MEG feature extraction (200 msec) spans around 60% of the inter-press intervals. Considering that the preparation/cueing of subsequent presses starts ahead of the actual press (e.g., Kornysheva et al., 2019) and/or potential online planning (e.g., Ariani and Diedrichsen, 2019), the decoder likely has captured these future press information as well as the signal related to the current key press, independent of the formation of genuine sequential representation (e.g., "contextualization" of individual press). This may also explain the gradual increase in decoding accuracy or pattern distance between index_OP1 vs. index_OP5 (Figure 4C and 5A), which co-occurred with performance improvement, as shorter inter-press intervals are more favorable for the dissociating the two index finger presses followed by different finger presses. The compromised decoding accuracies for the control sequences can be explained in similar logic. Therefore, more careful consideration and elaborated discussion seem necessary when trying to both achieve high-performance decoding and assess early skill learning, as it can impact all the subsequent analyses.

      The Reviewer raises the possibility that (given the windowing parameters used in the present study) an increase in “contextualization” with learning could simply reflect faster typing speeds as opposed to an actual change in the underlying neural representation. The issue can essentially be framed as a mixing problem. As correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Moreover, if the representation distance is largely driven by this mixing effect, it’s also possible that the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.

      We also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
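
      As an illustration only (this is not the analysis code used in the study), the logic of such a negative control can be sketched as an ordinary least-squares fit of the distance score on three z-scored transition times, summarized by the adjusted R². The data below are synthetic, and the helper names (zscore, solve, adjusted_r2) are hypothetical; the sketch handles a single subject, whereas pooled data would be z-scored per subject before fitting.

```python
import random
from statistics import mean, pstdev


def zscore(values):
    """Standardize one subject's values to zero mean and unit SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def adjusted_r2(predictors, response):
    """Fit response ~ intercept + predictors by OLS; return adjusted R^2."""
    n, p = len(response), len(predictors)
    cols = [[1.0] * n] + predictors
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, response)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ybar = mean(response)
    ss_tot = sum((y - ybar) ** 2 for y in response)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Synthetic single-subject example (values are NOT from the study).
rng = random.Random(0)
n = 200
# Three predictors standing in for the 4-1, 2-4 and 4-4 transition times.
transitions = [zscore([rng.gauss(0, 1) for _ in range(n)]) for _ in range(3)]
unrelated = zscore([rng.gauss(0, 1) for _ in range(n)])
driven = zscore([a + b + c for a, b, c in zip(*transitions)])

print("distance unrelated to speed:", adjusted_r2(transitions, unrelated))
print("distance driven by speed:   ", adjusted_r2(transitions, driven))
```

      An adjusted R² near zero, as in the first synthetic case, is the pattern expected when transition times explain essentially none of the variance in the distance score.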

      Overall, we do strongly agree with the Reviewer that the naturalistic, self-paced, generative task employed in the present study results in overlapping brain processes related to planning, execution, evaluation and memory of the action sequence. We also agree that there are several tradeoffs to consider in the construction of the classifiers depending on the study aim. Given our aim of optimizing keypress decoder accuracy in the present study, the set of trade-offs resulted in representations reflecting more the latter three processes, and less so the planning component. Whether separate decoders can be constructed to tease apart the representations or networks supporting these overlapping processes is an important future direction of research in this area. For example, work presently underway in our lab constrains the selection of windowing parameters in a manner that allows individual classifiers to be temporally linked to specific planning, execution, evaluation or memory-related processes to discern which brain networks are involved and how they adaptively reorganize with learning. Results from the present study (Figure 4—figure supplement 2) showing hybrid-space decoder prediction accuracies exceeding 74% for temporal windows spanning as little as 25ms and located up to 100ms prior to the keyDown event strongly support the feasibility of such an approach.

      Related to the above point, testing only one particular sequence (4-1-3-2-4), aside from the control ones, limits the generalizability of the finding. This also may have contributed to the extremely high decoding accuracy reported in the current study. 

      The Reviewer raises a question about the generalizability of the decoder accuracy reported in our study. Fortunately, a comparison between decoder performances on Day 1 and Day 2 datasets does provide some insight into this issue. As the Reviewer points out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. Both changes in accuracy are important with regard to the generalizability of our findings. First, 87.11% performance accuracy for the trained sequence data on Day 2 (a reduction of only 3.36%) indicates that the hybrid-space decoder performance is robust over multiple MEG sessions, and thus, robust to variations in SNR across the MEG sensor array caused by small differences in head position between scans. This indicates a substantial advantage over sensor-space decoding approaches. Furthermore, when tested on data from unpracticed sequences, overall performance dropped an additional 7.67%. This difference reflects the performance bias of the classifier for the trained sequence, possibly caused by high-order sequence structure being incorporated into the feature weights. In the future, it will be important to understand in more detail how random or repeated keypress sequence training data impacts overall decoder performance and generalization.
      We strongly agree with the Reviewer that the issue of generalizability is extremely important and have added a new paragraph to the Discussion in the revised manuscript highlighting the strengths and weaknesses of our study with respect to this issue.

      In terms of clinical BCI, one of the potential relevance of the study, as claimed by the authors, it is not clear that the specific time window chosen in the current study (up to 200 msec since key press onset) is really useful. In most cases, clinical BCI would target neural signals with no overt movement execution due to patients' inability to move (e.g., Hochberg et al., 2012). Given the time window, the surprisingly high performance of the current decoder may result from sensory feedback and/or planning of subsequent movement, which may not always be available in the clinical BCI context. Of course, the decoding accuracy is still much higher than chance even when using signal before the key press (as shown in Figure 4 Supplement 2), but it is not immediately clear to me that the authors relate their high decoding accuracy based on post-movement signal to clinical BCI settings.

      The Reviewer questions the relevance of the specific window parameters used in the present study for clinical BCI applications, particularly for paretic patients who are unable to produce finger movements or for whom afferent sensory feedback is no longer intact. We strongly agree with the Reviewer that any intended clinical application must carefully consider these specific input feature constraints dictated by the clinical cohort, and in turn impose appropriate and complementary constraints on classifier parameters that may differ from the ones used in the present study. We now highlight this issue in the Discussion of the revised manuscript and relate our present findings to published clinical BCI work within this context.

      One of the important and fascinating claims of the current study is that the "contextualization" of individual finger movements in a trained sequence specifically occurs during short rest periods in very early skill learning, echoing the recent theory of micro-offline learning proposed by the authors' group. Here, I think two points need to be clarified. First, the concept of "contextualization" is kept somewhat blurry throughout the text. It is only at the later part of the Discussion (around line #330 on page 13) that some potential mechanism for the "contextualization" is provided as "what-and-where" binding. Still, it is unclear what "contextualization" actually is in the current data, as the MEG signal analyzed is extracted from 0-200 msec after the keypress. If one thinks something is contextualizing an action, that contextualization should come earlier than the action itself. 

      The Reviewer requests that we: 1) more clearly define our use of the term “contextualization” and 2) provide the rationale for assessing it over a 200ms window aligned to the keyDown event. This choice of window parameters means that the MEG activity used in our analysis was coincident with, rather than preceding, the actual keypresses.  We define contextualization as the differentiation of representation for the identical movement embedded in different positions of a sequential skill. That is, representations of individual action elements progressively incorporate information about their relationship to the overall sequence structure as the skill is learned. We agree with the Reviewer that this can be appropriately interpreted as “what-and-where” binding. We now incorporate this definition in the Introduction of the revised manuscript as requested.

      The window parameters for optimizing the decoding of individual finger movements were determined using a grid search of the parameter space (a sliding window of variable width between 25-350 ms with 25 ms increments, variably aligned from 0 to +100 ms with 10 ms increments relative to the keyDown event). This approach generated 140 different temporal windows for each keypress for each participant, with the final parameter selection determined through comparison of the resulting performance between each decoder. Importantly, the decision to optimize for decoding accuracy placed an emphasis on keypress representations characterized by the most consistent and robust features shared across subjects, which in turn maximizes statistical power in detecting common learning-related changes. In this case, the optimal window encompassed a 200 ms epoch aligned to the keyDown event (t0 = 0 ms). We then asked if the representations (i.e., spatial patterns of combined parcel- and voxel-space activity) of the same digit at two different sequence positions changed with practice within this optimal decoding window. Of course, our findings do not rule out the possibility that contextualization can also be found before or even after this time window, as we did not directly address this issue in the present study. Ongoing work in our lab, as pointed out above, is investigating contextualization within different time windows tailored specifically for assessing sequence skill action planning, execution, evaluation and memory processes.
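
      The grid search over window parameters can be sketched as follows. This is a minimal illustration, not the study's code: the function names and the toy scoring function are hypothetical, and in practice the score would be the cross-validated accuracy of a decoder trained on each window. How the offset endpoints were handled is not stated in the text; including both 0 and +100 ms would give 14 × 11 = 154 windows, so the sketch assumes ten offsets (0 to 90 ms) in order to reproduce the 140 windows reported.

```python
from itertools import product


def window_grid(widths_ms, offsets_ms):
    """Enumerate (start, end) windows in ms relative to the keyDown event."""
    return [(off, off + w) for w, off in product(widths_ms, offsets_ms)]


def best_window(windows, score_fn):
    """Return the window with the highest score (e.g. decoding accuracy)."""
    return max(windows, key=score_fn)


widths = range(25, 351, 25)   # 14 widths: 25, 50, ..., 350 ms
offsets = range(0, 100, 10)   # ASSUMPTION: 10 offsets, 0-90 ms (see lead-in)
windows = window_grid(widths, offsets)
print(len(windows))  # number of candidate windows per keypress


def toy_score(window):
    """Hypothetical stand-in for decoder accuracy; peaks at 0-200 ms."""
    start, end = window
    return -abs((end - start) - 200) - abs(start)


print(best_window(windows, toy_score))
```

      Keeping the scoring function as a parameter makes it straightforward to swap in a different selection criterion, for example one restricted to pre-keypress windows.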

      The second point is that the result provided by the authors is not yet convincing enough to support the claim that "contextualization" occurs during rest. In the original analysis, the authors presented the statistical significance regarding the correlation between the "offline" pattern differentiation and micro-offline skill gain (Figure 5. Supplement 1), as well as the larger "offline" distance than "online" distance (Figure 5B). However, this analysis looks like regressing two variables (monotonically) increasing as a function of the trial. Although some information in this analysis, such as what the independent/dependent variables were or how individual subjects were treated, was missing in the Methods, getting a statistically significant slope seems unsurprising in such a situation. Also, curiously, the same quantitative evidence was not provided for its "online" counterpart, and the authors only briefly mentioned in the text that there was no significant correlation between them. It may be true looking at the data in Figure 5A as the online representation distance looks less monotonically changing, but the classification accuracy presented in Figure 4C, which should reflect similar representational distance, shows a more monotonic increase up to the 11th trial. Further, the ways the "online" and "offline" representation distance was estimated seem to make them not directly comparable. While the "online" distance was computed using all the correct press data within each 10 sec of execution, the "offline" distance is basically computed by only two presses (i.e., the last index_OP5 vs. the first index_OP1 separated by 10 sec of rest). Theoretically, the distance between the neural activity patterns for temporally closer events tends to be closer than that between the patterns for temporally far-apart events. It would be fairer to use the distance between the first index_OP1 vs. the last index_OP5 within an execution period for "online" distance, as well. 

      The Reviewer suggests that the current data is not convincing enough to show that contextualization occurs during rest and raises two important concerns: 1) the relationship between online contextualization and micro-online gains is not shown, and 2) the online distance was calculated differently from its offline counterpart (i.e. - instead of calculating the distance between last IndexOP5 and first IndexOP1 from a single trial, the distance was calculated for each sequence within a trial and then averaged).

      We addressed the first concern by performing individual subject correlations between 1) contextualization changes during rest intervals and micro-offline gains; 2) contextualization changes during practice trials and micro-online gains, and 3) contextualization changes during practice trials and micro-offline gains (Author response image 4). We then statistically compared the resulting correlation coefficient distributions and found that within-subject correlations for contextualization changes during rest intervals and micro-offline gains were significantly higher than online contextualization and micro-online gains (t = 3.2827, p = 0.0015) and online contextualization and micro-offline gains (t = 3.7021, p = 5.3013e-04). These results are consistent with our interpretation that micro-offline gains are supported by contextualization changes during the inter-practice rest period.

      Author response image 4.

      Distribution of individual subject correlation coefficients between contextualization changes occurring during practice or rest with micro-online and micro-offline performance gains. Note that the correlation distributions were significantly higher for the relationship between contextualization changes during rest and micro-offline gains than for contextualization changes during practice and either micro-online or micro-offline gains.
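
      The comparison of within-subject correlation distributions described above can be sketched as a two-step procedure: compute one Pearson correlation per subject for each pairing, then compare the resulting coefficient distributions with a t-test. The sketch below is illustrative only. It assumes a paired t-test across subjects (the response does not state which test was used), reports only the t statistic, and uses hypothetical helper names and made-up numbers.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subject_correlations(per_subject):
    """per_subject: list of (contextualization_changes, gains) pairs."""
    return [pearson_r(ctx, gains) for ctx, gains in per_subject]


def paired_t(a, b):
    """Paired t statistic for two lists of per-subject coefficients."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Made-up per-subject coefficients (NOT values from the study).
r_rest_vs_offline = [0.5, 0.6, 0.7, 0.9]
r_practice_vs_online = [0.1, 0.3, 0.2, 0.5]
print(paired_t(r_rest_vs_offline, r_practice_vs_online))
```

      A common refinement, not described in the response, is to Fisher z-transform the coefficients before testing so that their sampling distribution is closer to normal.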

      With respect to the second concern highlighted above, we agree with the Reviewer that one limitation of the analysis comparing online versus offline changes in contextualization, as presented in the reviewed manuscript, is that it does not eliminate the possibility that any differences could simply be explained by the passage of time (which is smaller for the online analysis compared to the offline analysis). The Reviewer suggests an approach that addresses this issue, which we have now carried out. When quantifying online changes in contextualization from the first IndexOP1 to the last IndexOP5 keypress in the same trial, we observed no learning-related trend (Author response image 5, right panel). Importantly, offline distances were significantly larger than online distances regardless of the measurement approach, and neither online measure predicted online learning (Author response image 6).

      Author response image 5.

      Trial by trial trend of offline (left panel) and online (middle and right panels) changes in contextualization. Offline changes in contextualization were assessed by calculating the distance between neural representations for the last IndexOP5 keypress in the previous trial and the first IndexOP1 keypress in the present trial. Two different approaches were used to characterize online contextualization changes. The analysis included in the reviewed manuscript (middle panel) calculated the distance between IndexOP1 and IndexOP5 for each correct sequence, which was then averaged across the trial. This approach is limited by the lack of control for the passage of time when making online versus offline comparisons. Thus, the second approach controlled for the passage of time by calculating the distance between the representations associated with the first IndexOP1 keypress and the last IndexOP5 keypress within the same trial. Note that while the first approach showed an increasing trend in online contextualization with practice, the second approach did not.
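
      The three distance measures described in this caption differ only in which pair of keypress representations is compared. A minimal sketch, assuming Euclidean distance between feature vectors (the distance metric is not specified here) and a hypothetical data layout in which each trial holds one IndexOP1 and one IndexOP5 vector per correct sequence, in order of execution:

```python
from math import dist
from statistics import mean


def offline_distance(prev_trial, curr_trial):
    """Last IndexOP5 of the previous trial vs first IndexOP1 of this trial."""
    return dist(prev_trial["op5"][-1], curr_trial["op1"][0])


def online_within_sequence(trial):
    """IndexOP1 vs IndexOP5 for each correct sequence, averaged over the trial."""
    return mean(dist(a, b) for a, b in zip(trial["op1"], trial["op5"]))


def online_across_trial(trial):
    """First IndexOP1 vs last IndexOP5 within the same trial (time-controlled)."""
    return dist(trial["op1"][0], trial["op5"][-1])


# Toy 2-D feature vectors (NOT data from the study), two sequences per trial.
trial_a = {"op1": [(0.0, 0.0), (0.0, 1.0)], "op5": [(3.0, 4.0), (0.0, 2.0)]}
trial_b = {"op1": [(3.0, 6.0), (1.0, 1.0)], "op5": [(2.0, 2.0), (4.0, 4.0)]}

print(online_within_sequence(trial_a))     # mean of per-sequence distances
print(online_across_trial(trial_a))        # first OP1 vs last OP5, same trial
print(offline_distance(trial_a, trial_b))  # spans the rest period
```

      The across-trial online measure spans a full practice period, which makes its time separation more comparable to that of the offline measure spanning a rest period.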

      Author response image 6.

      Relationship between online contextualization and online learning is shown for both within-sequence (left; note that this is the online contextualization measure used in the reviewed manuscript) and across-sequence (right) distance calculations. There was no significant relationship between online learning and online contextualization regardless of the measurement approach.

      A related concern regarding the control analysis, where individual values for max speed and the degree of online contextualization were compared (Figure 5 Supplement 3), is whether the individual difference is meaningful. If I understood correctly, the optimization of the decoding process (temporal window, feature inclusion/reduction, decoder, etc.) was performed for individual participants, and the same feature extraction was also employed for the analysis of representation distance (i.e., contextualization). If this is the case, the distances are individually differently calculated and they may need to be normalized relative to some stable reference (e.g., 1 vs. 4 or average distance within the control sequence presses) before comparison across the individuals. 

      The Reviewer makes a good point here. We have now implemented the suggested normalization procedure in the analysis provided in the revised manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      One goal of this paper is to introduce a new approach for highly accurate decoding of finger movements from human magnetoencephalography data via dimension reduction of a "multi-scale, hybrid" feature space. Following this decoding approach, the authors aim to show that early skill learning involves "contextualization" of the neural coding of individual movements, relative to their position in a sequence of consecutive movements. Furthermore, they aim to show that this "contextualization" develops primarily during short rest periods interspersed with skill training and correlates with a performance metric which the authors interpret as an indicator of offline learning.

      Strengths:

      A clear strength of the paper is the innovative decoding approach, which achieves impressive decoding accuracies via dimension reduction of a "multi-scale, hybrid space". This hybrid-space approach follows the neurobiologically plausible idea of the concurrent distribution of neural coding across local circuits as well as large-scale networks. A further strength of the study is the large number of tested dimension reduction techniques and classifiers (though the manuscript reveals little about the comparison of the latter). 

      We appreciate the Reviewer’s comments regarding the paper’s strengths.

      A simple control analysis based on shuffled class labels could lend further support to this complex decoding approach. As a control analysis that completely rules out any source of overfitting, the authors could test the decoder after shuffling class labels. Following such shuffling, decoding accuracies should drop to chance level for all decoding approaches, including the optimized decoder. This would also provide an estimate of actual chance-level performance (which is informative over and beyond the theoretical chance level). Furthermore, currently, the manuscript does not explain the huge drop in decoding accuracies for the voxel-space decoding (Figure 3B). Finally, the authors' approach to cortical parcellation raises questions regarding the information carried by varying dipole orientations within a parcel (which currently seems to be ignored?) and the implementation of the mean-flipping method (given that there are two dimensions - space and time - what do the authors refer to when they talk about the sign of the "average source", line 477?). 

      The Reviewer recommends that we: 1) conduct an additional control analysis on classifier performance using shuffled class labels, 2) provide a more detailed explanation regarding the drop in decoding accuracies for the voxel-space decoding following LDA dimensionality reduction (see Fig 3B), and 3) provide additional details on how problems related to dipole solution orientations were addressed in the present study.  

      In relation to the first point, we have now implemented a random shuffling approach as a control for the classification analyses. The results of this analysis indicated that the chance level accuracy was 22.12% (± SD 9.1%) for individual keypress decoding (4-class classification), and 18.41% (± SD 7.4%) for individual sequence item decoding (5-class classification), irrespective of the input feature set or the type of decoder used. Thus, the decoding accuracy observed with the final model was substantially higher than these chance levels.  
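
      The logic of a shuffled-label control can be sketched as follows. This is a minimal, self-contained illustration on synthetic data, not the analysis pipeline of the study: the nearest-centroid classifier, the synthetic 4-class features, the number of shuffles and all function names are assumptions chosen to keep the example short.

```python
import random
from collections import defaultdict


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Fit one centroid per class on the training set, then score the test set."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(train_X, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    correct = 0
    for x, y in zip(test_X, test_y):
        pred = min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        correct += pred == y
    return correct / len(test_y)


def shuffled_label_chance(train_X, train_y, test_X, test_y, n_shuffles=200, seed=0):
    """Empirical chance level: mean test accuracy after shuffling training labels."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_shuffles):
        shuffled = train_y[:]
        rng.shuffle(shuffled)  # breaks the feature-label relationship
        accuracies.append(
            nearest_centroid_accuracy(train_X, shuffled, test_X, test_y)
        )
    return sum(accuracies) / len(accuracies)


def make_data(n_per_class, rng):
    """Synthetic 4-class data: class k is shifted along feature dimension k."""
    X, y = [], []
    for label in range(4):
        for _ in range(n_per_class):
            x = [rng.gauss(0.0, 1.0) for _ in range(4)]
            x[label] += 5.0
            X.append(x)
            y.append(label)
    return X, y


rng = random.Random(0)
train_X, train_y = make_data(40, rng)
test_X, test_y = make_data(20, rng)

real_acc = nearest_centroid_accuracy(train_X, train_y, test_X, test_y)
chance_acc = shuffled_label_chance(train_X, train_y, test_X, test_y)
print(f"accuracy with true labels:     {real_acc:.3f}")
print(f"accuracy with shuffled labels: {chance_acc:.3f}")
```

      With balanced classes, the shuffled-label accuracy should hover around the theoretical chance level (1/4 for four classes), while the spread across shuffles gives an empirical estimate of how far from that level a decoder can land by chance alone.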

      Second, please note that the dimensionality of the voxel-space feature set is very high (i.e. – 15684). LDA attempts to map the input features onto a much smaller dimensional space (number of classes-1; e.g. –  3 dimensions, for 4-class keypress decoding). Given the very high dimension of the voxel-space input features in this case, the resulting mapping exhibits reduced accuracy. Despite this general consideration, please refer to Figure 3—figure supplement 3, where we observe improvement in voxel-space decoder performance when utilizing alternative dimensionality reduction techniques.

      The decoders constructed in the present study assess the average spatial patterns across time (as defined by the windowing procedure) in the input feature space.  We now provide additional details in the Methods of the revised manuscript pertaining to the parcellation procedure and how the sign ambiguity problem was addressed in our analysis.

      Weaknesses: 

      A clear weakness of the paper lies in the authors' conclusions regarding "contextualization". Several potential confounds, described below, question the neurobiological implications proposed by the authors and provide a simpler explanation of the results. Furthermore, the paper follows the assumption that short breaks result in offline skill learning, while recent evidence, described below, casts doubt on this assumption. 

      We thank the Reviewer for giving us the opportunity to address these issues in detail (see below).

      The authors interpret the ordinal position information captured by their decoding approach as a reflection of neural coding dedicated to the local context of a movement (Figure 4). One way to dissociate ordinal position information from information about the moving effectors is to train a classifier on one sequence and test the classifier on other sequences that require the same movements, but in different positions50. In the present study, however, participants trained to repeat a single sequence (4-1-3-2-4). As a result, ordinal position information is potentially confounded by the fixed finger transitions around each of the two critical positions (first and fifth press). Across consecutive correct sequences, the first keypress in a given sequence was always preceded by a movement of the index finger (=last movement of the preceding sequence), and followed by a little finger movement. The last keypress, on the other hand, was always preceded by a ring finger movement, and followed by an index finger movement (=first movement of the next sequence). Figure 4 - Supplement 2 shows that finger identity can be decoded with high accuracy (>70%) across a large time window around the time of the key press, up to at least +/-100 ms (and likely beyond, given that decoding accuracy is still high at the boundaries of the window depicted in that figure). This time window approaches the keypress transition times in this study. Given that distinct finger transitions characterized the first and fifth keypress, the classifier could thus rely on persistent (or "lingering") information from the preceding finger movement, and/or "preparatory" information about the subsequent finger movement, in order to dissociate the first and fifth keypress. 
Currently, the manuscript provides no evidence that the context information captured by the decoding approach is more than a by-product of temporally extended, and therefore overlapping, but independent neural representations of consecutive keypresses that are executed in close temporal proximity - rather than a neural representation dedicated to context. 

      Such temporal overlap of consecutive, independent finger representations may also account for the dynamics of "ordinal coding"/"contextualization", i.e., the increase in 2-class decoding accuracy, across Day 1 (Figure 4C). As learning progresses, both tapping speed and the consistency of keypress transition times increase (Figure 1), i.e., consecutive keypresses are closer in time, and more consistently so. As a result, information related to a given keypress is increasingly overlapping in time with information related to the preceding and subsequent keypresses. The authors seem to argue that their regression analysis in Figure 5 - Figure Supplement 3 speaks against any influence of tapping speed on "ordinal coding" (even though that argument is not made explicitly in the manuscript). However, Figure 5 - Figure Supplement 3 shows inter-individual differences in a between-subject analysis (across trials, as in panel A, or separately for each trial, as in panel B), and, therefore, says little about the within-subject dynamics of "ordinal coding" across the experiment. A regression of trial-by-trial "ordinal coding" on trial-by-trial tapping speed (either within-subject or at a group-level, after averaging across subjects) could address this issue. Given the highly similar dynamics of "ordinal coding" on the one hand (Figure 4C), and tapping speed on the other hand (Figure 1B), I would expect a strong relationship between the two in the suggested within-subject (or group-level) regression. Furthermore, learning should increase the number of (consecutively) correct sequences, and, thus, the consistency of finger transitions. 
Therefore, the increase in 2-class decoding accuracy may simply reflect an increasing overlap in time of increasingly consistent information from consecutive keypresses, which allows the classifier to dissociate the first and fifth keypress more reliably as learning progresses, simply based on the characteristic finger transitions associated with each. In other words, given that the physical context of a given keypress changes as learning progresses - keypresses move closer together in time and are more consistently correct - it seems problematic to conclude that the mental representation of that context changes. To draw that conclusion, the physical context should remain stable (or any changes to the physical context should be controlled for). 

      The issues raised by Reviewer #3 here are similar to two issues raised by Reviewer #2 above, and we agree they must both be carefully considered in any evaluation of our findings.

      As both Reviewers pointed out, the classifiers in this study were trained and tested on keypresses performed while practicing a specific sequence (4-1-3-2-4). The study was designed this way so as to avoid the impact of interference effects on learning dynamics. The cross-validated performance of classifiers on MEG data collected within the same session was 90.47% overall accuracy (4-class; Figure 3C). We then tested classifier performance on data collected during a separate MEG session conducted approximately 24 hours later (Day 2; see Figure 3—supplement 3). We observed a reduction in overall accuracy rate to 87.11% when tested on MEG data recorded while participants performed the same learned sequence, and 79.44% when they performed several previously unpracticed sequences. This classification performance difference of 7.67% when tested on the Day 2 data could reflect the performance bias of the classifier for the trained sequence, possibly caused by mixed information from temporally close keypresses being incorporated into the feature weights.

      Along these same lines, both Reviewers also raise the possibility that an increase in “ordinal coding/contextualization” with learning could simply reflect an increase in this mixing effect caused by faster typing speeds as opposed to an actual change in the underlying neural representation. The basic idea is that as correct sequences are generated at higher and higher speeds over training, MEG activity patterns related to the planning, execution, evaluation and memory of individual keypresses overlap more in time. Thus, increased overlap between the “4” and “1” keypresses (at the start of the sequence) and “2” and “4” keypresses (at the end of the sequence) could artefactually increase contextualization distances even if the underlying neural representations for the individual keypresses remain unchanged (assuming this mixing of representations is used by the classifier to differentially tag each index finger press). If this were the case, it follows that such mixing effects reflecting the ordinal sequence structure would also be observable in the distribution of decoder misclassifications. For example, “4” keypresses would be more likely to be misclassified as “1” or “2” keypresses (or vice versa) than as “3” keypresses. The confusion matrices presented in Figures 3C and 4B and Figure 3—figure supplement 3A in the previously submitted manuscript do not show this trend in the distribution of misclassifications across the four fingers.

      Following this logic, it’s also possible that if the ordinal coding is largely driven by this mixing effect, the increased overlap between consecutive index finger keypresses during the 4-4 transition marking the end of one sequence and the beginning of the next one could actually mask contextualization-related changes to the underlying neural representations and make them harder to detect. In this case, a decoder tasked with separating individual index finger keypresses into two distinct classes based upon sequence position might show decreased performance with learning as adjacent keypresses overlapped in time with each other to an increasing extent. However, Figure 4C in our previously submitted manuscript does not support this possibility, as the 2-class hybrid classifier displays improved classification performance over early practice trials despite greater temporal overlap.
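
      The check on the distribution of decoder misclassifications described above can be sketched as a simple comparison of error rates. The code below is illustrative only: the confusion matrices are invented numbers (rows are true keys, columns are predicted keys), and the `neighbour_bias` helper is a hypothetical name, not a function from the study's analysis code.

```python
def neighbour_bias(confusion, labels, target, neighbours):
    """Mean error rate toward sequence neighbours minus the mean error rate
    toward non-neighbours, for keypresses whose true class is `target`."""
    row = confusion[labels.index(target)]
    total = sum(row)
    rates = {
        lab: row[i] / total for i, lab in enumerate(labels) if lab != target
    }
    near = [rates[lab] for lab in neighbours]
    far = [rate for lab, rate in rates.items() if lab not in neighbours]
    return sum(near) / len(near) - sum(far) / len(far)


labels = ["1", "2", "3", "4"]

# Invented example in which errors for key "4" are spread evenly.
no_mixing = [
    [90, 3, 4, 3],
    [3, 91, 3, 3],
    [4, 3, 90, 3],
    [3, 3, 3, 91],
]

# Invented example in which key "4" is mostly confused with keys "1" and "2",
# its neighbours in the 4-1-3-2-4 sequence, as temporal mixing would predict.
with_mixing = [
    [88, 2, 2, 8],
    [2, 88, 2, 8],
    [3, 3, 92, 2],
    [7, 7, 1, 85],
]

bias_none = neighbour_bias(no_mixing, labels, "4", ["1", "2"])
bias_mixed = neighbour_bias(with_mixing, labels, "4", ["1", "2"])
print(f"neighbour bias without mixing: {bias_none:.3f}")
print(f"neighbour bias with mixing:    {bias_mixed:.3f}")
```

      A bias near zero corresponds to the pattern reported for the confusion matrices cited above, whereas a clearly positive value would be the signature expected if temporally adjacent keypresses were leaking into the decoder.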

      As noted in the above reply to Reviewer #2, we also conducted a new multivariate regression analysis to directly assess whether the neural representation distance score could be predicted by the 4-1, 2-4 and 4-4 keypress transition times observed for each complete correct sequence (both predictor and response variables were z-score normalized within-subject). The results of this analysis affirmed that the possible alternative explanation put forward by the Reviewer is not supported by our data (Adjusted R² = 0.00431; F = 5.62). We now include this new negative control analysis result in the revised manuscript.
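
      For readers who wish to see the structure of such a control, the sketch below z-scores three predictors and one response within each subject, fits an ordinary least-squares model, and reports the adjusted R². It runs on synthetic data and is not the analysis code of the study: the subject and sequence counts, the random data and all function names are assumptions, and the adjusted R² is computed with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1).

```python
import random
import statistics


def zscore(values):
    """Standardize one subject's values to mean 0 and SD 1."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R^2."""
    n, p = len(y), len(X[0])
    D = [[1.0] + list(row) for row in X]  # design matrix with intercept
    XtX = [[sum(r[i] * r[j] for r in D) for j in range(p + 1)] for i in range(p + 1)]
    Xty = [sum(r[i] * t for r, t in zip(D, y)) for i in range(p + 1)]
    beta = solve(XtX, Xty)
    ss_res = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(D, y))
    mean_y = statistics.fmean(y)
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def simulate(dependent, n_subjects=10, n_sequences=50, seed=1):
    """Synthetic data: three transition times and one distance score per sequence."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_subjects):
        cols = [[rng.gauss(0.3, 0.05) for _ in range(n_sequences)] for _ in range(3)]
        if dependent:
            dist = [10.0 * c + rng.gauss(0.0, 0.25) for c in cols[0]]
        else:
            dist = [rng.gauss(0.0, 1.0) for _ in range(n_sequences)]
        z_cols = [zscore(c) for c in cols]  # within-subject normalization
        X.extend([list(row) for row in zip(*z_cols)])
        y.extend(zscore(dist))
    return X, y


adj_null = adjusted_r2(*simulate(dependent=False))
adj_dep = adjusted_r2(*simulate(dependent=True))
print(f"adjusted R^2, distance unrelated to transition times: {adj_null:.4f}")
print(f"adjusted R^2, distance driven by transition times:    {adj_dep:.4f}")
```

      In the synthetic case where distances are unrelated to transition times the adjusted R² stays close to zero, while a genuine dependence produces a large value, which is the contrast the control analysis is designed to reveal.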

      Finally, the Reviewer hints that one way to address this issue would be to compare MEG responses before and after learning for sequences typed at a fixed speed. However, given that the speed-accuracy trade-off should improve with learning, a comparison between unlearned and learned skill states would dictate that the skill be evaluated at a very low fixed speed. Essentially, such a design presents the problem that the post-training test is evaluating the representation in the unlearned behavioral state that is not representative of the acquired skill. Thus, this approach would not address our experimental question: “do neural representations of the same action performed at different locations within a skill sequence contextually differentiate or remain stable as learning evolves”.

      A similar difference in physical context may explain why neural representation distances ("differentiation") differ between rest and practice (Figure 5). The authors define "offline differentiation" by comparing the hybrid space features of the last index finger movement of a trial (ordinal position 5) and the first index finger movement of the next trial (ordinal position 1). However, the latter is not only the first movement in the sequence but also the very first movement in that trial (at least in trials that started with a correct sequence), i.e., not preceded by any recent movement. In contrast, the last index finger of the last correct sequence in the preceding trial includes the characteristic finger transition from the fourth to the fifth movement. Thus, there is more overlapping information arising from the consistent, neighbouring keypresses for the last index finger movement, compared to the first index finger movement of the next trial. A strong difference (larger neural representation distance) between these two movements is, therefore, not surprising, given the task design, and this difference is also expected to increase with learning, given the increase in tapping speed, and the consequent stronger overlap in representations for consecutive keypresses. Furthermore, initiating a new sequence involves pre-planning, while ongoing practice relies on online planning (Ariani et al., eNeuro 2021), i.e., two mental operations that are dissociable at the level of neural representation (Ariani et al., bioRxiv 2023). 

      The Reviewer argues that the last finger movement of a trial and the first movement in the next trial are performed in different circumstances and contexts. This is an important point and one we tend to agree with. For this task, the first sequence in a practice trial (which is pre-planned offline) is performed in a somewhat different context from the sequence iterations that follow, which involve temporally overlapping planning, execution and evaluation processes.  The Reviewer is particularly concerned about a difference in the temporal mixing effect issue raised above between the first and last keypresses performed in a trial. However, in contrast to the Reviewer's stated argument above, findings from Kornysheva et al. (2019) showed that neural representations of individual actions are competitively queued during the pre-planning period in a manner that reflects the ordinal structure of the learned sequence.  Thus, mixing effects are likely still present for the first keypress in a trial. Also note that we now present new control analyses in multiple responses above confirming that hypothetical mixing effects between adjacent keypresses do not explain our reported contextualization finding. A statement addressing these possibilities raised by the Reviewer has been added to the Discussion in the revised manuscript.

      In relation to pre-planning, ongoing MEG work in our lab is investigating contextualization within different time windows tailored specifically for assessing how sequence skill action planning evolves with learning.

      Given these differences in the physical context and associated mental processes, it is not surprising that "offline differentiation", as defined here, is more pronounced than "online differentiation". For the latter, the authors compared movements that were better matched regarding the presence of consistent preceding and subsequent keypresses (online differentiation was defined as the mean difference between all first vs. last index finger movements during practice).  It is unclear why the authors did not follow a similar definition for "online differentiation" as for "micro-online gains" (and, indeed, a definition that is more consistent with their definition of "offline differentiation"), i.e., the difference between the first index finger movement of the first correct sequence during practice, and the last index finger of the last correct sequence. While these two movements are, again, not matched for the presence of neighbouring keypresses (see the argument above), this mismatch would at least be the same across "offline differentiation" and "online differentiation", so they would be more comparable. 

      This is the same point made earlier by Reviewer #2, and we agree with this assessment. As stated in the response to Reviewer #2 above, we have now carried out quantification of online contextualization using this approach and included it in the revised manuscript. We thank the Reviewer for this suggestion.

      A further complication in interpreting the results regarding "contextualization" stems from the visual feedback that participants received during the task. Each keypress generated an asterisk shown above the string on the screen, irrespective of whether the keypress was correct or incorrect. As a result, incorrect (e.g., additional, or missing) keypresses could shift the phase of the visual feedback string (of asterisks) relative to the ordinal position of the current movement in the sequence (e.g., the fifth movement in the sequence could coincide with the presentation of any asterisk in the string, from the first to the fifth). Given that more incorrect keypresses are expected at the start of the experiment, compared to later stages, the consistency in visual feedback position, relative to the ordinal position of the movement in the sequence, increased across the experiment. A better differentiation between the first and the fifth movement with learning could, therefore, simply reflect better decoding of the more consistent visual feedback, based either on the feedback-induced brain response, or feedback-induced eye movements (the study did not include eye tracking). It is not clear why the authors introduced this complicated visual feedback in their task, besides consistency with their previous studies.

      We strongly agree with the Reviewer that eye movements related to task engagement are important to rule out as a potential driver of the decoding accuracy or contextualization effect. We address this issue above in response to a question raised by Reviewer #1 about the impact of movement related artefacts in general on our findings.

      First, the assumption the Reviewer makes here about the distribution of errors in this task is incorrect. On average across subjects, 2.32% ± 1.48% (mean ± SD) of all keypresses performed were errors, which were evenly distributed across the four possible keypress responses. While errors increased progressively over practice trials, they did so in proportion to the increase in correct keypresses, so that the overall ratio of correct-to-incorrect keypresses remained stable over the training session. Thus, the Reviewer’s assumptions that there is a higher relative frequency of errors in early trials, and a resulting systematic trend in phase shifts between the visual display updates (i.e. – a change in asterisk position above the displayed sequence) and the keypress performed, are not substantiated by the data. To the contrary, the asterisk position on the display and the keypress being executed remained highly correlated over the entire training session. We now include a statement about the frequency and distribution of errors in the revised manuscript.

      Given this high correlation, we firmly agree with the Reviewer that the issue of eye movement-related artefacts is still an important one to address. Fortunately, we did collect eye movement data during the MEG recordings so we were able to investigate this. As detailed in the response to Reviewer #1 above, we found that gaze positions and eye-movement velocity time-locked to visual display updates (i.e. – a change in asterisk position above the displayed sequence) did not reflect the asterisk location above chance levels (Overall cross-validated accuracy = 0.21817; see Author response image 1). Furthermore, an inspection of the eye position data revealed that a majority of participants on most trials displayed random walk gaze patterns around a center fixation point, indicating that participants did not attend to the asterisk position on the display. This is consistent with intrinsic generation of the action sequence, and congruent with the fact that the display does not provide explicit feedback related to performance. As pointed out above, a similar real-world example would be manually inputting a long password into a secure online application. In this case, one intrinsically generates the sequence from memory and receives similar feedback about the password sequence position (also provided as asterisks), which is typically ignored by the user. Notably, the minimal participant engagement with the visual task display observed in this study highlights an important difference between behavior observed during explicit sequence learning motor tasks (which is highly generative in nature) and reactive responses to stimulus cues in a serial reaction time task (SRTT).  This is a crucial difference that must be carefully considered when comparing findings across studies. All elements pertaining to this new control analysis are now included in the revised manuscript.

      The authors report a significant correlation between "offline differentiation" and cumulative micro-offline gains. However, it would be more informative to correlate trial-by-trial changes in each of the two variables. This would address the question of whether there is a trial-by-trial relation between the degree of "contextualization" and the amount of micro-offline gains - are performance changes (micro-offline gains) less pronounced across rest periods for which the change in "contextualization" is relatively low? Furthermore, is the relationship between micro-offline gains and "offline differentiation" significantly stronger than the relationship between micro-offline gains and "online differentiation"? 

      In response to a similar issue raised above by Reviewer #2, we now include new analyses comparing correlation magnitudes between (1) “online differentiation” vs micro-online gains, (2) “online differentiation” vs micro-offline gains and (3) “offline differentiation” and micro-offline gains (see Author response images 4, 5 and 6 above). These new analyses and results have been added to the revised manuscript. Once again, we thank both Reviewers for this suggestion.

      The authors follow the assumption that micro-offline gains reflect offline learning.

      This statement is incorrect. The original Bönstrup et al. (2019) 49 paper clearly states that micro-offline gains must be carefully interpreted based upon the behavioral context within which they are observed, and lays out the conditions under which one can have confidence that micro-offline gains reflect offline learning.  In fact, the excellent meta-analysis of Pan & Rickard (2015) 51, which re-interprets the benefits of sleep in overnight skill consolidation from a “reactive inhibition” perspective, was a crucial resource in the experimental design of our initial study49, as well as in all our subsequent work. Pan & Rickard stated:

      “Empirically, reactive inhibition refers to performance worsening that can accumulate during a period of continuous training (Hull, 1943). It tends to dissipate, at least in part, when brief breaks are inserted between blocks of training. If there are multiple performance-break cycles over a training session, as in the motor sequence literature, performance can exhibit a scalloped effect, worsening during each uninterrupted performance block but improving across blocks52,53. Rickard, Cai, Rieth, Jones, and Ard (2008) and Brawn, Fenn, Nusbaum, and Margoliash (2010) 52,53 demonstrated highly robust scalloped reactive inhibition effects using the commonly employed 30 s–30 s performance break cycle, as shown for Rickard et al.’s (2008) massed practice sleep group in Figure 2. The scalloped effect is evident for that group after the first few 30 s blocks of each session. The absence of the scalloped effect during the first few blocks of training in the massed group suggests that rapid learning during that period masks any reactive inhibition effect.”

      Crucially, Pan & Rickard51 made several concrete recommendations for reducing the impact of the reactive inhibition confound on offline learning studies. One of these recommendations was to reduce practice times to 10s (most prior sequence learning studies up until that point had employed 30s long practice trials). They stated:

      “The traditional design involving 30 s-30 s performance break cycles should be abandoned given the evidence that it results in a reactive inhibition confound, and alternative designs with reduced performance duration per block used instead 51. One promising possibility is to switch to 10 s performance durations for each performance-break cycle Instead 51. That design appears sufficient to eliminate at least the majority of the reactive inhibition effect 52,53.”

      We mindfully incorporated recommendations from Pan and Rickard51  into our own study designs including 1) utilizing 10s practice trials and 2) constraining our analysis of micro-offline gains to early learning trials (where performance monotonically increases and 95% of overall performance gains occur), which are prior to the emergence of the “scalloped” performance dynamics that are strongly linked to reactive inhibition effects. 

      However, there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.

      We strongly disagree with the Reviewer’s assertion that “there is no direct evidence in the literature that micro-offline gains really result from offline learning, i.e., an improvement in skill level.”  The initial Bönstrup et al. (2019) 49 report was followed up by a large online crowd-sourcing study (Bönstrup et al., 2020) 54. This second (and much larger) study provided several additional important findings supporting our interpretation of micro-offline gains in cases where the important behavioral conditions clarified above were met (see Author response image 7 below for further details on these conditions).

      Author response image 7.

      Micro-offline gains observed in learning and non-learning contexts are attributed to different underlying causes. (A) Micro-offline and online changes relative to overall trial-by-trial learning. This figure is based on data from Bönstrup et al. (2019) 49. During early learning, micro-offline gains (red bars) closely track trial-by-trial performance gains (green line with open circle markers), with minimal contribution from micro-online gains (blue bars). The stated conclusion in Bönstrup et al. (2019) is that micro-offline gains only during this Early Learning stage reflect rapid memory consolidation (see also 54). After early learning, about practice trial 11, skill plateaus. This plateau skill period is characterized by a striking emergence of coupled (and relatively stable) micro-online drops and micro-offline increases. Bönstrup et al. (2019) as well as others in the literature 55-57, argue that micro-offline gains during the plateau period likely reflect recovery from inhibitory performance factors such as reactive inhibition or fatigue, and thus must be excluded from analyses relating micro-offline gains to skill learning.  The Non-repeating groups in Experiments 3 and 4 from Das et al. (2024) suffer from a lack of consideration of these known confounds.

      Evidence documented in that paper54 showed that micro-offline gains during early skill learning were: 1) replicable and generalized to subjects learning the task in their daily living environment (n=389); 2) equivalent when significantly shortening practice period duration, thus confirming that they are not a result of recovery from performance fatigue (n=118);  3) reduced (along with learning rates) by retroactive interference applied immediately after each practice period relative to interference applied after passage of time (n=373), indicating stabilization of the motor memory at a microscale of several seconds consistent with rapid consolidation; and 4) not modified by random termination of the practice periods, ruling out a contribution of predictive motor slowing (N = 71) 54.  Altogether, our findings were strongly consistent with the interpretation that micro-offline gains reflect memory consolidation supporting early skill learning. This is precisely the portion of the learning curve Pan and Rickard51 refer to when they state “…rapid learning during that period masks any reactive inhibition effect”.

      This interpretation is further supported by brain imaging evidence linking known memory-related networks and consolidation mechanisms to micro-offline gains. First, we reported that the density of fast hippocampo-neocortical skill memory replay events increases approximately three-fold during early learning inter-practice rest periods with the density explaining differences in the magnitude of micro-offline gains across subjects1. Second, Jacobacci et al. (2020) independently reproduced our original behavioral findings and reported BOLD fMRI changes in the hippocampus and precuneus (regions also identified in our MEG study1) linked to micro-offline gains during early skill learning. 33 These functional changes were coupled with rapid alterations in brain microstructure in the order of minutes, suggesting that the same network that operates during rest periods of early learning undergoes structural plasticity over several minutes following practice58. Third, even more recently, Chen et al. (2024) provided direct evidence from intracranial EEG in humans linking sharp-wave ripple events (which are known markers for neural replay59) in the hippocampus (80-120 Hz in humans) with micro-offline gains during early skill learning. The authors report that the strong increase in ripple rates tracked learning behavior, both across blocks and across participants. The authors conclude that hippocampal ripples during resting offline periods contribute to motor sequence learning. 2

      Thus, there is actually now substantial evidence in the literature directly supporting the assertion “that micro-offline gains really result from offline learning”.  On the contrary, according to Gupta & Rickard (2024) “…the mechanism underlying RI [reactive inhibition] is not well established” after over 80 years of investigation60, possibly due to the fact that “reactive inhibition” is a categorical description of behavioral effects that likely result from several heterogeneous processes with very different underlying mechanisms.

      On the contrary, recent evidence questions this interpretation (Gupta & Rickard, npj Sci Learn 2022; Gupta & Rickard, Sci Rep 2024; Das et al., bioRxiv 2024). Instead, there is evidence that micro-offline gains are transient performance benefits that emerge when participants train with breaks, compared to participants who train without breaks; however, these benefits vanish within seconds after training if both groups of participants perform under comparable conditions (Das et al., bioRxiv 2024).

      It is important to point out that the recent work of Gupta & Rickard (2022, 2024)55,60 does not present any data that directly opposes our finding that early skill learning49 is expressed as micro-offline gains during rest breaks. These studies are essentially an extension of the Rickard et al. (2008) paper, which employed a massed (30s practice followed by 30s breaks) vs. spaced (10s practice followed by 10s breaks) design to assess whether recovery from reactive inhibition effects could account for performance gains measured after several minutes or hours. Gupta & Rickard (2022) added two additional groups (30s practice/10s break and 10s practice/10s break, as used in the work from our group). The primary aim of the study was to assess whether changes in performance at a retest 5 minutes after the end of skill training (consisting of 12 practice trials for the massed groups and 36 practice trials for the spaced groups) more likely reflected memory consolidation effects or recovery from reactive inhibition effects. The Gupta & Rickard (2024) follow-up paper employed a similar design, with the primary difference being that participants performed a fixed number of sequences on each trial as opposed to trials lasting a fixed duration. This was done to facilitate the fitting of a quantitative statistical model to the data. To reiterate, neither study included any analysis of micro-online or micro-offline gains or any comparison focused on skill gains during early learning. Instead, Gupta & Rickard (2022) reported evidence for reactive inhibition effects for all groups over much longer training periods. Again, we reported the same finding for trials following the early learning period in our original Bönstrup et al. (2019) paper49 (Author response image 7).
      Also, please note that we reported in this paper that cumulative micro-offline gains over early learning did not correlate with overnight offline consolidation measured 24 hours later49 (see the Results section and further elaboration in the Discussion). Thus, while the composition of our data is supportive of a short-term memory consolidation process operating over several seconds during early learning, this process likely differs from the processes involved over longer training times and offline periods, as assessed by Gupta & Rickard (2022).

      In the recent preprint from Das et al. (2024)61, the authors make the strong claim that “micro-offline gains during early learning do not reflect offline learning”, which is not supported by their own data. The authors hypothesize that if “micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”. The study utilizes a spaced vs. massed practice group between-subjects design inspired by the reactive inhibition work from Rickard and others to test this hypothesis. Crucially, the design incorporates only a small fraction of the training used in other investigations to evaluate early skill learning1,33,49,54,57,58,62. A direct comparison between the practice schedule designs for the spaced and massed groups in Das et al. and the training schedule all participants experienced in the original Bönstrup et al. (2019) paper highlights this issue as well as several others (Author response image 8):

      Author response image 8.

      (A) Comparison of the Das et al. Spaced & Massed group training session designs and the training session design from the original Bönstrup et al. (2019)49 paper. Similar to the approach taken by Das et al., all practice is visualized as 10-second practice trials with a variable number (either 0, 1 or 30) of 10-second-long inter-practice rest intervals to allow for direct comparisons between designs. The two key takeaways from this comparison are that (1) the intervention differences (i.e., practice schedules) between the Massed and Spaced groups from the Das et al. report are extremely small (less than 12% of the overall session schedule) and (2) the overall amount of practice is much less than in the design from the original Bönstrup report49 (which has been utilized in several subsequent studies). (B) Group-level learning curve data from Bönstrup et al. (2019)49 is used to estimate the performance range accounted for by the equivalent periods covering Test 1, Training 1 and Test 2 from Das et al. (2024). Note that the intervention in the Das et al. study is limited to a period covering less than 50% of the overall learning range.

      First, participants in the original Bönstrup et al. study49 experienced 157.14% more practice time and 46.97% less inter-practice rest time than the Spaced group in the Das et al. study (Author response image 8). Thus, the overall amounts of practice and rest differ substantially between studies, with much more limited training occurring for participants in Das et al.

      Second, and perhaps most importantly, the actual intervention (i.e., the difference in practice schedule between the Spaced and Massed groups) employed by Das et al. covers a very small fraction of the overall training session. Identical practice schedule segments for both the Spaced & Massed groups are indicated by the red shaded area in Author response image 8. Please note that these identical segments cover 94.84% of the Massed group training schedule and 88.01% of the Spaced group training schedule (since it has 60 seconds of additional rest). This means that the actual interventions cover only about 5% (for Massed) and 12% (for Spaced) of the total training session, which minimizes any chance of observing a difference between groups.
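
      The intervention shares follow directly from the identical-segment percentages quoted above: each is simply the complement of the shared portion of the schedule. A minimal sketch of that arithmetic, assuming only the two quoted percentages (the variable and function names are illustrative and not taken from the authors' analysis):

      ```python
      # Percent of each group's session that is identical across groups,
      # as quoted in the text above.
      identical_share = {"Massed": 94.84, "Spaced": 88.01}


      def intervention_share(identical_pct: float) -> float:
          """Percent of the session in which the two schedules actually differ."""
          return round(100.0 - identical_pct, 2)


      # Complement of the shared portion for each group.
      shares = {group: intervention_share(pct) for group, pct in identical_share.items()}
      print(shares)
      ```

      Under these inputs the differing portion is roughly 5.2% of the Massed schedule and 12.0% of the Spaced schedule.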

      Also note that the very beginning of the practice schedule (during which Figure R9 shows substantial learning is known to occur) is labeled in the Das et al. study as Test 1.  Test 1 encompasses the first 20 seconds of practice (alternatively viewed as the first two 10-second-long practice trials with no inter-practice rest). This is immediately followed by the Training 1 intervention, which is composed of only three 10-second-long practice trials (with 10-second inter-practice rest for the Spaced group and no inter-practice rest for the Massed group). Author response image 8 also shows that since there is no inter-practice rest after the third Training practice trial for the Spaced group, this third trial (for both Training 1 and 2) is actually a part of an identical practice schedule segment shared by both groups (Massed and Spaced), reducing the magnitude of the intervention even further.

      Moreover, we know from the original Bönstrup et al. (2019) paper49 that 46.57% of all overall group-level performance gains occurred between trials 2 and 5 for that study. Thus, Das et al. are limiting their designed intervention to a period covering less than half of the early learning range discussed in the literature, which, again, minimizes any chance of observing an effect.

      This issue is amplified even further at Training 2, since skill learning prior to the long 5-minute break is retained, further constraining the performance range over these three trials. A related issue pertains to the trials labeled as Test 1 (trials 1-2) and Test 2 (trials 6-7) by Das et al. Again, we know from the original Bönstrup et al. paper49 that 18.06% and 14.43% (32.49% total) of all overall group-level performance gains occurred during trials corresponding to the Das et al. Test 1 and Test 2, respectively. In other words, Das et al. averaged skill performance over 20 seconds of practice at two time-points where dramatic skill improvements occur. Pan & Rickard (2015)51 previously showed that such averaging can inject artefacts into analyses of performance gains.

      Furthermore, the structure of the Test in the Das et al. study appears to have an interference effect on the Spaced group performance after the training intervention. This makes sense if you consider that the Spaced group is now required to perform the task in a Massed practice environment (i.e., two 10-second-long practice trials merged into one long trial), further blurring the true intervention effects. This effect is observable in Figure 1C,E of their preprint. Specifically, while the Massed group continues to show an increase in performance during test relative to the last 10 seconds of practice during training, the Spaced group displays a marked decrease. This decrease is in stark contrast to the monotonic increases observed for both groups at all other time-points.

      Interestingly, when statistical comparisons between the groups are made at the time-points when the intervention is present (as opposed to after it has been removed) then the stated hypothesis, “If micro-offline gains represent offline learning, participants should reach higher skill levels when training with breaks, compared to training without breaks”, is confirmed.

      The data presented by Gupta and Rickard (2022, 2024) and Das et al. (2024) is in many ways more confirmatory than contradictory of the constraints employed by our group and others with respect to experimental design, analysis and interpretation of study findings. Still, it does highlight a limitation of the current micro-online/offline framework, which was originally only intended to be applied to early skill learning over spaced practice schedules when reactive inhibition effects are minimized49. Extrapolation of this current framework to post-plateau performance periods, longer timespans, or non-learning situations (e.g., the Non-repeating groups from Experiments 3 & 4 in Das et al. (2024)), when reactive inhibition plays a more substantive role, is not warranted. Ultimately, it will be important to develop new paradigms allowing one to independently estimate the different coincident or antagonistic features (e.g., memory consolidation, planning, working memory and reactive inhibition) contributing to micro-online and micro-offline gains during and after early skill learning within a unifying framework.

      References

      (1) Buch, E. R., Claudino, L., Quentin, R., Bonstrup, M. & Cohen, L. G. Consolidation of human skill linked to waking hippocampo-neocortical replay. Cell Rep 35, 109193 (2021). https://doi.org:10.1016/j.celrep.2021.109193

      (2) Chen, P.-C., Stritzelberger, J., Walther, K., Hamer, H. & Staresina, B. P. Hippocampal ripples during offline periods predict human motor sequence learning. bioRxiv, 2024.10.06.614680 (2024). https://doi.org:10.1101/2024.10.06.614680

      (3) Classen, J., Liepert, J., Wise, S. P., Hallett, M. & Cohen, L. G. Rapid plasticity of human cortical movement representation induced by practice. J Neurophysiol 79, 1117-1123 (1998).

      (4) Karni, A. et al. Functional MRI evidence for adult motor cortex plasticity during motor skill learning. Nature 377, 155-158 (1995). https://doi.org:10.1038/377155a0

      (5) Kleim, J. A., Barbay, S. & Nudo, R. J. Functional reorganization of the rat motor cortex following motor skill learning. J Neurophysiol 80, 3321-3325 (1998).

      (6) Shadmehr, R. & Holcomb, H. H. Neural correlates of motor memory consolidation. Science 277, 821-824 (1997).

      (7) Doyon, J. et al. Experience-dependent changes in cerebellar contributions to motor sequence learning. Proc Natl Acad Sci U S A 99, 1017-1022 (2002).

      (8) Toni, I., Ramnani, N., Josephs, O., Ashburner, J. & Passingham, R. E. Learning arbitrary visuomotor associations: temporal dynamic of brain activity. Neuroimage 14, 1048-1057 (2001).

      (9) Grafton, S. T. et al. Functional anatomy of human procedural learning determined with regional cerebral blood flow and PET. J Neurosci 12, 2542-2548 (1992).

      (10) Kennerley, S. W., Sakai, K. & Rushworth, M. F. Organization of action sequences and the role of the pre-SMA. J Neurophysiol 91, 978-993 (2004). https://doi.org:10.1152/jn.00651.2003

      (11) Hardwick, R. M., Rottschy, C., Miall, R. C. & Eickhoff, S. B. A quantitative meta-analysis and review of motor learning in the human brain. Neuroimage 67, 283-297 (2013). https://doi.org:10.1016/j.neuroimage.2012.11.020

      (12) Sawamura, D. et al. Acquisition of chopstick-operation skills with the non-dominant hand and concomitant changes in brain activity. Sci Rep 9, 20397 (2019). https://doi.org:10.1038/s41598-019-56956-0

      (13) Lee, S. H., Jin, S. H. & An, J. The difference in cortical activation pattern for complex motor skills: A functional near- infrared spectroscopy study. Sci Rep 9, 14066 (2019). https://doi.org:10.1038/s41598-019-50644-9

      (14) Battaglia-Mayer, A. & Caminiti, R. Corticocortical Systems Underlying High-Order Motor Control. J Neurosci 39, 4404-4421 (2019). https://doi.org:10.1523/JNEUROSCI.2094-18.2019

      (15) Toni, I., Thoenissen, D. & Zilles, K. Movement preparation and motor intention. Neuroimage 14, S110-117 (2001). https://doi.org:10.1006/nimg.2001.0841

      (16) Wolpert, D. M., Goodbody, S. J. & Husain, M. Maintaining internal representations: the role of the human superior parietal lobe. Nat Neurosci 1, 529-533 (1998). https://doi.org:10.1038/2245

      (17) Andersen, R. A. & Buneo, C. A. Intentional maps in posterior parietal cortex. Annu Rev Neurosci 25, 189-220 (2002). https://doi.org:10.1146/annurev.neuro.25.112701.142922

      (18) Buneo, C. A. & Andersen, R. A. The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia 44, 2594-2606 (2006). https://doi.org:10.1016/j.neuropsychologia.2005.10.011

      (19) Grover, S., Wen, W., Viswanathan, V., Gill, C. T. & Reinhart, R. M. G. Long-lasting, dissociable improvements in working memory and long-term memory in older adults with repetitive neuromodulation. Nat Neurosci 25, 1237-1246 (2022). https://doi.org:10.1038/s41593-022-01132-3

      (20) Colclough, G. L. et al. How reliable are MEG resting-state connectivity metrics? Neuroimage 138, 284-293 (2016). https://doi.org:10.1016/j.neuroimage.2016.05.070

      (21) Colclough, G. L., Brookes, M. J., Smith, S. M. & Woolrich, M. W. A symmetric multivariate leakage correction for MEG connectomes. NeuroImage 117, 439-448 (2015). https://doi.org:10.1016/j.neuroimage.2015.03.071

      (22) Mollazadeh, M. et al. Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movements. J Neurosci 31, 15531-15543 (2011). https://doi.org:10.1523/JNEUROSCI.2999-11.2011

      (23) Bansal, A. K., Vargas-Irwin, C. E., Truccolo, W. & Donoghue, J. P. Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices. J Neurophysiol 105, 1603-1619 (2011). https://doi.org:10.1152/jn.00532.2010

      (24) Flint, R. D., Ethier, C., Oby, E. R., Miller, L. E. & Slutzky, M. W. Local field potentials allow accurate decoding of muscle activity. J Neurophysiol 108, 18-24 (2012). https://doi.org:10.1152/jn.00832.2011

      (25) Churchland, M. M. et al. Neural population dynamics during reaching. Nature 487, 51-56 (2012). https://doi.org:10.1038/nature11129

      (26) Bassett, D. S. et al. Dynamic reconfiguration of human brain networks during learning. Proc Natl Acad Sci U S A 108, 7641-7646 (2011). https://doi.org:10.1073/pnas.1018985108

      (27) Albouy, G., King, B. R., Maquet, P. & Doyon, J. Hippocampus and striatum: dynamics and interaction during acquisition and sleep-related motor sequence memory consolidation. Hippocampus 23, 985-1004 (2013). https://doi.org:10.1002/hipo.22183

      (28) Albouy, G. et al. Neural correlates of performance variability during motor sequence acquisition. Neuroimage 60, 324-331 (2012). https://doi.org:10.1016/j.neuroimage.2011.12.049

      (29) Qin, Y. L., McNaughton, B. L., Skaggs, W. E. & Barnes, C. A. Memory reprocessing in corticocortical and hippocampocortical neuronal ensembles. Philos Trans R Soc Lond B Biol Sci 352, 1525-1533 (1997). https://doi.org:10.1098/rstb.1997.0139

      (30) Euston, D. R., Tatsuno, M. & McNaughton, B. L. Fast-forward playback of recent memory sequences in prefrontal cortex during sleep. Science 318, 1147-1150 (2007). https://doi.org:10.1126/science.1148979

      (31) Molle, M. & Born, J. Hippocampus whispering in deep sleep to prefrontal cortex--for good memories? Neuron 61, 496-498 (2009). https://doi.org:10.1016/j.neuron.2009.02.002

      (32) Frankland, P. W. & Bontempi, B. The organization of recent and remote memories. Nat Rev Neurosci 6, 119-130 (2005). https://doi.org:10.1038/nrn1607

      (33) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proc Natl Acad Sci U S A 117, 23898-23903 (2020). https://doi.org:10.1073/pnas.2009576117

      (34) Albouy, G. et al. Maintaining vs. enhancing motor sequence memories: respective roles of striatal and hippocampal systems. Neuroimage 108, 423-434 (2015). https://doi.org:10.1016/j.neuroimage.2014.12.049

      (35) Gais, S. et al. Sleep transforms the cerebral trace of declarative memories. Proc Natl Acad Sci U S A 104, 18778-18783 (2007). https://doi.org:10.1073/pnas.0705454104

      (36) Sterpenich, V. et al. Sleep promotes the neural reorganization of remote emotional memory. J Neurosci 29, 5143-5152 (2009). https://doi.org:10.1523/JNEUROSCI.0561-09.2009

      (37) Euston, D. R., Gruber, A. J. & McNaughton, B. L. The role of medial prefrontal cortex in memory and decision making. Neuron 76, 1057-1070 (2012). https://doi.org:10.1016/j.neuron.2012.12.002

      (38) van Kesteren, M. T., Fernandez, G., Norris, D. G. & Hermans, E. J. Persistent schema-dependent hippocampal-neocortical connectivity during memory encoding and postencoding rest in humans. Proc Natl Acad Sci U S A 107, 7550-7555 (2010). https://doi.org:10.1073/pnas.0914892107

      (39) van Kesteren, M. T., Ruiter, D. J., Fernandez, G. & Henson, R. N. How schema and novelty augment memory formation. Trends Neurosci 35, 211-219 (2012). https://doi.org:10.1016/j.tins.2012.02.001

      (40) Wagner, A. D. et al. Building memories: remembering and forgetting of verbal experiences as predicted by brain activity. Science (New York, N.Y.) 281, 1188-1191 (1998).

      (41) Ashe, J., Lungu, O. V., Basford, A. T. & Lu, X. Cortical control of motor sequences. Curr Opin Neurobiol 16, 213-221 (2006).

      (42) Hikosaka, O., Nakamura, K., Sakai, K. & Nakahara, H. Central mechanisms of motor skill learning. Curr Opin Neurobiol 12, 217-222 (2002).

      (43) Penhune, V. B. & Steele, C. J. Parallel contributions of cerebellar, striatal and M1 mechanisms to motor sequence learning. Behav. Brain Res. 226, 579-591 (2012). https://doi.org:10.1016/j.bbr.2011.09.044

      (44) Doyon, J. et al. Contributions of the basal ganglia and functionally related brain structures to motor learning. Behavioural brain research 199, 61-75 (2009). https://doi.org:10.1016/j.bbr.2008.11.012

      (45) Schendan, H. E., Searl, M. M., Melrose, R. J. & Stern, C. E. An FMRI study of the role of the medial temporal lobe in implicit and explicit sequence learning. Neuron 37, 1013-1025 (2003). https://doi.org:10.1016/s0896-6273(03)00123-5

      (46) Morris, R. G. M. Elements of a neurobiological theory of hippocampal function: the role of synaptic plasticity, synaptic tagging and schemas. The European journal of neuroscience 23, 2829-2846 (2006). https://doi.org:10.1111/j.1460-9568.2006.04888.x

      (47) Tse, D. et al. Schemas and memory consolidation. Science 316, 76-82 (2007). https://doi.org:10.1126/science.1135935

      (48) Berlot, E., Popp, N. J. & Diedrichsen, J. A critical re-evaluation of fMRI signatures of motor sequence learning. Elife 9 (2020). https://doi.org:10.7554/eLife.55241

      (49) Bonstrup, M. et al. A Rapid Form of Offline Consolidation in Skill Learning. Curr Biol 29, 1346-1351 e1344 (2019). https://doi.org:10.1016/j.cub.2019.02.049

      (50) Kornysheva, K. et al. Neural Competitive Queuing of Ordinal Structure Underlies Skilled Sequential Action. Neuron 101, 1166-1180 e1163 (2019). https://doi.org:10.1016/j.neuron.2019.01.018

      (51) Pan, S. C. & Rickard, T. C. Sleep and motor learning: Is there room for consolidation? Psychol Bull 141, 812-834 (2015). https://doi.org:10.1037/bul0000009

      (52) Rickard, T. C., Cai, D. J., Rieth, C. A., Jones, J. & Ard, M. C. Sleep does not enhance motor sequence learning. J Exp Psychol Learn Mem Cogn 34, 834-842 (2008). https://doi.org:10.1037/0278-7393.34.4.834

      (53) Brawn, T. P., Fenn, K. M., Nusbaum, H. C. & Margoliash, D. Consolidating the effects of waking and sleep on motor-sequence learning. J Neurosci 30, 13977-13982 (2010). https://doi.org:10.1523/JNEUROSCI.3295-10.2010

      (54) Bonstrup, M., Iturrate, I., Hebart, M. N., Censor, N. & Cohen, L. G. Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced data. NPJ Sci Learn 5, 7 (2020). https://doi.org:10.1038/s41539-020-0066-9

      (55) Gupta, M. W. & Rickard, T. C. Dissipation of reactive inhibition is sufficient to explain post-rest improvements in motor sequence learning. NPJ Sci Learn 7, 25 (2022). https://doi.org:10.1038/s41539-022-00140-z

      (56) Jacobacci, F. et al. Rapid hippocampal plasticity supports motor sequence learning. Proceedings of the National Academy of Sciences 117, 23898-23903 (2020).

      (57) Brooks, E., Wallis, S., Hendrikse, J. & Coxon, J. Micro-consolidation occurs when learning an implicit motor sequence, but is not influenced by HIIT exercise. NPJ Sci Learn 9, 23 (2024). https://doi.org:10.1038/s41539-024-00238-6

      (58) Deleglise, A. et al. Human motor sequence learning drives transient changes in network topology and hippocampal connectivity early during memory consolidation. Cereb Cortex 33, 6120-6131 (2023). https://doi.org:10.1093/cercor/bhac489

      (59) Buzsaki, G. Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25, 1073-1188 (2015). https://doi.org:10.1002/hipo.22488

      (60) Gupta, M. W. & Rickard, T. C. Comparison of online, offline, and hybrid hypotheses of motor sequence learning using a quantitative model that incorporate reactive inhibition. Sci Rep 14, 4661 (2024). https://doi.org:10.1038/s41598-024-52726-9

      (61) Das, A., Karagiorgis, A., Diedrichsen, J., Stenner, M.-P. & Azanon, E. “Micro-offline gains” convey no benefit for motor skill learning. bioRxiv, 2024.07.11.602795 (2024). https://doi.org:10.1101/2024.07.11.602795

      (62) Mylonas, D. et al. Maintenance of Procedural Motor Memory across Brief Rest Periods Requires the Hippocampus. J Neurosci 44 (2024). https://doi.org:10.1523/JNEUROSCI.1839-23.2024

  2. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. taught me to see them as complex individuals who all wanted an education, and having learned these lessons from my students, I can't close my eyes to the fact that many of them do not attend college, something that is taken for granted by many of their even slightly wealthier peers. Thanks to my years of teaching in low-income schools, and thanks to my student teachers, my eyes are wide open to this disparity. I am gathering my strength and planning my agenda for the next chapter in my career: Get those truly left behind ready and into college. I have 20+ more years of work until retirement. Wish me luck. Or join me.

      Ungemah realized that the most significant lesson she learned from her students was how to confront inequality head-on. They helped her understand the structural issues within education: despite their efforts, many students from disadvantaged backgrounds remain systematically excluded from higher education. A teacher's awakening stems not only from professional training but also from the shared reality of education experienced alongside students. Students are not merely learners; they are also the ones who reveal the truth about the education system.

    1. Head CT scan without contrast, no intracranial hemorrhage or any fracture. CT cervical without contrast. No fracture

      Tone and Style: The tone and style are technical, as the document is composed of incomplete sentences, with non-emotional facts, and no obvious errors. The sentences are short, allowing for efficient reading of the document, including only the necessary information.

    1. Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile but females were fertile. Slc35g3-null males produced normal sperm count but sperm showed subtle head morphology. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired its transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations needed to be revised. These have been adequately addressed in the revised manuscript.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In the present manuscript, Mashiko and colleagues describe a novel phenotype associated with deficient SLC35G3, a testis-specific sugar transporter that is important in glycosylation of key proteins in sperm function. The study characterizes a knockout mouse for this gene and the multifaceted male infertility that ensues. The manuscript is well-written and describes novel physiology through a broad set of appropriate assays.

      Strengths:

      Robust analysis with detailed functional and molecular assays

      Weaknesses:

      (1) The abstract references reported mutations in human SLC35G3, but this is not discussed or correlated to the murine findings to a sufficient degree in the manuscript. The HEK293T experiments are reasonable and add value, but a more detailed discussion of the clinical phenotype of the known mutations in this gene and whether they are recapitulated in this study (or not) would be beneficial.

      Because no patients carrying these mutations have been identified, we conducted the HEK293T experiment to investigate the transporter activity of the mutations found in humans.

      (2) Can the authors expand on how this mutation causes such a wide array of phenotypic defects? I am surprised there is a morphological defect, a fertilization defect, and a transit defect. Do the authors believe all of these are present in humans as well?

      Thank you for your comment. Many glycoprotein-coding genes that influence sperm head morphology, fertilization, and transit through the female reproductive tract have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm.

      Reviewer #2 (Public review):

      Summary:

      This study characterized the function of SLC35G3, a putative transmembrane UDP-N-acetylglucosamine transporter, in spermatogenesis. They showed that SLC35G3 is testis-specific and expressed in round spermatids. Slc35g3-null males were sterile, but females were fertile. Slc35g3-null males produced a normal sperm count, but sperm showed subtle head morphology. Sperm from Slc35g3-null males have defects in uterotubal junction passage, ZP binding, and oocyte fusion. Loss of SLC35G3 causes abnormal processing and glycosylation of a number of sperm proteins in the testis and sperm. They demonstrated that SLC35G3 functions as a UDP-GlcNAc transporter in cell lines. Two human SLC35G3 variants impaired their transporter activity, implicating these variants in human infertility.

      Strengths:

      This study is thorough. The mutant phenotype is strong and interesting. The major conclusions are supported by the data. This study demonstrated SLC35G3 as a new and essential factor for male fertility in mice, which is likely conserved in humans.

      Weaknesses:

      Some data interpretations need to be revised.

      Thank you for your comments. We have revised the interpretations.

      Reviewer #1 (Recommendations for the authors):

      (1) The introduction could be structured more efficiently. Much of what is discussed in the first paragraph appears to be redundant to the second paragraph (or perhaps unrelated to the present manuscript).

      In the Introduction, we described the process of glycoprotein formation: 1) quality control of nascent glycoproteins in the ER and its importance for sperm fertilizing ability, 2) glycan maturation in the Golgi apparatus and its importance for sperm fertilizing ability, and 3) the supply of nucleotide sugars as the basis of these processes.

      We would like to retain this structure in the revised manuscript and appreciate your understanding.

      (2) Given the significant difference in morphology between murine and human sperm, can the authors comment on whether these findings are directly translatable to humans?

      Thank you for your comment. There are significant differences in sperm morphology between mice and humans, but many glycoprotein-coding genes that influence sperm head morphology have been identified in knockout mouse studies, and most of these are conserved in humans. Therefore, we believe that glycan modification by SLC35G3 is also involved in the regulation of human sperm head morphology. Observing sperm samples from individuals with SLC35G3 mutations is the most direct approach to verify this point and is considered an important goal for future research. The following text has been added to clarify the point:

      New Line 338; While these proteins are also found in humans, it is still too early to infer the importance of SLC35G3 in the morphogenesis of human sperm heads. Observing sperm samples from individuals with SLC35G3 mutations would be the most direct approach to address this, and we consider it an important objective for future studies.

      (3) Line 194 - while the inability to pass the UTJ may indeed be a component of this infertility phenotype, I would argue that a complete lack of ability to fertilize (even with IVF but not ICSI) suggests that the primary defect is elsewhere. This statement should be removed, and the topic of these two separate mechanisms should be compared/contrasted in the discussion.

      We agree that this is an overstatement, so we changed it:

      New line 187; Thus, the defective UTJ migration is one of the primary causes of Slc35g3-/- male infertility. 

      We believe the current statement in the discussion can stay as it is. 

      Line 379; We reaffirmed that glycosylation-related genes specific to the testis play a crucial role in the synthesis, quality control, and function of glycoproteins on sperm, which are essential for male fertility through their interactions with eggs and the female reproductive system.

      (4) Did the authors consider performing TEM to assess the sperm ultrastructure and the acrosome?

      Since morphological abnormalities were evident even at the macro level, TEM was not performed in this study. In the future, we plan to use immuno-TEM against affected/non-affected glycoproteins when the antibodies become available.

      (5) I would argue that Figure 3 should not be labeled as "essential", given the abnormal sperm head morphology compared to humans, the relatively modest difference between the groups on PCA, and more broadly speaking, the relatively poor correlation with morphology and human male infertility. While globozoospermia is clearly an exception, the data in this figure may not translate to human sperm and/or may not be clinically relevant even if it does.

      Indeed, other KO spermatozoa with similar morphological features are known to cause a reduction in litter size but do not result in complete infertility. As discussed in line 1, this head shape is not essential for fertilization. Reviewer 2 also pointed out that the phrase "Slc35g3 is essential for sperm head formation" is too strong; therefore, we would like to revise the title of Figure 3 to "Slc35g3 is involved in the regulation of sperm head morphology."

      (6) Have the authors generated slc35b4 KO mice?

      No, we did not. Since Slc35b4 is expressed throughout the body, a straight knockout may affect other organs or developmental processes. To investigate its role specifically in the testis, it will be necessary to generate a conditional knockout (cKO) model. As this requires considerable cost, time, and labor, we would like to leave it for future investigation.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 122-123: "it is prominently expressed in the testis, beginning 21 days postpartum (Figure 1B), suggesting expression from the secondary spermatocyte stage to the round spermatid stage in mice." Day 21 indicates the first appearance of round spermatids, but not secondary spermatocytes. Please change to the following: ...suggesting that its expression begins in round spermatids in mice.

      I agree with your comment and have revised the text accordingly (New line 114).

      (2) Figure 1E: What germ cells are they? The type of germ cells needs to be labelled on the image. Double staining with a germ cell marker would be helpful to distinguish germ cells from testicular somatic cells.

      Thank you for your comment. We replaced Figure 1E as follows.

      To distinguish germ cells from testicular somatic cells, we used the germ cell marker TRA98 antibody. Furthermore, based on the nuclear and GM130 staining pattern, we consider that the Golgi apparatus of round spermatids is labeled.

      (3) Figure 2C: The most abundant WB band is between 20 and 25 kD and is non-specific. Does the arrow point to the expected SLC35G3 band? There are two minor bands above the main non-specific band. Are both bands specific to SLC35G3? Given the strong non-specific band on WB, how specific is the immunofluorescence signal produced by this antibody? These need to be explained and discussed.

      The arrow pointed to the expected size (35 kDa).

      We thought that these non-specific bands could be due to blood contamination, so we retried with testicular germ cells. We confirmed that non-specific bands disappeared in the subsequent Western blot analysis. The specificity of the immunofluorescence signal is supported by its complete absence in the KO, as shown in the Supplementary Figures. We have decided to include this improved dataset. Thank you for your comment, which helped us improve the data.

      Author response image 1.

      (4) Line 184: "Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability, but genomic integrity is intact." Producing viable offspring does not necessarily mean that genomic integrity is intact. Suggestion: Slc35g3-/--derived sperm have defects in ZP binding and oolemma fusion ability but produce viable offspring. Likewise, the Figure S9 caption also needs to be changed.

      Thank you for your constructive comment. We have revised the text as you suggested.

      (5) Figure 3. "Slc35g3 is essential for sperm head formation". This statement is too strong. It is not essential for sperm head formation. The sperm head is still formed, but shows subtle deformation.

      Thank you for your suggestion. We changed as follows:

      Fig. 3; “Slc35g3 is involved in the regulation of sperm head morphology.”

      (6) Lines 204-205: Figure 6B: "Interestingly, some bands of sperm acrosome-associated 1 (SPACA1; 26) disappeared in Slc35g3-/- testis lysates." I don't see the absence of SPACA1 bands in -/- testis. This needs to be clearly labeled with arrows. On the contrary, the bands are stronger in Slc35g3-/- testis lysates.

      Thank you for your comment. After carefully considering your comments, we concluded that using "disappeared" is indeed inappropriate. We would like to revise the sentence as follows: New line 197; "Interestingly, SPACA1 (Sperm Acrosome Associated 1; 26) exhibited a subtle difference in banding pattern in the Slc35g3-/- testis lysate."

    1. The Society for Immunotherapy of Cancer consensus statement on immunotherapy for the treatment of squamous cell carcinoma of the head and neck (HNSCC)

      Last Reviewed 11/15/2024 (v1.1 Update)

      The information on this page provides a detailed overview of updates to the guideline content based on changes in the field. Updates to the guideline outlined below were made with the approval of SITC's Head and Neck Squamous Cell Carcinoma (HNSCC) Guideline Expert Panel. More information on SITC Guidelines can be found at sitcancer.org/guidelines.

      Update v1.1 Summary

      • The FDA approved pembrolizumab for patients with recurrent or metastatic cutaneous squamous cell carcinoma in June 2020 [Ref 108, 169].

      • The FDA approved toripalimab in combination with cisplatin and gemcitabine for first-line treatment of patients with metastatic or recurrent locally advanced nasopharyngeal carcinoma, or as a monotherapy for treatment of adult patients with recurrent, unresectable, or metastatic nasopharyngeal carcinoma with disease progression on or after platinum-containing chemotherapy in October 2023 [Ref 170, 171].

      • Practice-changing data have been reported from CheckMate 141 regarding nivolumab as a first-line treatment in recurrent or metastatic HNSCC after progressing on platinum therapy for locally advanced disease in the adjuvant or primary (ie, with radiation) setting [Ref 172].

      • Practice-changing data have been reported from KEYNOTE B10 and the FRAIL-IMMUNE/GORTEC 2018-03 trials, demonstrating efficacy of combining an anti-PD-(L)1 immune checkpoint inhibitor (ICI) with carboplatin + paclitaxel in frail patients with R/M HNSCC [Ref 173, 174].

    1. At least four French museums have been robbed in the last two months

      The impact stretches beyond this single crime to the lack of security around priceless French treasures over the past several months.

    2. The French government will not be compensated for the stolen works of art.

      Impacts the nation both financially and culturally. This line also carries prominence, since it involves the French state and national property...it's not only a private loss.

    3. Beccuau told RTL: “The wrongdoers who took these gems won’t earn €88m

      This quote brings in human interest. It's not only about stolen jewels, but also about greed, destruction and value, which adds a more personal element to the story.

    4. Paris prosecutors have charged a specialised unit known as the BRB with investigating the crime.

      Mentioning an elite police force being brought into the investigation adds a special level of prominence, showing this is a high-stakes investigation over priceless jewels. It also shows impact as well, since the government is taking this matter very seriously at a national level.

    5. Contrary to some reports, it said, the display cases protecting the stolen Napoleonic jewellery had been installed in 2019 and “represented a considerable improvement in terms of security”.

      This adds conflict between the media's narrative and the museum itself. Journalists include this because of the tension and disagreement to make stories more engaging to the reader.

    6. “The Louvre museum’s security apparatus did not fail, that is a fact,”

      The quote from the culture minister is very defensive and repeated. Officials like her influence the media to protect their image through their word choices. Reiterating "that is a fact" further solidifies this idea.

    7. A gang of four thieves forced their way into the Louvre’s Apollo gallery shortly after the museum opened on Sunday morning

      Calling these individuals a "gang" makes this heist seem organized and dangerous. As an American, the proximity isn't super relevant to me since France is on the other side of the ocean. However, Oltermann is an effective storyteller, making this crime seem like a movie...crime-focused and dramatic.

    8. the head of the Louvre prepared to face difficult questions over how thieves were able to steal priceless jewellery in broad daylight.

      This line takes the blame and puts it on the leadership, shifting the focus to accountability. It frames this story as one that could've happened because someone didn't do their job.

    9. The financial loss from France’s most dramatic heist in decades has been put at nearly €90m

      The word choice of "dramatic" makes the story sound really exciting and huge. It's written to grab the audience's attention from the get-go and make the story seem important, not just to report facts.

    1. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well. Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding? In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data, and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear)?

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand-promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible? For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, evidence appears to show stronger evidence of abdominal knockdown mostly for the RYa transcript (>90%) while still significantly for the sNPF transcript (>60%).

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study not only covers the fundamental changes in blood feeding behaviors of this crucially understudied vector, but also uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and RYa modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate that the reviewer has recognised the attention to detail we have tried to bring to this work. Thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides in the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10 h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We have mentioned the knockdown conditions (time of injection and the amount of dsRNA injected) for tissue-specific knockdowns in the methods, but realise now that this does not explain the approach well enough. We have now edited it to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic data to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or mid brain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (For example: https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways.

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neuro-transcriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added a statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, for a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these into (already busy) 1A, would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria.

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF.

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens.

      4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
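      As an illustration of why the control value is fixed at 1, the delta-delta Ct calculation can be sketched as follows. This is a minimal sketch of the standard 2^(-ΔΔCt) formula, which assumes roughly 100% amplification efficiency; the function names and all Ct values are hypothetical and are not taken from the study, and the reference gene is unspecified.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Relative mRNA expression by the 2^(-ddCt) method.

    ct_target / ct_reference: Ct of the gene of interest and of the
    reference (housekeeping) gene in the treated sample.
    ct_target_ctrl / ct_reference_ctrl: the same two Ct values in the control.
    Assumes ~100% PCR efficiency (one cycle = a two-fold change).
    """
    delta_ct_sample = ct_target - ct_reference             # normalise sample to reference gene
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl  # normalise control to reference gene
    delta_delta_ct = delta_ct_sample - delta_ct_control    # express sample relative to control
    return 2.0 ** (-delta_delta_ct)


def percent_knockdown(rel_expr):
    """Convert relative expression (control = 1) into percent knockdown."""
    return (1.0 - rel_expr) * 100.0


# Hypothetical Ct values, for illustration only.
control = relative_expression(24.0, 18.0, 24.0, 18.0)    # control compared with itself
knockdown = relative_expression(25.0, 18.0, 24.0, 18.0)  # target appears one cycle later

print(control, knockdown, percent_knockdown(knockdown))
```

      Because the control is compared with itself, its ΔΔCt is 0 and its relative expression is 1 by construction, so it shows no variation; variation appears only in the treated samples.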

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in the graphs inserted below.

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following:

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection and the dsRNA concentration to be injected. Injecting dsRNA into 0-10h females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns.

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, evidence appears to show stronger evidence of abdominal knockdown mostly for the RYa transcript (>90%) while still significantly for the sNPF transcript (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (3) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (4) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, the tools for which are not available in this species. Our approach to achieve head+abdomen and abdomen only knockdown was the closest we could get to achieving tissue specificity and allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. This point has been addressed in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing this line of experiments, they lie beyond the scope of a revision. In its absence, we relied on the knockdown of the genes using dsRNA. We would like to posit that despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour, without affecting sugar-feeding. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well-prepared, visually the figures are great and clearly were carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      a) A small idiosyncrasy of using hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate if the words are being used as a noun, as in the case: e.g. "Age affects blood feeding.". However, you would hyphenate if the two words are used as a compound adjective "Age affects blood-feeding behavior". This may not be an all-inclusive list, but here are some examples where hyphens need to either be removed or added. Some examples:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.

      b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.

      c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found re-referencing C and D at the end of their statements makes it look as though C precedes the "Relative mRNA expression" and on a first read through, I thought the figure captions were backwards. I'd recommend reformatting here and throughout consistently to only have the figure letter precede its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected bloodfed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injection of either sNPF or RYa peptide independently or combined following knockdown to validate the role of these peptides in promoting blood-feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by implementing HCR in situ hyb in knockdown animals (or immunohistochemistry using antibodies specific for these two neuropeptides).

      In a follow up of this work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficacy thresholds in the manuscript, as these can vary considerably between genes, and in some cases, even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies of even as low as about 25% - 40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios.

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result either from their inability to seek a host or from an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability.

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues – in response to both changing nutritional and reproductive states – are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuitry including those in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 2023). In the context of mating, male-derived sex peptide further increases protein feeding by acting on a dedicated central brain circuitry (Walker et al., 2015). We therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood-meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude the following: 1) that brain expression is required for the behaviour, 2) that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF- and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. A possibility remains that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown in the case of Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). This will mean a systematic characterisation and knockdown of each of them to confirm their role. We are planning these experiments in the future.

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (blood-hungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of dsRNAs injected.

      (6) References numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. there is often only one head of the household for family groups living together.

      One role of extended family is the **head of the household**. This can be determined by age (who is the oldest/most senior), by who contributes the most significant finances ("breadwinner"), or by whose home it is/was initially.

    1. Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      The importance of body mass cannot be overemphasized as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species of their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple measure of forelimb over hindlimb has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data within a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insights into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow for all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. 
      Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This unique study reports original and extensive behavioral data collected by the authors on 21 living mammal taxa in zoo conditions (primates, tree shrew, rodents, carnivorans, and marsupials) on how descent along a vertical substrate can be done effectively and securely using gait variables. Ten morphological variables reflecting head size and limb proportions are examined in relationship to vertical descent strategies and then applied to reconstruct modes of vertical descent in fossil mammals.

      Strengths:

      This is a broad and data-rich comparative study, which requires a good understanding of the mammal groups being compared and how they are interrelated, the kinematic variables that underlie the locomotion used by the animals during vertical descent, and the morphological variables that are associated with vertical descent styles. Thankfully, the study presents data in a cogent way with clear hypotheses at the beginning, followed by results and a discussion that addresses each of those hypotheses using the relevant behavioral and morphological variables, always keeping in mind the relationships of the mammal groups under investigation. As pointed out in the study, there is a clear phylogenetic signal associated with vertical descent style. Strepsirrhine primates much prefer descending tail first, platyrrhine primates descend sideways when given a choice, whereas all other mammals (with the exception of the raccoon) descend head first. Not surprisingly, all mammals descending a vertical substrate do so in a more deliberate way, by reducing speed, and by keeping the limbs in contact for a longer period (i.e., higher duty factors).

      Weaknesses:

      The different gait patterns used by mammals during vertical descent are a bit more difficult to interpret. It is somewhat paradoxical that asymmetrical gaits such as bounds, half bounds, and gallops are more common during descent since they are associated with higher speeds and lower duty factors. Also, the arguments about the limb support polygons provided by DSDC vs. LSDC gaits apply for horizontal substrates, but perhaps not as much for vertical substrates.

      We analyzed gait patterns using methods commonly found in the literature and discussed our results accordingly. However, the analysis of limb support polygons was indeed developed specifically for locomotion on horizontal supports, and may not be applicable to vertical locomotion, which is in fact a type of locomotion shared by all arboreal species. In the future, it would be interesting to consider new methods for analyzing vertical gaits.

      The importance of body mass cannot be overemphasized as it affects all aspects of an animal's biology. In this case, larger mammals with larger heads avoid descending head-first. Variation in trunk/tail and limb proportions also covaries with different vertical descent strategies. For example, a lower intermembral index is associated with tail-first descent. That said, the authors are quick to acknowledge that the five lemur species of their sample are driving this correlation. There is a wide range of intermembral indices among primates, and this simple measure of forelimb over hindlimb has vital functional implications for locomotion: primates with relatively long hindlimbs tend to emphasize leaping, primates with more even limb proportions are typically pronograde quadrupeds, and primates with relatively long forelimbs tend to emphasize suspensory locomotion and brachiation. Equally important is the fact that the intermembral index has been shown to increase with body mass in many primate families as a way to keep functional equivalence for (ascending) climbing behavior (see Jungers, 1985). Therefore, the manner in which a primate descends a vertical substrate may just be a by-product of limb proportions that evolved for different locomotor purposes. Clearly, more vertical descent data within a wider array of primate intermembral indices would clarify these relationships. Similarly, vertical descent data for other primate groups with longer tails, such as arboreal cercopithecoids, and particularly atelines with very long and prehensile tails, should provide more insights into the relationship between longer tail length and tail-first descent observed in the five lemurs. The relatively longer hallux of lemurs correlates with tail-first descent, whereas the more evenly grasping autopods of platyrrhines allow for all four limbs to be used for sideways descent. In that context, the pygmy loris offers a striking contrast. 
Here is a small primate equipped with four pincer-like, highly grasping autopods and a tail reduced to a short stub. Interestingly, this primate is unique within the sample in showing the strongest preference for head-first descent, just like other non-primate mammals. Again, a wider sample of primates should go a long way in clarifying the morphological and behavioral relationships reported in this study.

      We agree with this statement. In the future, we plan to study other species, particularly large-bodied ones with varied intermembral indexes.

      Reconstruction of the ancient lifestyles, including preferred locomotor behaviors, is a formidable task that requires careful documentation of strong form-function relationships from extant species that can be used as analogs to infer behavior in extinct species. The fossil record offers challenges of its own, as complete and undistorted skulls and postcranial skeletons are rare occurrences. When more complete remains are available, the entire evidence should be considered to reconstruct the adaptive profile of a fossil species rather than a single ("magic") trait.

      We completely agree with this, and we would like to emphasize that our intention here was simply to conduct a modest inference test, the purpose of which is to provide food for thought for future studies, and whose results should be considered in light of a comprehensive evolutionary model.

      Reviewer #2 (Public review):

      Summary:

      This paper contains kinematic analyses of a large comparative sample of small to medium-sized arboreal mammals (n = 21 species) traveling on near-vertical arboreal supports of varying diameter. These data are paired with morphological measures from the extant sample to reconstruct potential behaviors in a selection of fossil euarchontoglires. This research is valuable to anyone working in mammal locomotion and primate evolution.

      Strengths:

      The experimental data collection methods align with best research practices in this field and are presented with enough detail to allow for reproducibility of the study as well as comparison with similar datasets. The four predictions in the introduction are well aligned with the design of the study to allow for hypothesis testing. Behaviors are well described and documented, and Figure 1 does an excellent job in conveying the variety of locomotor behaviors observed in this sample. I think the authors took an interesting and unique angle by considering the influence of encephalization quotient on descent and the experience of forward pitch in animals with very large heads.

      Weaknesses:

      The authors acknowledge the challenges that are inherent with working with captive animals in enclosures and how that might influence observed behaviors compared to these species' wild counterparts. The number of individuals per species in this sample is low; however, this is consistent with the majority of experimental papers in this area of research because of the difficulties in attaining larger sample sizes.

      Yes, that is indeed the main cost/benefit trade-off with this type of study. Working with captive animals allows for large comparative studies, but locomotor behavior may differ from that of individuals in the natural environment, and the dataset includes few individuals per species. That is why we plan to conduct, and encourage colleagues to conduct, studies in the natural environment to compare with these results. However, this type of study is very time-consuming and requires focusing on a single species at a time, which limits the comparative aspect.

      Figure 2 is difficult to interpret because of the large amount of information it is trying to convey.

      We agree that this figure is dense. One possible solution would be to combine species by phylogenetic groups to reduce the amount of information, as we did with Fig. 3 on the dataset relating to gaits. However, we believe that this would be unfortunate in the case of speed and duty factor because we would have to provide the complete figure in SI anyway, as the species-level information is valuable. We therefore prefer to keep this comprehensive figure here and we will enlarge the data points to improve their visibility, and provide the figure with a sufficiently high resolution to allow zooming in on the details.

    1. The old lady settled herself comfortably, removing her white cotton gloves and putting them up with her purse on the shelf in front of the back window. The children’s mother still had on slacks and still had her head tied up in a green kerchief, but the grandmother had on a navy blue straw sailor hat with a bunch of white violets on the brim and a navy blue dress with a small white dot in the print. Her collars and cuffs were white organdy trimmed with lace and at her neckline she had pinned a purple spray of cloth violets containing a sachet. In case of an accident, anyone seeing her dead on the highway would know at once that she was a lady.

      This shows how much she cares about appearances; she worries more about being seen as a “lady” than about others’ feelings.

    2. THE GRANDMOTHER DIDN’T WANT to go to Florida. She wanted to visit some of her connections in east Tennessee and she was seizing at every chance to change Bailey’s mind. Bailey was the son she lived with, her only boy. He was sitting on the edge of his chair at the table, bent over the orange sports section of the Journal. “Now look here, Bailey,” she said, “see here, read this,” and she stood with one hand on her thin hip and the other rattling the newspaper at his bald head. “Here this fellow that calls himself The Misfit is aloose from the Federal Pen and headed toward Florida and you read here what it says he did to these people. Just you read it. I wouldn’t take my children in any direction with a criminal like that aloose in it. I couldn’t answer to my conscience if I did.”

      The grandmother uses fear of The Misfit to control the trip; she puts her worries and authority before others’ wishes.

    1. nightingale

      Eliot’s “The Game of Chess” and its referenced sources characterize women (or the queen piece) as the real pawns of society, exploited by men (the king piece) despite their power. Eliot begins the section with “The Chair she sat in, like a burnished throne, / Glowed on the marble…” In older versions of chess, specifically the marble-like Lewis chessmen, the queen piece sits on an elaborate throne, cradling her head in her hand with a tired expression. So, Eliot's description aligns closely with the chess piece of the Queen. At the same time, this description is a direct reference to Antony and Cleopatra: “The barge she sat in, like a burnish’d throne, / Burn’d on the water.” So, “The Game of Chess” begins with Cleopatra, the queen of Egypt and one of the best-known women of immense power to be labeled a seductress. In fact, the six assigned sources all display women used as scapegoats, always described but never given a chance to stand up for themselves. They are used as pawns in literature, society, and history. Further, these women are almost all associated deeply with snakes, a symbol for the devil in many works. The foundation of this comparison is shown in Paradise Lost, as Eve is tempted by a serpent, or the devil, and then is blamed alongside the serpent for eternity. Notably, Cleopatra kills herself with an asp, or a serpent, to escape a future of humiliation at the expense of being forever silenced. In Ovid, Philomela’s tongue is cut out because the king dislikes her words, and her severed tongue is compared to a snake. By taking away her tongue, or her voice, the king seems to believe he has stunted her ability to tempt and manipulate. In Baudelaire, he writes “The haunches slightly sharp, and the waist sinuous / As a snake poised to strike, / That she's still quite young!” Even as the woman is described in an undone state, she is still viewed as “a snake poised to strike.” Tying these references back to the text, Eliot argues through his characterization of “the nightingale Filled all the desert with inviolable voice And still she cried, and still the world pursues,” that these women are labeled snakes, always poised to strike and poison others with their cunning manipulation, while they are truly nightingales, only afforded a grieving voice in the night. The thread is clear: women are exploited by men, then blamed by those same men and the rest of society without a chance to share their voice.

    2. And crawled head downward down a blackened wall And upside down in air were towers Tolling reminiscent bells, that kept the hours And voices singing out of empty cisterns and exhausted wells

      Last year, Addie annotated this exact section and described how Eliot purposefully confuses the reader's sense of right-side-up and upside-down. In an especially insightful section of analysis she claims that if the reader were to orient herself with respect to Dracula (who "crawled head downward down a blackened wall"), the tower down which he crawls becomes inverted - and the corresponding Tarot Card, the Dark Tower, is similarly flipped. Nested in this idea is a broader understanding: that in the chaos and turbulence of the modern world, the only form of agency we truly have is our perspective. When Dracula is flipped upside down, the world appears to him inverted; and though in fact it remains exactly the same as it always was, in his mind's eye all has been reoriented. That's precisely Eliot's point. Though the world itself may be a wasteland, there exists a copy of this world - a world of shadows, of impressions, of perspectives and opinions - which is completely up to interpretation. I think he invokes Tarot as a way of imbuing this doppelganger realm with purpose and value: Tarot is all about perspective. Your interpretation of the card, and what it tells you about your life in this theoretical duplicate of reality, informs the way you act in the real physical world - and so perhaps our agency, though constrained to our own perspectives, is more powerful than we think. The following two lines are relevant insofar as they condense several central thematic discussions: the voices, time, familiarity and remembrance, and water. All of these strands weave together a picture of reality IN FACT: that is, a world in which people are consigned to make the same mistakes over and over, a world where several voices overlap but never really hear one another, a world analogous to a dry rock. I think Eliot piles up all these images to drive home the fact that though our perspectives may change (though the Dark Tower may become inverted, or vice versa), objective reality is constant. In this way he DOES put a pessimistic constraint on the extent to which our conception of life can actually influence the events occurring around us; but nevertheless I do think there are some shards of positivity embedded in there.

    3. Phlebas the Phoenician, a fortnight dead,

      The 1920 poem "Mr. Apollinax" centers on a group of scholars and students attending a dinner party. The character of Mr. Apollinax himself is heavily inspired by Bertrand Russell, a mentor and friend to Eliot, who was also a famous logician, though his concepts at times were difficult for even well-versed philosophers and scholars to comprehend. In the poem, the descriptions of Mr. Apollinax and his behavior at the dinner party bear direct parallels to "Death by Water" and the motif of the ocean itself.

      1. "His laughter was submarine and profound / Like the old man of the sea’s" (lines 8-9). In these lines, there are two clear connections to water or the ocean: the description of Mr. Apollinax's laugh as being "submarine", and the equating of that laugh to "the old man of the sea's". Beyond his supreme intellect, Mr. Apollinax is portrayed as a man deriving a great amount of pleasure and fulfillment from the world, as conveyed by his "submarine and profound laugh", indicating that unlike the hollow and despairing voices of TWL, Mr. Apollinax possesses a humor rooted in a genuine "inner richness" and vitality.
      2. "Where worried bodies of drowned men drift down in the green silence, / Dropping from fingers of surf" (lines 11-12). These lines stood out to me as referencing the general human condition of the time: people are swept away by the chaotic tide of worldly forces (industrialization, war, pursuit of goods or wealth) that have risen in the rapidly changing times of the early 20th century. The world has become so complicated that it has become nearly impossible for the regular individual to properly navigate it, leading them to "drown". This metaphorical drowning could indicate several things: on a broader scale, the fall of humanity into sin and wrongdoing; in a similar vein, humanity becoming separate from the ideals of religion or spirituality; or the succumbing of mankind to the pursuit of worldly things such as wealth, fame, or material goods. In my mind, a good argument could be made for each of these possibilities.
      3. "I looked for the head of Mr. Apollinax rolling under a chair / Or grinning over a screen / With seaweed in its hair" (lines 13-15). Now, this is an interesting detail. It is another example of decapitation (the first major instance being the headless corpse of a sex worker in Les Fleurs du Mal). The head, which literally represents reason and intellect, being separated from the body suggests a collapse of rational order and a breakdown of the mind itself. In the context of TWL, such a disjunction reflects the intellect’s estrangement from emotional or spiritual grounding. The description of the head “grinning over a screen” adds an absurd and grotesque dimension to the scene, transforming what might otherwise be horrific into a moment of unsettling fascination. Ultimately, the tension between humor and horror illuminates a key theme that runs through TWL: intellect, when isolated from the fuller spectrum of human experience (emotion, spiritual faith, and vitality), risks devolving into alienation or madness.
    4. Death by Water

      This is not our first encounter with “Death by Water.” “The Burial of the Dead” begins with “April is the cruelest month, breeding / Lilacs out of the dead land.” Spring’s rain breeds life out of decayed crops, but also out of the struggles of winter and war. Both in the poem and in the greater scope of culture, water is seen as necessary for spiritual renewal and cleansing, physical sustenance, and the regrowth of nature. The Tempest is mentioned throughout the poem and even its title reveals Eliot’s narrative journey. A tempest is a violent storm or an intense turmoil, its root “tempus” meaning time or season. The idea of a tempest itself is a violent and unforgiving turbulence which eventually ends in peace, but not without ravaging disaster. In a tempest, the water known for renewal, rebirth, and the essence of life is a force of violence. In the play, The Tempest, Ariel consoles another character about the believed loss of his father to drowning, saying, “Full fathom five thy father lies;/Of his bones are coral made;/Those are pearls that were his eyes:/Nothing of him that doth fade/But doth suffer a sea-change/Into something rich and strange.” By emphasizing that the father remains, merely changed to become one with the sea, Shakespeare frames death by water as a spiritual shift instead of an end. Then, back to Eliot, Madame Sosostris twists this line from The Tempest when she reads the card titled “the drowned Phoenician Sailor,” says “Fear death by water,” and reminds the narrator of the line “Those are pearls that were his eyes.” Here, she does not see death as a spiritual transformation, but a loss of humanity which should be feared, emphasizing the pearl eyes as a sign that the sailor’s soul has been lost. Madame Sosostris, as sourced from Huxley, lives under several disguises, though, as a man pretending to be a woman and a poser pretending to be a prophet. Thus, Eliot frames Madame Sosostris as a false representation of the cycle of life, so that he can correct her skewed perception which is widely held by society. Here, death returns to the title but it has changed from “Burial of the Dead” to “Death by Water.” In the first section of the poem, “the dead” were given their own identity, but by this section it has become “death,” a word less connected to the people and more to their state. In Corinthians, we see the more traditional image of water as spirituality. However, all the other referenced sources show water as death, less as a continuation of the natural cycle and more as a violent and inevitable force.

      In these sources, there is a recurring theme of ships being struck head on right before reaching their destination. In The Life and Death of Jason, the characters are spared and turn back, but they do not reach their destination. Ulysses survives his journey and returns home safely to his family, after losing his shipmates to the sea and other challenges. In Dante, right as the characters can see land ahead, a “whirlwind struck the ship head on” and “the sea closed over us.” Eliot’s shift from water as a symbol of rebirth/life to a symbol of death is a continuation of the off-beat nature of the poem, and the awareness versus the denial of one’s fate.

    1. Note that here we do not have information about different heads. Heads related information will be examined separately when we visualize the attribution scores of the attention matrices with respect to the start or end position predictions.

      The figure below contains no information about the different heads.

    1. Mystery books, for me, divide themselves into two kinds.

      In this article, French states that there are two types of mysteries. One type solves the crime head-on and restores order by the end (she gives Christie and Holmes as examples). The other does not give clear answers and shows that truth and evil aren't always that simple. This shows how mysteries can make us feel safe or make us think deeply about life.

    1. Tessie Hutchinson was in the center of a cleared space by now, and she held her hands out desperately as the villagers moved in on her. “It isn’t fair,” she said. A stone hit her on the side of the head.

      Tessie was chosen by the lottery to be stoned.

    1. Our beloved Children and head men of the Cherokee Nation, we address you warriors in council. We have raised all of you on the land which we now have, which God gave us to inhabit and raise provisions

      I like how they establish their ethos here. They speak for all mothers, calling on the authority of those who raised the warriors who are now in charge. It's clear that there is still a remnant of the former respect for women.

    2. he land was given to us by the Great Spirit above as our common right, to raise our children upon, & to make support for our rising generations. We therefore humbly petition our beloved children, the head men & warriors, to hold out to the last in support of our common rights, as the Cherokee nation have been the first settlers of this land; we therefore claim the right of the soi

      This is a very cool debate to see. The mass scale, strictly defined individualism and liberalism vs. a smaller, more intimate and natural form of economy and governance based on solidarity and common belief.

    1. Remember that head words are important because their features play a role in how the entire phrase functions within the sentence. That’s why we name the phrase after the category of its head word. One way to think of this is that the properties of the word carry over to the phrase. Looking at how this works in a tree diagram, we can think of the properties of the head word as percolating up from the individual word to the phrase. The following diagram represents this “percolation” by showing the edges between the head words and their parent nodes as arrows.[1]

      The head word determines the category of its phrase, and its properties carry over to shape how the whole phrase functions in the sentence.
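      The "percolation" idea in the passage can be sketched in a few lines of Python. This is only a toy illustration: the `Node` class, the `head_index` field, and the rule of appending "P" to the head's category are assumptions made for the example, not part of the original text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A word (leaf) or a phrase (node with children) in a toy syntax tree."""
    category: Optional[str] = None            # e.g. "N", "V", "Det"; unset for phrases
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    head_index: int = 0                       # which child is the head (phrases only)


def head_category(node: Node) -> str:
    """Follow head children down to the head word and return its category."""
    while node.children:
        node = node.children[node.head_index]
    return node.category


def label(node: Node) -> str:
    """A word keeps its own category; a phrase is named after its head word."""
    if not node.children:
        return node.category
    return head_category(node) + "P"


# "the dog": the noun is the head, so the phrase is an NP.
np_node = Node(children=[Node("Det", "the"), Node("N", "dog")], head_index=1)
# "chased the dog": the verb is the head, so the phrase is a VP.
vp_node = Node(children=[Node("V", "chased"), np_node], head_index=0)

print(label(np_node), label(vp_node))
```

      The phrase label is never stored on the phrase itself; it is always computed from the head word, which is the sense in which the head's properties percolate upward.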

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors identified and described the transcriptional trajectories leading to CMs during early mouse development, and characterized the epigenetic landscapes that underlie early mesodermal lineage specification.

      The authors identified two transcriptomic trajectories from a mesodermal population to cardiomyocytes, the MJH and PSH trajectories. These trajectories are relevant to the current model for the First Heart Field (FHF) and the Second Heart Field (SHF) differentiation. Then, the authors characterized both gene expression and enhancer activity of the MJH and PSH trajectories, using a multiomics analysis. They highlighted the role of Gata4, Hand1, Foxf1, and Tead4 in the specification of the MJH trajectory. Finally, they performed a focused analysis of the role of Hand1 and Foxf1 in the MJH trajectory, showing their mutual regulation and their requirement for cardiac lineage specification.

      Strengths:

      The authors performed an extensive transcriptional and epigenetic analysis of early cardiac lineage specification and differentiation which will be of interest to investigators in the field of cardiac development and congenital heart disease. The authors considered the impact of the loss of Hand1 and Foxf1 in-vitro and Hand1 in-vivo.

      Weaknesses:

      The authors used previously published scRNA-seq data to generate two described transcriptomic trajectories.

      We agree that a two-route cardiac development model has been described, which is consistent with our analyses. However, the developmental origins and key events of early lineage specification remain unclear. Our study provides new insights in the following respects:

      a) Computational analyses inferred the earliest cardiac fate segregation by E6.75-7.0.

      b) Provided newly generated E7.0 multi-omics data, which revealed the transcriptomic and chromatin accessibility landscape.

      c) Utilized multi-omics and ChIP-seq data to construct a core regulatory network underlying the JCF lineage specification.

      d) Applied in vitro and in vivo analyses, which elucidated the synergistic and distinct roles of the key transcription factors HAND1 and FOXF1.

      Q1R1: Details of the re-analysis step should be added, including a careful characterization of the different clusters and marker genes, more details on the WOT analysis, and details on the time stamp distribution along the different pseudotimes. These details would be important to allow readers to gain confidence that the two major trajectories identified are realistic interpretations of the input data.

      R1R1: Thank you for the valuable suggestion. In the previous version, we characterized the two major trajectories by identifying their common or specific gene sets, and by profiling the expression dynamics along pseudotime (Figure 1F). However, we realize that a careful description was not provided. In the revised manuscript, we have made the following improvements:

      a) Provided marker gene analyses based on cell types as well as developmental lineages to support the E7.0 progenitor clusters (Figure S1F).

      b) For Figure 1F: revised the text and introduced characteristic genes for the two trajectories.

      c) For WOT analysis: provided more details in the first paragraph of the ‘Results’ section.

      Q2R1: The authors have also renamed the cardiac trajectories/lineages, departing from the convention applied in hundreds of papers, making the interpretation of their results challenging.

      R2R1: Agreed. We have renamed the MJH trajectory as the JCF lineage and the PSH trajectory as the SHF lineage.

      Q3R1: The concept of "reverse reasoning" applied to the Waddington-OT package for directional mass transfer is not adequately explained. While the authors correctly acknowledged Waddington-OT's ability to model cell transitions from ancestors to descendants (using optimal transport theory), the justification for using a "reverse reasoning" approach is missing. Clarifying the rationale behind this strategy would be beneficial.

      R3R1: Thank you for pointing out the unclear explanation. As mentioned in R1R1, we have clarified the rationale in the revised manuscript. 

      We would like to provide some additional details: WOT is designed for time-series scRNA-seq data in which the time/stage of each single cell is given. For any adjacent time points t<sub>i</sub> and t<sub>i+1</sub>, WOT estimates the transition probability from all cells at t<sub>i</sub> to all cells at t<sub>i+1</sub>. One can select a cell set of interest at any time point t<sub>i</sub> and infer its ancestors at t<sub>i-1</sub> or its descendants at t<sub>i+1</sub> by summing the transition probabilities. As introduced in the original paper, WOT allows for both ‘forward’ and ‘reverse’ inference (DOI: 10.1016/j.cell.2019.01.006).
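      The forward and reverse inference described above can be illustrated with a toy transition matrix. This is a minimal sketch in plain Python and does not use the Waddington-OT package or its API; the matrix values and the function names `descendants` and `ancestors` are hypothetical and chosen only for illustration.

```python
# Hypothetical transition weights between 3 cells at t_i (rows)
# and 2 cells at t_{i+1} (columns). Values are invented.
transition = [
    [0.30, 0.00],
    [0.10, 0.10],
    [0.00, 0.50],
]


def normalize(mass):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(mass)
    return [m / total for m in mass]


def descendants(matrix, cell_set):
    """Forward inference: sum the rows of the selected cells at t_i to get
    a distribution over the cells at t_{i+1}."""
    n_cols = len(matrix[0])
    return normalize([sum(matrix[i][j] for i in cell_set) for j in range(n_cols)])


def ancestors(matrix, cell_set):
    """Reverse inference: sum the columns of the selected cells at t_{i+1}
    to get a distribution over the cells at t_i."""
    return normalize([sum(row[j] for j in cell_set) for row in matrix])


# Reverse: which cells at t_i are the likely ancestors of cell 0 at t_{i+1}?
print(ancestors(transition, [0]))
# Forward: where does the mass of cell 2 at t_i go at t_{i+1}?
print(descendants(transition, [2]))
```

      Chaining the reverse step across successive pairs of time points is one way to picture tracing a late population back to its progenitors at each earlier stage.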

      Q3R1: As the authors used the EEM cell cluster as a starting point to build the MJH trajectory, it's unclear whether this trajectory truly represents the cardiac differentiation trajectory of the FHF progenitors:

      - This strategy infers that the FHF progenitors are mixed in the same cluster as the extra-embryonic mesoderm, but no specific characterization of potential different cell populations included in this cluster was performed to confirm this.

      To build the MJH trajectory, we performed a two-step analysis:

      (1) Firstly, we used E8.5 CM cells as a starting point to perform WOT computational reverse lineage tracing and identify CM progenitors at each time point.

      (2) Secondly, we selected EEM cells from the E7.5 CM progenitor pool, as a starting point to perform WOT analysis. Cells along this trajectory consist of the JCF lineage (Figure 1B).

      We chose this subset of E7.5 EEM cells because of its purity. It is distinct from the SHF lineage, as suggested by their separation in the UMAP. It is also different from FHF cells, as no FHF/CM markers were detected by E7.5. 

      We acknowledge that it is infeasible to achieve 100% purity in this single-cell omics analysis, but we believe the current strategy of defining the JCF lineage is reasonable. The distinct gene expression dynamics (Figure 1F) and spatial mapping results (Figure 1C) of the JCF and SHF lineages also support our conclusion.

      - The authors identified the EEM cluster as a Juxta-cardiac field, without showing the expression of the principal marker Mab21l2 per cluster and/or on UMAPs.

      Thank you for your suggestion. We have added Mab21l2 expression plots in the ICA layout (new Figure S1D), showing its transient expression dynamics, consistent with Tyser et al (DOI: 10.1126/science.abb2986).

      - As the FHF progenitors arise earlier than the Juxta-cardiac field cells, it must be possible to identify an early FHF progenitor population (Nkx2-5+; Mab21l2-) using the time stamp. It would be more accurate to use this FHF cluster as a starting point than the EEM cluster to infer the FHF cardiac differentiation trajectory.

      We appreciate your insights. We used the early FHF progenitor population (E7.75 Nkx2-5+; Mab21l2- CM cells) as the starting point and identified its progenitor cells by E7.0 (Figure S2A). Results suggest both JCF and SHF lineages contribute to the early FHF progenitor population, consistent with live imaging-based single cell tracing by Dominguez et al (DOI: 10.1016/j.cell.2023.01.001).

      These concerns call into question the overall veracity of the trajectory analysis, and in fact, the discrepancies with prior published heart field trajectories are noted but the authors fail to validate their new interpretation. Because their trajectories are followed for the remainder of the paper, many of the interpretations and claims in the paper may be misleading. For example, these trajectories are used subsequently for annotation of the multiomic data, but any errors in the initial trajectories could result in errors in multiomic annotation, etc, etc.

      Thank you for your valuable comments. In the revised manuscript, we have added details about the trajectory analysis including the procedure of WOT lineage inference, marker gene expression and early FHF lineage tracing. We also renamed the two trajectories to avoid confusion with prior published heart field trajectories. Generally, our trajectories are consistent with the published evidence about two major lineages contributing to the linear heart tube:

      a) Clonal analysis: two trajectories exist which demonstrate differential contribution to the E8.5 cardiac tube (Meilhac et al, DOI: 10.1016/s1534-5807(04)00133-9).

      b) Live imaging: JCF cells contribute to the forming heart (Tyser et al, DOI: 10.1126/science.abb2986; Dominguez et al, DOI: 10.1016/j.cell.2023.01.001).

      c) Genetic labelling based lineage tracing: early Hand1+ mesodermal cells differentiate and contribute to the cardiac crescent (Zhang et al, DOI: 10.1161/CIRCRESAHA.121.318943).

      The molecular events at the initial segregation of the two lineages, which are the main focus of our paper, had not been characterized before. Our analyses suggest that the JCF lineage segregates earlier from the nascent/mixed mesoderm state, which is also consistent with the clonal analysis (Meilhac et al, DOI: 10.1016/s1534-5807(04)00133-9).

      Q4R1: As mentioned in the discussion, the authors identified the MJH and PSH trajectories as nonoverlapping. But, the authors did not discuss major previously published data showing that both FHF and SHF arise from a common transcriptomic progenitor state in the primitive streak (DOI: 10.1126/science.aao4174; DOI: 10.1007/s11886-022-01681-w). The authors should consider and discuss the specifics of why they obtained two completely separate trajectories from the beginning, how these observations conflict with prior published work, and what efforts they have made at validation.

      R4R1: Thank you for the important question. For trajectory analysis, we assigned each cell to the trajectory with the higher fate probability, resulting in ‘non-overlapping’ cell sets. However, the statement of ‘two non-overlapping trajectories’ is inaccurate. We performed an analysis of fate divergence between the two trajectories (not shown in the first version), which suggests that, before E7.0, mesodermal cells have similar probabilities of choosing either trajectory (Figure S1E). We agree with you and with previously published data that the JCF and SHF arise from a common progenitor pool. We have corrected this in the revised manuscript.
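      A small sketch may help show why assigning each cell to its higher-probability fate yields non-overlapping sets even when the underlying probabilities are nearly equal. The cell names, probability values, and the absolute-difference measure of divergence below are assumptions made for illustration; the response does not specify how fate divergence was computed.

```python
# Hypothetical fate probabilities per cell (values invented for illustration).
fate_probs = {
    "early_cell_1": {"JCF": 0.52, "SHF": 0.48},
    "early_cell_2": {"JCF": 0.49, "SHF": 0.51},
    "late_cell_1": {"JCF": 0.95, "SHF": 0.05},
    "late_cell_2": {"JCF": 0.10, "SHF": 0.90},
}


def assign_trajectory(probs):
    """Hard assignment: pick the fate with the higher probability."""
    return max(probs, key=probs.get)


def fate_divergence(probs):
    """Assumed measure: absolute difference between the two fate probabilities.
    Near 0 means the cell is still uncommitted; near 1 means it is committed."""
    return abs(probs["JCF"] - probs["SHF"])


for cell, probs in fate_probs.items():
    print(cell, assign_trajectory(probs), round(fate_divergence(probs), 2))
```

      In this toy data every cell receives exactly one label, yet the two early cells are close to a coin flip, which is the situation the response describes for mesodermal cells before E7.0.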

      Q5R1: Figures 1D and E are confusing, as it's unclear why the authors selected only cells at E7.0. Also, panels 1D 'Trajectory' and 'Pseudotime' suggest that the CM trajectory moves from the PSH cells to the MJH. This result is confusing, and the authors should explain this observation.

      R5R1: Thank you for pointing out the confusion. As mentioned in R4R1, trajectory analysis indicates JCF-SHF fate segregation by E7.0, and we used Figures 1D and E to characterize the cellular state. By E7.0, JCF progenitors are in the EEM or MM state, while SHF progenitors are still at the earlier differentiation stage (NM). This result is consistent with previous clonal analysis (Meilhac et al, DOI: 10.1016/s1534-5807(04)00133-9), which demonstrates an apparent earlier segregation of the first lineage. Our interpretation of the pseudotime analysis is that it represents different levels of differentiation, rather than developmental direction.

      Q6R1: Regarding the PSH trajectory, it's unclear how the authors can obtain a full cardiac differentiation trajectory from the SHF progenitors as the SHF-derived cardiomyocytes are just starting to invade the heart tube at E8.5 (DOI: 10.7554/eLife.30668).

      R6R1.1: We agree with your opinion. Our trajectory analysis covers E8.5 SHF-derived CM cells and progenitors. Cells that differentiate as CM cells after E8.5 were missed.

      The above notes some of the discrepancies between the author's trajectory analysis and the historical cardiac development literature. Overall, the discrepancies between the author's trajectory analysis and the historical cardiac development literature are glossed over and not adequately validated.

      R6R1.2: The historical cardiac development literature provides evidence, obtained with multiple techniques, that supports the existence of two cardiac lineages with common progenitors at the beginning and overlapping contributions to the four-chambered heart. Our trajectory analysis is in agreement with this model and provides more detailed molecular insights about lineage segregation by E7.0. Thank you for pointing out our mistakes in describing the observations. We have corrected the text and provided additional data (Figure S1D-F and S2), aiming to resolve the confusion.

      Q7R1: The authors mention analyzing "activated/inhibited genes" from Peng et al. 2019 but didn't specify when Peng's data was collected. Is it temporally relevant to the current study? How can "later stage" pathway enrichment be interpreted in the context of early-stage gene expression?

      R7R1: The gene sets of "activated/inhibited genes" were collected from several published perturbation datasets (Gene Expression Omnibus accession numbers GSE48092, GSE41260, GSE17879, GSE69669, GSE15268 and GSE31544) using mouse ES cells or embryos. For a specific pathway, the gene set is fixed but the gene expression levels, which change over time, reflect the pathway enrichment. This explains the differential pathway enrichment between early and late stages.
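
      The idea that a fixed gene set can yield stage-dependent enrichment can be sketched as follows. This is an illustration under stated assumptions, not the scoring method used in the manuscript: the score here is simply the mean expression of the gene set per stage, and the gene names, stage labels and values are hypothetical placeholders.

```python
import numpy as np

def gene_set_score_by_stage(expr, genes, stages, gene_set):
    """Mean expression of a fixed gene set, summarized per stage.

    expr: (n_cells, n_genes) expression matrix; genes: column names;
    stages: per-cell stage labels; gene_set: the fixed pathway gene set.
    """
    expr = np.asarray(expr, dtype=float)
    stages = np.asarray(stages)
    cols = [i for i, g in enumerate(genes) if g in gene_set]
    per_cell = expr[:, cols].mean(axis=1)  # one score per cell
    return {str(s): float(per_cell[stages == s].mean()) for s in np.unique(stages)}

# Hypothetical toy data: genes A and B form the pathway gene set.
genes = ["A", "B", "C"]
expr = [[1, 1, 5], [1, 3, 5], [4, 4, 0], [6, 2, 0]]
stages = ["E7.0", "E7.0", "E8.5", "E8.5"]
scores = gene_set_score_by_stage(expr, genes, stages, {"A", "B"})
```

      The gene set is identical at both stages; only the expression values differ, which is what changes the score.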

      Q8R1: Motif enrichment: cluster-specific DAEs were analyzed for motifs, but the authors list specific TFs rather than TF families, which is all that motif enrichment can provide. The authors should either list TF families or state clearly that the specific TFs they list were not validated beyond motifs.

      R8R1: Thank you for your comment. For the DAE motif analysis, we firstly inferred the motif and TF families, then tested which specific TFs are expressed in the corresponding cell cluster. We have added this information in the legend of Figure 2D.

      Q9R1: The core regulatory network is purely predictive. The authors again should refrain from language implying that the TFs in the CRN have any validated role.

      R9R1: Thank you for your kind suggestion. We have revised the manuscript to avoid any misleading implications, as follows:

      “Through single-cell multi-omics analysis, a predicted core regulatory network (CRN) in JCF is identified, consisting of transcription factors (TFs) GATA4, TEAD4, HAND1 and FOXF1.”

      Q10R1: Regarding the in vivo analysis of Hand1 CKO embryos, Figures 6 and 7:

      How can the authors explain the presence of a heart tube in the E9.5 Hand1 CKO embryos (Figure 6B) if, following the authors' model, the FHF/Juxta-cardiac field trajectory is disrupted by Hand1 CKO? A more detailed analysis of the cardiac phenotype of Hand1 CKO embryos would help to assess this question.

      R10R1: Thank you for your valuable suggestion. In the revised manuscript, we have added a detailed analysis of the cardiac phenotype of Hand1 CKO embryos (Figure S8C). The data suggest that by E8.5, when heart looping initiates in the control group (14/17), the hearts of Hand1 CKO embryos (3/3) still show a linear tube morphology. By E9.5, when the atrium and ventricle become distinct in WT embryos, heart looping of Hand1 CKO embryos is abnormal. The cardiac defects of our Mesp1-Cre-driven Hand1 conditional KO are consistent with those of Hand1-null mutant mice (DOI: 10.1038/ng0398-266; DOI: 10.1038/ng0398-271).

      Author response image 1.

      Bright-field images of E8.5-E9.5 Ctrl and Hand1 CKO mouse embryos. Arrows indicate the embryonic heart (h) and head folds (hf). Scale bars (E8.5): 200 μm; scale bars (E9.5): 500 μm.

      Q11R1: The cell proportion differences observed between Ctrl and Hand1 CKO in Figure 6D need to be replicated and an appropriate statistical analysis must be performed to definitely conclude the impact of Hand1 CKO on cell proportions.

      R11R1: We appreciate your valuable suggestion. As Figure 6D is based on a scRNA-seq experiment in which replicates were merged into a single sequencing library, statistical analysis is infeasible. To address potential concerns about cell proportions, we added IF staining of the EEM marker gene Vim in serial embryo sections (Figure S8D). Statistical analysis indicates a significant decrease in the proportion of VIM+ EEM cells in Hand1 CKO embryos.

      Q12R1: The in-vitro cell differentiations are unlikely to recapitulate the complexity of the heart fields in vivo, but they are analyzed and interpreted as if they do.

      R12R1: We agree with your opinion. In the revised manuscript, we toned down the interpretation of the in vitro cell differentiation data.

      Previous version:

      I.  “The analysis indicated that HAND1 and FOXF1 could dually regulate MJH specification through directly activating the MJH specific genes and inhibiting the PSH specific genes.”

      II. “Together, our data indicated that mutual regulation between HAND1 and FOXF1 could play a key role in MJH cardiac progenitor specification.”

      III. “Thus, our data further supported the specific and synergistic roles of HAND1 and FOXF1 in MJH cardiac progenitor specification.”

      Revised version:

      I.  “The analysis indicated that HAND1 and FOXF1 were able to directly activate the JCF specific genes.”

      II. “Together, our in vitro experimental data indicated that mutual regulation between HAND1 and FOXF1 could play a key role in activation of JCF specific genes.”

      III. “These results suggest that HAND1 and FOXF1 may cooperatively regulate early cardiac lineage specification by promoting JCF-associated gene expression and suppressing alternative mesodermal programs.”

      Q13R1: The schematic summary of Figure 7F is confusing and should be adjusted based on the following considerations:

      (a) the 'Wild-type' side presents 3 main trajectories (SHF, Early HT and JCF), but uses a 2-color code and the authors described only two trajectories everywhere else in the article (aka MJH and PSH). It's unclear how the SHF trajectory (blue line) can contribute to the Early HT, when the Early HT is supposed to be FHF-associated only (DOI: 10.7554/eLife.30668). As mentioned previously in Major comment 3., this model suggests a distinction between FHF and JCF trajectories, which is not investigated in the article.

      R13R1(a): Thank you for your great insights. The paper you mentioned used Nkx2.5-cre/+; Rosa26tdtomato+/- and Nkx2.5-eGFP embryos to reconstruct the cardiac morphologies between E7.5 and E8.2. Their 3D models clearly demonstrate the transition from yolk sac to FHF and then SHF (Figure 2A’ and A’’). The location of the yolk sac is defined as the JCF in later literature (DOI: 10.1126/science.abb2986). However, as Nkx2.5 mainly marks cells after their entry into the heart tube, it is unable to reflect the lineage contribution of the JCF or SHF. As noted in R3R1, a growing body of evidence supports the contribution of both lineages to the Early HT, which is discussed in a recent review paper (DOI: 10.1016/j.devcel.2023.01.010).

      (b) the color code suggests that the MJH (FHF-related) trajectory will give rise to the right ventricle and outflow tract (green line), which is contrary to current knowledge.

      R13R1(b): Thank you for pointing out the confusion. The coloring of outflow tract is not an indication of JCF lineage contribution. We have changed the color of JCF/SHF trajectory in the revised model.

      Minor comments:

      Q14R1: How were genes selected to generate Figure 1F? Is this a list of top differentially expressed genes over each pseudotime and/or between pseudotimes?

      R14R1: For each trajectory, we ranked genes by the correlation between expression levels and pseudotime. The top 1000 genes for each group were selected.
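
      The ranking step can be sketched as below. This is a minimal illustration, assuming Pearson correlation and ranking by absolute value; the response does not specify the correlation type or whether positive and negative correlations were ranked together. The function name and toy data are hypothetical.

```python
import numpy as np

def rank_genes_by_pseudotime(expr, pseudotime, gene_names, top_n=1000):
    """Rank genes by |Pearson correlation| between expression and pseudotime.

    expr: (n_cells, n_genes) matrix for the cells of one trajectory.
    Returns the top_n (gene, correlation) pairs, strongest first.
    """
    expr = np.asarray(expr, dtype=float)
    pt = np.asarray(pseudotime, dtype=float)
    expr_c = expr - expr.mean(axis=0)  # center each gene
    pt_c = pt - pt.mean()              # center pseudotime
    denom = np.sqrt((expr_c ** 2).sum(axis=0) * (pt_c ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        # Genes with zero variance get a correlation of 0 instead of NaN.
        corr = np.where(denom > 0, (expr_c.T @ pt_c) / denom, 0.0)
    order = np.argsort(-np.abs(corr))[:top_n]
    return [(gene_names[i], float(corr[i])) for i in order]

# Toy data (hypothetical): four cells ordered along pseudotime, four genes.
pseudotime = [0, 1, 2, 3]
gene_names = ["up", "down", "flat", "noisy"]
expr = np.array([
    [0, 3, 1, 0],
    [1, 2, 1, 2],
    [2, 1, 1, 1],
    [3, 0, 1, 0],
])
top2 = rank_genes_by_pseudotime(expr, pseudotime, gene_names, top_n=2)
all_ranked = dict(rank_genes_by_pseudotime(expr, pseudotime, gene_names, top_n=4))
```

      Ranking by absolute value keeps both genes that increase and genes that decrease along pseudotime; ranking by signed correlation would retain only one direction.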

      Q15R1: Regarding Figure 1G, it's unclear how inhibited signaling can have an increased expression of underlying genes over pseudotimes. Can the authors give more details about this analysis and results?

      R15R1: The increased expression of ‘inhibited genes’ could be explained by decreasing signaling levels or by compensatory effects of other signaling pathways. We appreciate your kind suggestion. Details about this analysis have been added to the Methods section.

      Q16R1: How do the authors explain the visible Hand1 expression in Hand1 CKO in Figure S7C 'EEM markers'? Is this an expected expression in terms of RNA which is not converted into proteins?

      R16R1: In our opinion, the visible Hand1 expression is caused by incomplete knockout efficiency of the Mesp1-Cre-driven system.

      Q17R1: The authors do not address the potential presence of doublets (merged cells) within their newly generated dataset. While they mention using "SCTransform" for normalization and artifact removal, it's unclear if doublet removal was explicitly performed.

      R17R1: We appreciate your kind reminder. Doublet removal was performed using R package ‘DoubletFinder’ (DOI: 10.1016/j.cels.2019.03.003). We have added this information in the revised manuscript.

      Reviewer #2 (Public review):

      Summary of goals:

      The aims of the study were to identify new lineage trajectories for the cardiac lineages of the heart, and to use computational and cell and animal studies to identify and validate new gene regulatory mechanisms involved in these trajectories.

      Strengths:

      The study addresses the long-standing yet still not fully answered questions of what drives the earliest specification mechanisms of the heart lineages. The introduction demonstrates a good understanding of the relevant lineage trajectories that have been previously established, and the significance of the work is well described. The study takes advantage of several recently published data sets and attempts to use these in combination to uncover any new mechanisms underlying early mesoderm/cardiac specification mechanisms. A strength of the study is the use of an in vitro model system (mESCs) to assess the functional relevance of the key players identified in the computational analysis, including innovative technology such as CRISPR-guided enhancer modulations. Lastly, the study generates mesoderm-specific Hand1 LOF embryos and assesses the differentiation trajectories in these animals, which represents a strong complementary approach to the in vitro and computational analysis earlier in the paper. The manuscript is clearly written and the methods section is detailed and comprehensive.

      Comments and Weaknesses:

      Overall: The computational analysis presented here integrates a large number of published data sets with one new data point (E7.0 single cell ATAC and RNA sequencing). This represents an elegant approach to identifying new information using available data. However, the data presentation at times becomes rather confusing, and relatively strong statements and conclusions are made based on trajectory analysis or other inferred mechanisms while jumping from one data set to another. The cell and in vivo work on Hand1 and Foxf1 is an important part of the study. Some additional experiments in both of these model systems could strongly support the novel aspects that were identified by the computational studies leading into the work.

      We appreciate your positive comments and insightful suggestions. In the revised manuscript, we have incorporated additional analyses and experimental validations to address the concerns raised. Specifically, we added RNA velocity analysis to independently support the identification of the MJH and PSH trajectories, performed immunofluorescence staining of mesodermal and cardiac markers in Hand1 and Foxf1 knockout models, and included Vim staining-based quantification in Hand1 CKO embryos to assess developmental outcomes in vivo. Furthermore, we revised potentially overinterpreted conclusions and clarified methodological details of the WOT analysis. These revisions have strengthened both the rigor and clarity of the manuscript.

      Q1R2: Definition of MJH and PSH trajectory:

      The study uses previously published data sets to identify two main new differentiation trajectories: the MJH and the PSH trajectory (Figure 1). A large majority of subsequent conclusions are based on in-depth analysis of these two trajectories. For this reason, the method used to identify these trajectories (WOT, which seems a highly biased analysis with many manually chosen set points) should be supported by other commonly used methods such as, for example, RNA velocity analysis. This would inspire some additional confidence that the MJH and PSH trajectories were chosen in as unbiased and rigorous a manner as possible and that any follow-up analysis is biologically relevant.

      R1R2: We appreciate your valuable comments. We agree that other commonly used methods would help strengthen our conclusion about the two main trajectories. To this end, we performed RNA velocity analysis of cardiac specification. The results support the contribution to CMs along both the MJH and PSH routes.

      Author response image 2.

      UMAP layout is colored by cell types. Developmental directions, shown as arrows, are inferred by RNA-velocity analysis.

      Several recent studies have indicated a convergent model of cardiac development, in which progenitors reach a myocardial state along two trajectories (DOI: 10.1016/j.devcel.2023.01.010). However, when and how the two routes become specified was unclear. Our data and analysis revealed a clear fate separation by E7.0 from transcriptomic and epigenetic perspectives, and unbiased RNA velocity analysis was also performed (Figure 2C).

      We would like to clarify how we performed the WOT analysis (DOI: 10.1016/j.cell.2019.01.006): the only manually chosen cell set was the starting set for computational reverse lineage tracing, which comprised all E8.5 cardiomyocytes. The ancestor cells were then predicted in an unbiased manner among all mesodermal cells.
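
      The reverse lineage tracing idea can be illustrated with a small sketch: starting from an indicator over the chosen end-point cells, the distribution is pushed backward through transport couplings between consecutive time points. This is a conceptual sketch with hypothetical toy couplings, not the API of the WOT package, and it assumes the couplings have already been computed.

```python
import numpy as np

def pull_back(transport_maps, end_mask):
    """Propagate an end-point cell set backward through transport couplings.

    transport_maps[k][i, j] is the mass transported from cell i at time k
    to cell j at time k + 1. end_mask marks the chosen cells at the final
    time point (for example, cardiomyocytes). Returns one normalized
    ancestor distribution per time point, earliest first.
    """
    p = np.asarray(end_mask, dtype=float)
    p = p / p.sum()
    distributions = [p]
    for coupling in reversed(transport_maps):
        p = np.asarray(coupling, dtype=float) @ p  # mass each earlier cell sends to the set
        p = p / p.sum()
        distributions.append(p)
    return distributions[::-1]

# Hypothetical toy coupling: 3 earlier cells, 2 later cells (CM, non-CM).
coupling = np.array([
    [0.3, 0.0],
    [0.1, 0.2],
    [0.0, 0.4],
])
ancestors = pull_back([coupling], end_mask=[True, False])
```

      In this toy case only the end set is chosen by hand; the ancestor weights for the earlier cells follow from the coupling alone.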

      Q2R2.1: Identification of MJH and PSH trajectory progenitors:

      The study defines various mesoderm populations from the published data set (Figure 1A-E), including nascent mesoderm, mixed mesoderm, and extraembryonic mesoderm. It further assigns these mesoderm populations to the newly identified MJH/PSH trajectories. Based on the trajectory definition in Figure 1A it appears that both trajectories include all 3 mesoderm populations, albeit at different proportions and it seems thus challenging to assign these as unique progenitor populations for a distinct trajectory, as is done in the epigenetic study by comparing clusters 8 (MJH) and 2 (PSH)(Figure 2). 

      R2R2.1: According to our model, the most significant difference between the two trajectories is their enrichment of EEM and PM cell types (Figure 1B), which represent the middle stages of cardiac development. Both trajectories begin as Mesp1+ nascent mesoderm cells (Figure 1F), which is supported by Mesp1 lineage tracing (DOI: 10.1161/CIRCRESAHA.121.318943), and end as cardiomyocytes. Our epigenetic analysis focused on the E7.0 stage, when the two trajectories could be clearly separated and when the JCF and SHF lineages were at the mixed mesoderm and nascent mesoderm states, respectively. However, the SHF lineage was predicted to bypass the mixed mesoderm state later on.

      Q2R2.2: Along similar lines, the epigenetic analysis of clusters 2 and 8 did not reveal any distinct differences in H3K4me1, H3K27ac, or H3K4me3 at any of the time points analyzed (Figure 2F). While conceptually very interesting, the data presented do not seem to identify any distinct temporal patterns or differences in clones 2 and 8 (Figure 2H), and thus don't support the conclusion as stated: "the combined transcriptome and chromatin accessibility analysis further supported the early lineage segregation of MJH and the epigenetic priming at gastrulation stage for early cardiac genes".

      R2R2.2: In the epigenetic analysis, we delineated the temporal dynamics of E7.0 cluster-specific DAEs by selecting earlier (E6.5) and later (E7.5) time points. DAEs of C8 and C2 represent regulatory elements for the JCF and SHF lineages, respectively. We also included C1 DAEs as a reference to demonstrate the relative activity of C8 and C2. The overall temporal pattern suggests activation of C8 & C2, as their H3K4me1 and H3K27ac levels surpass C1 over time. Between C8 and C2, the following distinctions could be observed:

      a) H3K4me1 levels of C8 are higher by E6.5 and E7.0, with low H3K27ac levels, indicating early priming of C8 DAEs.

      b) By E7.5, C2 has caught up with C8 in H3K4me1 levels in the anterior mesoderm (E7.5_AM, Figure 2F column 3), where cardiac mesoderm is located.

      c) H3K4me1 and H3K27ac levels of C8 are similar to those of C1 in the posterior mesoderm (E7.5_P, Figure 2F column 4) and much higher than those of C2.

      d) From the perspective of chromatin accessibility, hundreds of characteristic DAEs were identified for C2 and C8 (Figure 2D), exemplified by the primed and active enhancers which were predicted to interact with cluster-specific genes (Figure 2H).

      Together with the transcriptomic analyses (Figure 2C), these data are consistent with our conclusion about early lineage segregation and epigenetic priming.

      Q3R2: Function of Hand1 and Foxf1 during early cardiac differentiation:

      The study incorporated some functional studies by generating Hand1 and Foxf1 KO mESCs and differentiated them into mesoderm cells for RNA sequencing. These lines would present relevant tools to assess the role of Hand1 and Foxf1 in mesoderm formation, and a number of experiments would further support the conclusions, which are made for the most part on transcriptional analysis. For example, the study would benefit from quantification of mesoderm cells and subsequent cardiomyocytes during differentiation (via IF, or more quantitatively, via flow cytometry analysis). These data would help interpret any of the findings in the bulk RNAseq data, and help to assess the function of Hand1 and Foxf1 in generating the cardiac lineages. Conclusions such as "the analysis indicated that HAND1 and FOXF1 could dually regulate MJH specification through directly activating the MJH specific genes and inhibiting PSH specific genes" seem rather strong given the data currently provided.

      R3R2: Thank you for your kind suggestions. We added IF staining of mesodermal (Zic3), JCF (Hand1) and cardiac markers (Tnnt2), followed by cell quantification. Results indicate that Hand1 and Foxf1 knockout leads to reduced commitment to the JCF lineage, evidenced by the loss of Hand1 expression, accumulation of undifferentiated Zic3+ mesoderm, and impaired cardiomyocyte formation (Tnnt2+), consistent with the up-regulation of JCF lineage specific genes and the downregulation of SHF lineage specific genes.

      We also revised the conclusion as “These results suggest that HAND1 and FOXF1 may cooperatively regulate early cardiac lineage specification by promoting JCF-associated gene expression and suppressing alternative mesodermal programs.”

      Q4R2: Analysis of Hand1 cKO embryos:

      Adding a mouse model to support the computational analysis is a strong way to conclude the study. Given the availability of these early embryos, some of the findings could be strengthened by performing a similar analysis to Figure 7B&C and by including some of the specific EEM markers found to be differentially regulated to complement the structural analysis of the embryos.

      R4R2: Thank you for your positive comments and help. In the revised manuscript, we performed IF staining of the EEM marker Vim in a similar fashion to Figure 7B&C (Figure S8D). Compared with control embryos, the Hand1 CKO embryos showed significantly fewer Vim+ cells, further strengthening the conclusion that Hand1 CKO blocked developmental progression in the JCF direction.

      Q5R2: Current findings in the context of previous findings:

      The introduction carefully introduces the concept of lineage specification and different progenitor pools. Given the enormous amount of knowledge already available on Hand1 and Foxf1, and their role in specific lineages of the early heart, some of this information should be added, ideally to the discussion where it can be put into context of what the present findings add to the existing understanding of these transcription factors and their role in early cardiac specification.

      R5R2: We appreciate your positive comments and kind reminder. We have added discussion about how our study could be put into the body of findings on Hand1 and Foxf1. Although these two genes have been validated to be functionally important for heart development, it is unclear when and how they affect this process. Using in-vivo and in-vitro models and single cell multi-omics analyses, we provided evidence to fill the gaps from multiple aspects, including cell state temporal dynamics, regulatory network, and epigenetic regulation underlying the very early cardiac lineage specification.

      Reviewer #3 (Public review):

      Q1R3: In Figure 1A, could the authors justify using E8.5 CMs as the endpoint for the second lineage and better clarify the chamber identities of the E8.5 CMs analysed? Why are the atrial genes in Figure 1C of the PSH trajectory not present in Table S1.1, which lists pseudotime-dependent genes for the MJH/PSH trajectories from Figure 1F?

      R1R3: Thank you for your comments. We used E8.5 CMs as the endpoint of the second (SHF) lineage because this stage represents a critical point where SHF-derived cardiomyocytes have begun distinct differentiation, allowing us to capture terminal lineage states reliably. The chamber identities of E8.5 CMs were determined based on known marker genes (DOI: 10.1186/s13059-025-03633-3). The atrial genes shown in Figure 1C reflect cluster-specific markers that may not meet the strict pseudotime-dependency criteria used to generate Table S1.1, which lists genes dynamically changing along the MJH/PSH trajectories.

      Q2R3: Could the authors increase the resolution of their trajectory and genomic analyses to distinguish between the FHF (Tbx5+ HCN4+) and the JCF (Mab21l2+/ Hand1+) within the MJH lineage? Also, clarify if the early extraembryonic mesoderm contributes to the FHF.

      R2R3: Thank you for your great suggestions. To distinguish between the FHF and JCF trajectories, we used early FHF progenitor population (E7.75 Nkx2-5+; Mab21l2- CM cells) as the starting point and performed WOT lineage inference (Figure S2A). Results suggest that both JCF and SHF progenitors contribute to the FHF, consistent with live imaging-based single cell tracing by Dominguez et al (DOI: 10.1016/j.cell.2023.01.001) and lineage tracing results by Zhang et al (DOI: 10.1161/CIRCRESAHA.121.318943). We also analyzed the expression levels of FHF marker genes (Tbx5, Hcn4) and observed their activation along both trajectories (Figure S2B).

      Q3R3: The authors strongly assume that the juxta-cardiac field (JCF), defined by Mab21l2 expression at E7.5 in the extraembryonic mesoderm, contributes to CMs. Could the authors explain the evidence for this? Could the authors identify Mab21l2 expression in the left ventricle (LV) myocardium and septum transversum at E8.5 (see Saito et al., 2013, Biol Open, 2(8): 779-788)? If such a JCF contribution to CMs exists, the extent to which it influences heart development should be clarified or discussed.

      R3R3: Thank you for the important question. For the JCF contribution to the heart tube, several lines of evidence have been published in recent years using micro-dissection of the mouse embryonic heart (DOI: 10.1126/science.abb2986), live imaging (DOI: 10.1016/j.cell.2023.01.001) and lineage tracing approaches (DOI: 10.1161/CIRCRESAHA.121.318943). According to Tyser et al (DOI: 10.1126/science.abb2986), Mab21l2 expression is detected in the septum transversum at E8.5 and the Mab21l2+ lineage contributes to the LV, largely consistent with the literature you mentioned (Saito et al., 2013, Biol Open, 2(8): 779-788). Our lineage inference analyses further support the model and suggest earlier specification of the JCF. However, the focus of our work is the transcriptional and epigenetic regulation underlying the JCF developmental trajectory.

      Q4R3: Could the authors distinguish the Hand1+ pericardium from JCF progenitors in their single-cell data and explain why they excluded other cell types, such as the endocardium/endothelium and pericardium, or even the endoderm, as endpoints of their trajectory analysis? At the NM and MM mesoderm stages, how did the authors distinguish the earliest cardiac cells from the surrounding developing mesoderm?

      R4R3: We appreciate your insightful question. In our other study (DOI: 10.1186/s13059-025-03633-3), we tried to further divide the CM cells into subclusters, and their differences appear to be driven mainly by the segmentation of the heart tube (e.g. LV, RV, OFT etc.). At the E8.5 stage, we were unable to identify a Hand1+ pericardium cluster.

      Also, it seems infeasible to distinguish endocardium from other endothelial cells using single-cell data alone; high-resolution spatial transcriptome data would be required. As an alternative, we analyzed the E7.0 mesodermal lineages and identified C5/6 as hematoendothelial progenitors. Marker gene analysis indicates that their lineage segregation has started by this stage (Figure S4C and Author response image 3).

      Author response image 3.

      UMAP layout, using scRNA-seq (Reference data) and snRNA-seq (Multiome data), is colored by cell types (left). Expression of hematoendothelial progenitor marker genes is shown (right).

      We did observe a difference between the earliest cardiac cells and the surrounding developing mesoderm. As shown in Figure 1D, cells belonging to the JCF lineage (Hand1 high/Lefty2 low) were clustered at the EEM/MM end, in contrast to the NM cells.

      Q5R3: Could the authors contrast their trajectory analysis with those of Lescroart et al. (2018), Zhang et al., Tyser et al., and Krup et al.?

      R5R3: Thank you for the valuable suggestion. We compared our model with the suggested ones and summarized as follows:

      (1) Lescroart et al: The JCF and SHF progenitor cells match their DCT2 (Bmp4+) and DCT3 (Foxc2+) clusters, respectively.

      (2) Zhang et al: The JCF lineage matches their EEM-DC (developing CM)-CM trajectory. The SHF lineage is consistent with their NM-LPM (lateral plate mesoderm)-DC (developing CM)-CM trajectory. Notably, their EEM-DC-CM also expressed FHF marker (Tbx5) at later stages.

      (3) Tyser et al: We performed data integration analysis and found correspondence between JCF progenitors (EEM cells from the cardiac trajectory) and their Me5, as well as between SHF progenitors (PM cells from the cardiac trajectory) and their Me7. In their model, both Me5 and Me7 contribute to Me4 (representing the FHF), consistent with our results (see Tyser et al., 2021 and Pijuan-Sala et al., 2019).

      (4) Krup et al: They also performed URD lineage inference, providing a model with CM (12) and Cardiac mesoderm (29) as cardiac end points. Their model did not seem to suggest distinct trajectories for the JCF and SHF lineages, as both JCF (Hand1) and SHF (Isl1) markers were co-expressed in CM.

      Q6R3: Previous studies suggest that Mesp2 expression starts at E8 in the presomitic mesoderm (Saga et al., 1997). Could the authors provide in situ hybridization or HCR staining to confirm the early E7 Mesp2 expression suggested by the pseudo-time analysis of the second lineage.

      R6R3: We validated Mesp2 expression at E7 using Geo-seq spatial transcriptome data (Author response image 4, upper). The results show strong spatial enrichment of Mesp2 expression in primitive streak (T+) and/or nascent mesoderm (Mesp1+) cells, which correspond to the progenitors of the second lineage.

      In situ hybridization data (PMID: 17360776) also supports the early expression of Mesp2 by E7 (Author response image 4, lower).

      Author response image 4.

      (Upper) E7 Geo-seq data for selected genes: T, Mesp1, and Mesp2. (Lower) Mesp2 expression during early development; image acquired from Morimoto et al. (PMID: 17360776).

      Q7R3: Could the authors also confirm the complementary Hand1 and Lefty2 expression patterns at E7 using HCR or in situ hybridization? Hand1 expression in the first lineage is plausible, considering lineage tracing results from Zhang et al.

      R7R3: Thank you for your great suggestion. We observed spatially complementary expression patterns of Hand1 and Lefty2 in the Geo-seq spatial transcriptomic data. In the mesoderm layer, Hand1 is highly expressed at the proximal end, while Lefty2+ cells are preferentially located toward the distal end.

      Author response image 5.

      E7 Geo-seq data for selected genes: Hand1 and Lefty2.

      Q8R3: Could the authors explain why Hand1 and Lefty2+ cells are more likely to be multipotent progenitors, as mentioned in the text?

      R8R3: Thank you for your question. We observed that E7.0 Mesp1+ and Lefty2+ nascent mesodermal cells were assigned to both the JCF and SHF lineages (Figure 1D), indicating their multipotency. In addition, we found low expression of the JCF markers Hand1 and Msx2 at the early stage of the SHF trajectory (Figure 1F). Thus, we concluded that both Hand1+ and Lefty2+ E7.0 mesodermal cells are likely to be multipotent.

      Q9R3: Could the authors comment on the low Mesp1 expression in the mesodermal cells (MM) of the MJH trajectory at E7 (Figure 1D)? Is Mesp1 transiently expressed early in MJH progenitors and then turned off by E7? Have all FHF/JCF/SHF cells expressed Mesp1?

      R9R3: Thank you for the insightful questions. Zhang et al. (PMID: 34162224) performed scRNA-seq analysis of Mesp1 lineage-traced cells, which indicates the contribution of Mesp1+ cells to the FHF, JCF, and SHF. This is also supported by Dominguez et al., who used live imaging approaches (PMID: 36736300). Our temporal dynamics analysis suggests that, along the JCF trajectory, Mesp1 is turned off as JCF characteristic genes are upregulated (Figure 1F and S1D).

      Q10R3: Could the authors clarify if their analysis at E7 comprises a mixture of embryonic stages or a precisely defined embryonic stage for both the trajectory and epigenetic analyses? How do the authors know that cells of the second lineage are readily present in the E7 mesoderm they analysed (clusters 0, 1, and 2 for the multiomic analysis)?

      R10R3: Thank you for your questions. Although embryos were collected at E7.0, their developmental stages could vary. As exemplified in Karl Theiler’s book, “The House Mouse: Atlas of Embryonic Development”, mesoderm is visible in some E7.0 egg cylinders but not in others. To test whether cells of the second lineage are present in the E7.0 mesoderm, we analyzed the WOT lineage tracing results and the cell type composition at E7.0 (Author response image 6, left panel). Most cells belong to the nascent mesoderm (NM) or mixed mesoderm (MM), while almost no cells were assigned to the primitive streak (PS). To exclude the possibility that the E7.0 embryos represented later stages, we also analyzed the E6.75 cells of the second lineage (Author response image 6, middle panel). The results suggest that NM cells were still the dominant contributors to the second lineage, although ~22.6% of cells were assigned to the PS. The above analyses were performed using the scRNA-seq data. The embryos used for the E7.0 single-cell multi-omics represent developmental stages similar to those of the scRNA-seq data, as suggested by the well-aligned UMAPs (Figure S1D, right panel). Thus, we conclude that, for the multi-omics data, cells of the second lineage are also readily present in the mesoderm.

      Author response image 6.

      (Left and middle) Lineage inference and cell type composition at E7.0 and E6.75. (Right) UMAPs of E7.0 multi-omics and scRNA-seq data.

      Q11R3: Could the authors further comment on the active Notch signaling observed in the first and second lineages, considering that Notch's role in the early steps of endocardial lineage commitment, but not of CMs, during gastrulation has been previously described by Lescroart et al. (2018)?

      R11R3: We appreciate your kind suggestion. As reported by Lescroart et al. (2018), using Notch1CreERT2/Rosa-tdTomato mice and tamoxifen administration at E6.5, early expression of Notch1 mostly marked endocardial cells (ECs, 76.9-83.9%), with minor contributions to the cardiomyocytes (6.0-16.6%) and to the epicardial cells (EPs, 6.0-6.5%). The lineage specificity of Notch1 is consistent with our E7.0 multi-omics data, where its expression was mainly observed in the NM and hematoendothelial progenitors (Author response image 7). Interestingly, expression of other NOTCH receptor genes (Notch2 and Notch3) and ligand genes (Dll1 and Dll3) was observed in the CM lineages. Notch3 demonstrated higher expression in the first lineage, while Dll1 and Dll3 were highly expressed in the second lineage. The study by Lescroart et al. (2018) emphasized the role of Notch1 as an EC lineage marker, while our analyses focused on the activity of the NOTCH pathway.

      Author response image 7.

      Expression of representative NOTCH genes at E7.0 (multi-omics data).

      Q12R3: In cluster 8, Figure 2D, it seems that levels of accessibility in cluster 8 are relatively high for genes associated with endothelium/endocardium development in addition to MJH genes. Could the authors comment and/or provide further analysis?

      R12R3: Thank you for raising this interesting point. To confirm the association of these genes with endothelium (EC) and/or MJH, we analyzed their expression levels at E7.0 (progenitor stage) and E8.0 (differentiated stage) (Author response image 8). Among target genes of MJH-specific DAEs (cluster 3/7/8 in Figure 2D), Pmp22, Mest, Npr1, Pkp2, and Pdgfb were expressed in the hematoendothelial progenitors. The Nrp1 gene and PDGF pathway play critical roles in endothelial development by modulating cell migration (PMID: 15920019 and 28167492), which is also important for MJH cells. In addition, we observed common ATAC-seq peaks in both hematoendothelial and MJH clusters (Author response image 9), indicating shared regulatory elements. Interestingly, although Pdgfb is not expressed by CMs in vivo, it is actively expressed in the CMs of the in vitro system (Author response image 9). These results indicate regulatory and functional closeness between the hematoendothelial and MJH cell groups at early stages of lineage establishment.

      Author response image 8.

      Regulatory connection between MJH and endothelial cells (ECs).

      Author response image 9.

      Representative genome browser snapshots of scATAC-seq (aggregated gene expression and chromatin accessibility for each cluster) and RNA-seq at the Pdgfb locus.

      Q13R3: Can the authors clarify why they state that cluster 8 DAEs are primed before the full activation of their target genes, considering that Bmp4 and Hand1 peak activities seem to coincide with their gene expression in Figure 2G?

      R13R3: Thanks for your great question. The overall analyses indicate low to medium levels of H3K4me1 and H3K27ac by E6.5-7.0 at cluster 8 DAEs, which were fully activated by E7.5 (Figure 2F). Further inspections suggest different epigenetic status of individual DAEs (Figure 3H), which could be active (K4me1+/K27ac+), primed (K4me1+/K27ac-), or inactive (K4me1-/K27ac-). Thus, we concluded that many DAEs could be primed before full activation. The coincidence of enhancer peak activities and gene expression was observed by aggregating single cell clusters at a single stage E7.0, which does not rule out the possibility that these enhancers are epigenetically primed at earlier stages.
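
      The three enhancer states named in R13R3 can be written down as a small lookup, shown here only as an illustrative sketch. The function name `classify_enhancer` and the boolean inputs are assumptions introduced for this example; how H3K4me1 and H3K27ac signals are thresholded into present/absent calls is not stated in the response, and the K4me1-/K27ac+ combination is not described there, so it is returned as "unclassified".

```python
def classify_enhancer(k4me1: bool, k27ac: bool) -> str:
    """Map H3K4me1/H3K27ac presence calls to an enhancer state.

    Hypothetical helper: the inputs are assumed to be already-thresholded
    presence (True) / absence (False) calls for each histone mark.
    """
    if k4me1 and k27ac:
        return "active"      # K4me1+/K27ac+
    if k4me1 and not k27ac:
        return "primed"      # K4me1+/K27ac-
    if not k4me1 and not k27ac:
        return "inactive"    # K4me1-/K27ac-
    # K4me1-/K27ac+ is not covered by the three states in the response.
    return "unclassified"


if __name__ == "__main__":
    for marks in [(True, True), (True, False), (False, False), (False, True)]:
        print(marks, classify_enhancer(*marks))
```

      Under this scheme, an enhancer that is "primed" at E6.5-7.0 and "active" at E7.5 would match the priming-before-activation pattern the response describes.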

      Q14R3: Did the authors extend the multiomic analysis to Nanog+ epiblast cells at E7 and investigate if cardiac/mesodermal priming exists before mesodermal induction (defined by T/Mesp1 onset of expression)?

      R14R3: We appreciate your kind suggestion. We observed low levels of T/Mesp1 expression in the E7.0 Nanog+ epiblast cells (Author response image 10). Interestingly, the T+/Mesp1+ cells were not clustered toward any specific differentiation direction in the UMAP. We also analyzed DAE activities in each single cell by averaging over the C1/C2/C8 DAE sets. The C2 and C8 DAEs were clearly less active than the C1 DAEs, but C2/C8-DAE active cells were observed among the E7.0 Nanog+ epiblast cells. These data indicate that early priming exists in epiblast cells before the commitment to cardiac/mesodermal differentiation.

      Author response image 10.

      Gene expression and DAE activity levels of E7.0 Nanog+ epiblast cells shown in UMAP layout.

      Q15R3: In the absence of duplicates, it is impossible to statistically compare the proportions of mesodermal cell populations in Hand1 wild-type and knockout (KO) embryos or to assess for abnormal accumulation of PS, NM, and MM cells. Could the authors analyse the proportions of cells by careful imaging of Hand1 wild-type and KO embryos instead?

      R15R3: Thank you for your important question. To assess the proportions of mesodermal cell populations in E7.25 wild-type and Hand1-CKO embryos, we analyzed the serial coronal sections of the extraembryonic portions and performed staining of the Vim gene, which marks the extra-embryonic mesodermal (EEM) cells (Figure S8D). We then counted the numbers of mesodermal/Vim+ EEM cells and calculated the relative proportion of Vim+ EEM cells in each section. The proportion of Vim+ EEM cells was statistically lower in the Hand1-CKO embryo, consistent with our model that Hand1 deletion led to blocked MJH specification.

      Q16R3: Could the authors provide high-resolution images for Figure 7 B-C-D as they are currently hard to interpret?

      R16R3: Thank you for your suggestion. We have replaced Figure 7B-C-D with high-resolution images.

      Recommendations for the authors:  

      Reviewing Editor Comments:

      Discussions among reviewers emphasize the importance of better addressing and validating the trajectory analysis by using more common and alternative bioinformatics and spatial approaches. Further discussion on whether there is a common transcriptional progenitor between the two trajectories is also required to enhance the significance of the study. For functional analysis, further validations are needed as the current data only partially support the claims. Please see public reviews for details.

      Reviewer #2 (Recommendations For The Authors):

      Beyond the suggestions made in the public review, below are some minor aspects for consideration:

      The manuscript is well written overall but may benefit from a thorough read-through and editing of some minor grammatical errors.

      We have carefully read through the manuscript and corrected minor grammatical errors to improve clarity and readability.

      Figure 2C: RNA velocity information gets largely lost due to the color choice of EEM and MM (black) on which the direction of arrows can't be appreciated.

      We have updated the color scheme in Figure 2C.

      Figure 6D: sample information is partially cut off in the graph.

      Sample information is completely shown now.

      The last paragraph of the discussion has some formatting issues with the references.

      We have corrected the formatting issues with the references.

      The methods and results section does not comment on if, or how many embryos were pooled for the sequencing analysis performed for this study.

      We have added the numbers of embryos for sequencing analyses in the methods section.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      In the discussion, authors could reconsider the sentence: "The process of cardiac lineage segregation is a complex one that may involve TF regulatory networks and signaling pathways," as it is not informative.

      We have re-written the sentence as: “Thus, additional regulation must exist and instructs the process of JCF-SHF lineage segregation.”

    1. Below are some examples of citations and how you can find the resources they describe.

      To avoid highlighting this entire page, I shall leave my comment here. This may be the first time it's been plainly laid out to me how citations are used to trace sources. Throughout our schooling in K-12, it was drilled into my head how important citations are and how to write them. Perhaps I simply have a goldfish brain, but this chapter finally made the process of actually using citations "click" in my head.

    1. intention is to address specific challenges, such as family homelessness, that can interfere with consistent service access. Transitions procedures and practices can also ensure effective transitions from Early Head Start to Head Start and to other early childhood education programs or schools.

      What are barriers that may impact the progress of the family and children, and what are the best ways to combat these barriers?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This paper presents results from four independent experiments, each of which tests for rhythmicity in auditory perception. The authors report rhythmic fluctuations in discrimination performance at frequencies between 2 and 6 Hz. The exact frequency depends on the ear and experimental paradigm, although some frequencies seem to be more common than others.

      Strengths:

      The first sentence in the abstract describes the state of the art perfectly: "Numerous studies advocate for a rhythmic mode of perception; however, the evidence in the context of auditory perception remains inconsistent". This is precisely why the data from the present study is so valuable. This is probably the study with the highest sample size (total of > 100 in 4 experiments) in the field. The analysis is very thorough and transparent, due to the comparison of several statistical approaches and simulations of their sensitivity. Each of the experiments differs from the others in a clearly defined experimental parameter, and the authors test how this impacts auditory rhythmicity, measured in pitch discrimination performance (accuracy, sensitivity, bias) of a target presented at various delays after noise onset.

      Weaknesses:

      (1) The authors find that the frequency of auditory perception changes between experiments. I think they could exploit differences between experiments better to interpret and understand the obtained results. These differences are very well described in the Introduction, but don't seem to be used for the interpretation of results. For instance, what does it mean if perceptual frequency changes from between- to within-trial pitch discrimination? Why did the authors choose this experimental manipulation? Based on differences between experiments, is there any systematic pattern in the results that allows conclusions about the roles of different frequencies? I think the Discussion would benefit from an extension to cover this aspect.

      We believe that interpreting these differences remains difficult and a precise, detailed (and possibly mechanistic) interpretation is beyond the goal of the present study. The main goal of this study was to explore the consistency and variability of effects across variations of the experimental design and samples of participants. Interpreting specific effects, e.g. at particular frequencies, would make sense mostly if differences between experiments have been confirmed in a separate reproduction. Still, we do provide specific arguments for why differences in the outcome between different experiments, e.g. with and without explicit trial initialization by the participants, could be expected. See lines 91ff in the introduction and 786ff in the discussion.

      (2) The Results give the impression of clear-cut differences in relevant frequencies between experiments (e.g., 2 Hz in Experiment 1, 6 Hz in Exp 2, etc), but they might not be so different. For instance, a 6 Hz effect is also visible in Experiment 1, but it just does not reach conventional significance. The average across the three experiments is therefore very useful, and also seems to suggest that differences between experiments are not very pronounced (otherwise the average would not produce clear peaks in the spectrum). I suggest making this point clearer in the text.

      We have revised the conclusions to note that the present data do not support clear-cut differences between experiments. For this reason we also refrain from detailed interpretations of specific effects, as suggested by this reviewer in point 1 above.

      (3) I struggle to understand the hypothesis that rhythmic sampling differs between ears. In most everyday scenarios, the same sounds arrive at both ears, and the time difference between the two is too small to play a role for the frequencies tested. If both ears operate at different frequencies, the effects of the rhythm on overall perception would then often cancel out. But if this is the case, why would the two ears have different rhythms to begin with? This could be described in more detail.

      This hypothesis was not invented by us, but was in essence put forward in previous work. The study by Ho et al. CurrBiol 2017 reported rhythmic effects at different frequencies in the left and right ears, and we here tried to reproduce these effects. One could speculate about an ear difference based on studies reporting a right-ear advantage in specific listening tasks, and the idea that different time scales of rhythmic brain activity may specifically prevail in the left and right cortical hemispheres; hence it does not seem improbable that there could be rhythmic effects in both ears at different frequencies. We note this in the introduction, l. 65ff.

      Reviewer #2 (Public review):

      Summary:

      The current study aims to shed light on why previous work on perceptual rhythmicity has led to inconsistent results. They propose that the differences may stem from conceptual and methodological issues. In a series of experiments, the current study reports perceptual rhythmicity in different frequency bands that differ between different ear stimulations and behavioral measures.

      The study suggests challenges regarding the idea of universal perceptual rhythmicity in hearing.

      Strengths:

      The study aims to address differences observed in previous studies about perceptual rhythmicity. This is important and timely because the existing literature provides quite inconsistent findings. Several experiments were conducted to assess perceptual rhythmicity in hearing from different angles. The authors use sophisticated approaches to address the research questions.

      Weaknesses:

      (1) Conceptional concerns:

      The authors place their research in the context of a rhythmic mode of perception. They also discuss continuous vs rhythmic mode processing. Their study further follows a design that seems to be based on paradigms that assume a recent phase in neural oscillations that subsequently influence perception (e.g., Fiebelkorn et al.; Landau & Fries). In my view, these are different facets in the neural oscillation research space that require a bit more nuanced separation. Continuous mode processing is associated with vigilance tasks (work by Schroeder and Lakatos; reduction of low frequency oscillations and sustained gamma activity), whereas the authors of this study seem to link it to hearing tasks specifically (e.g., line 694). Rhythmic mode processing is associated with rhythmic stimulation by which neural oscillations entrain and influence perception (also, Schroeder and Lakatos; greater low-frequency fluctuations and more rhythmic gamma activity). The current study mirrors the continuous rather than the rhythmic mode (i.e., there was no rhythmic stimulation), but even the former seems not fully fitting, because trials are 1.8 s short and do not really reflect a vigilance task. Finally, previous paradigms on phase-resetting reflect more closely the design of the current study (i.e., different times of a target stimulus relative to the reset of an oscillation). This is the work by Fiebelkorn et al., Landau & Fries, and others, which do not seem to be cited here, which I find surprising. Moreover, the authors would want to discuss the role of the background noise in resetting the phase of an oscillation, and the role of the fixation cross also possibly resetting the phase of an oscillation. Regardless, the conceptional mixture of all these facets makes interpretations really challenging. The phase-reset nature of the paradigm is not (or not well) explained, and the discussion mixes the different concepts and approaches. 
      I recommend that the authors frame their work more clearly in the context of these different concepts (affecting large portions of the manuscript).

      Indeed, the paradigms used here and in many similar previous studies incorporate an aspect of phase-resetting, as the presentation of a background noise may effectively reset ongoing auditory cortical processes. Studies trying to probe for rhythmicity in auditory perception in the absence of any background noise have not shown any effect (Zoefel and Heil, 2013), perhaps because the necessary rhythmic processes along auditory pathways are only engaged when some sound is present. We now discuss these points, and also acknowledge the mentioned studies in the visual system; l. 57.

      (2) Methodological concerns:

      The authors use a relatively unorthodox approach to statistical testing. I understand that they try to capture and characterize the sensitivity of the different analysis approaches to rhythmic behavioral effects. However, it is a bit unclear what meaningful effects are in the study. For example, the bootstrapping approach that identifies the percentage of significant variations of sample selections is rather descriptive (Figures 5-7). The authors seem to suggest that 50% of the samples are meaningful (given the dashed line in the figure), even though this is rarely reached in any of the analyses. Perhaps >80% of samples should show a significant effect to be meaningful (at least to my subjective mind). To me, the low percentage rather suggests that there is not too much meaningful rhythmicity present. 

      We note that there is no clear consensus on what fraction of experiments should be expected or how this way of quantifying effects should be precisely valued (l. 441ff). However, we now also clearly acknowledge in the discussion that the effective prevalence is not very high (l. 663).

      I suggest that the authors also present more traditional, perhaps multi-level, analyses: Calculation of spectra, binning, or single-trial analysis for each participant and condition, and the respective calculation of the surrogate data analysis, and then comparison of the surrogate data to the original data on the second (participant) level using t-tests. I also thought the statistical approach undertaken here could have been a bit more clearly/didactically described as well.

      We realize that our description of the methods was possibly not fully clear. We do follow the strategy suggested by this reviewer, but rather than comparing actual and surrogate data using a parametric t-test, we compare them using a non-parametric, percentile-based approach. This has the advantage of not making specific (and possibly unwarranted) assumptions about the distribution of the data. We have revised the methods to clarify this, l. 332ff.

      The authors used an adaptive procedure during the experimental blocks such that the stimulus intensity was adjusted throughout. In practice, this can be a disadvantage relative to keeping the intensity constant throughout, because, on average, correct trials will be associated with a higher intensity than incorrect trials, potentially making observations of perceptual rhythmicity more challenging. The authors would want to discuss this potential issue. Intensity adjustments could perhaps contribute to the observed rhythmicity effects. Perhaps the rhythmicity of the stimulus intensity could be analyzed as well. In any case, the adaptive procedure may add variance to the data.

      We have added an analysis of task difficulty to the results (new section “Effects of adaptive task difficulty“) to address this. Overall we do not find systematic changes in task difficulty across participants for most of the experiments, but one certainly cannot rule out that this aspect of the design also affects the outcomes. Importantly, we relied on an adaptive task difficulty to actually (or hopefully) reduce variance in the data, by keeping the task difficulty around a certain level. Given the large number of trials collected, not using such an adaptive procedure may result in performance levels around chance or near ceiling, which would make it impossible to detect rhythmic variations in behavior.

      Additional methodological concerns relate to Figure 8. Figures 8A and C seem to indicate that a baseline correction for a very short time window was calculated (I could not find anything about this in the methods section). The data seem very variable and artificially constrained in the baseline time window. It was unclear what the reader might take from Figure 8.

      This figure was intended mostly for illustration of the eye tracking data, but we agree that there is no specific key insight to be taken from this. We removed this. 

      Motivation and discussion of eye-movement/pupillometry and motor activity: The dual task paradigm of Experiment 4 and the reasons for assessing eye metrics in the current study could have been better motivated. The experiment somehow does not fit in very well. There is recent evidence that eye movements decrease during effortful tasks (e.g., Contadini-Wright et al. 2023 J Neurosci; Herrmann & Ryan 2024 J Cog Neurosci), which appears to contradict the results presented in the current study. Moreover, by appealing to active sensing frameworks, the authors suggest that active movements can facilitate listening outcomes (line 677; they should provide a reference for this claim), but it is unclear how this would relate to eye movements. Certainly, a person may move their head closer to a sound source in the presence of competing sound to increase the signal-to-noise ratio, but this is not really the active movements that are measured here. A more detailed discussion may be important. The authors further frame the difference between Experiments 1 and 2 as being related to participants' motor activity. However, there are other factors that could explain differences between experiments. Self-paced trials give participants the opportunity to rest more (inter-trial durations were likely longer in Experiment 2), perhaps affecting attentional engagement. I think a more nuanced discussion may be warranted.

      We expanded the motivation of why self-pacing trials may effectively alter how rhythmic processes affect perception, and now also allude to attention and expectation related effects (l. 786ff). Regarding eye movements we now discuss the results in the light of the previously mentioned studies, but again refrain from a very detailed and mechanistic interpretation (l. 782).

      Discussion:

      The main data in Figure 3 showed little rhythmicity. The authors seem to glance over this fact by simply stating that the same phase is not necessary for their statistical analysis. Previous work, however, showed rhythmicity in the across-participant average (e.g., Fiebelkorn's and similar work). Moreover, one would expect that some of the effects in the low-frequency band (e.g., 2-4 Hz) are somewhat similar across participants. Conduction delays in the auditory system are much smaller than the 0.25-0.5 s associated with 2-4 Hz. The authors would want to discuss why different participants would express so vastly different phases that the across-participant average does not show any rhythmicity, and what this would mean neurophysiologically.

      We now discuss the assumptions and implications of similar or distinct phases of rhythmic processes within and between participants (l. 695ff). In particular we note that different origins of the underlying neurophysiological processes eventually may suggest that such assumptions are or are not warranted.

      An additional point that may require more nuanced discussion is related to the rhythmicity of response bias versus sensitivity. The authors could discuss what the rhythmicity of these different measures in different frequency bands means, with respect to underlying neural oscillations.

      We expanded the discussion to interpret what rhythmic changes in each of the behavioral metrics could imply (l. 706ff).

      Figures:

      Much of the text in the figures seems really small. Perhaps the authors would want to ensure it is readable even for those with low vision abilities. Moreover, Figure 1A is not as intuitive as it could be and may perhaps be made clearer. I also suggest the authors discuss a bit more the potential monoaural vs binaural issues, because the perceptual rhythmicity is much slower than any conduction delays in the auditory system that could lead to interference.

      We tried to improve the font sizes where possible, and discuss the potential monaural origins as suggested by other reviewers. 

      Reviewer #3 (Public review):

      Summary:

      The finding of rhythmic activity in the brain has, for a long time, engendered the theory of rhythmic modes of perception, that humans might oscillate between improved and worse perception depending on states of our internal systems. However, experiments looking for such modes have resulted in conflicting findings, particularly in those where the stimulus itself is not rhythmic. This paper seeks to take a comprehensive look at the effect and various experimental parameters which might generate these competing findings: in particular, the presentation of the stimulus to one ear or the other, the relevance of motor involvement, attentional demands, and memory, each of which is revealed to affect the consistency of this rhythmicity.

      The need the paper attempts to resolve is a critical one for the field. However, as presented, I remain unconvinced that the data would not be better interpreted as showing no consistent rhythmic mode effect. It lacks a conceptual framework to understand why effects might be consistent in each ear but at different frequencies and only for some tasks with slight variants, some affecting sensitivity and some affecting bias.

      Strengths:

      The paper is strong in its experimental protocol and its comprehensive analysis, which seeks to compare effects across several analysis types and slight experiment changes to investigate which parameters could affect the presence or absence of an effect of rhythmicity. The prescribed nature of its hypotheses and its manner of setting out to test them is very clear, which allows for a straightforward assessment of its results.

      Weaknesses:

      There is a weakness throughout the paper in terms of establishing a conceptual framework both for the source of "rhythmic modes" and for the interpretation of the results. Before understanding the data on this matter, it would be useful to discuss why one would posit such a theory to begin with. From a perceptual side, rhythmic modes of processing in the absence of rhythmic stimuli would not appear to provide any benefit to processing. From a biological or homeostatic argument, it's unclear why we would expect such fluctuations to occur in such a narrow-band way when neither the stimulus nor the neurobiological circuits require it.

      We believe that the framework for why there may be rhythmic activity along auditory pathways that shapes behavioral outcomes has been laid out in many previous studies, prominently here (Schroeder et al., 2008; Schroeder and Lakatos, 2009; Obleser and Kayser, 2019). Many of the relevant studies are cited in the introduction, which is already rather long given the many points covered in this study. 

      Secondly, for the analysis to detect a "rhythmic mode", it must assume that the phase of fluctuations across an experiment (i.e., whether fluctuations are in an up-state or down-state at onset) is constant at stimulus onset, whereas most oscillations do not have such a total phase-reset as a result of input. Therefore, some theoretical positing of what kind of mechanism could generate this fluctuation is critical toward understanding whether the analysis is well-suited to the studied mechanism.

      In line with this and previous comments (by reviewer 2) we have expanded the discussion to consider the issue of phase alignment (l. 695ff). 

      Thirdly, an interpretation of why we should expect left and right ears to have distinct frequency ranges of fluctuations is required. There are a large number of statistical tests in this paper, and it's not clear how multiple comparisons are controlled for, apart from experiment 4 (which specifies B&H false discovery rate). As such, one critical method to identify whether the results are not the result of noise or sample-specific biases is the plausibility of the finding. On its face, maintaining distinct frequencies of perception in each ear does not fit an obvious conceptual framework.

      Again this point was also noted by another reviewer and we expanded the introduction and discussion in this regard (l. 65ff).

      Reviewer #1 (Recommendations for the authors):

      (1) An update of the AR-surrogate method has recently been published (https://doi.org/10.1101/2024.08.22.609278). I appreciate that this is a lot of work, and it is of course up to the authors, but given the higher sensitivity of this method, it might be worth applying it to the four datasets described here.

      Reading this article, we note that our implementation of the AR-surrogate method was essentially as suggested here, and not as implemented by Brookshire. In fact, we had not realized that Brookshire had apparently computed the spectrum based on the group-average data. As explained in the Methods section, and now clarified further, we compute for each participant the actual spectrum of this participant’s data and a set of surrogate spectra. We then compute the group average of both, and derive the p-value of the actual group average from its percentile within the distribution of surrogate averages. This second step differs from Harris & Beale, who used a one-sided t-test. The latter is most likely not appropriate in a strict statistical sense, but is possibly more powerful for detecting true results than the percentile-based approach that we used (see l. 332ff).
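
      The group-level step described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `group_percentile_pvalues`, the array layout, and the add-one correction in the p-value are assumptions made for this example, and the per-participant actual and surrogate spectra (e.g. from an AR model) are assumed to have been computed beforehand.

```python
import numpy as np


def group_percentile_pvalues(actual, surrogate):
    """One-sided percentile-based p-value per frequency.

    actual:    array of shape (n_participants, n_freqs)
    surrogate: array of shape (n_participants, n_surrogates, n_freqs)
    """
    actual = np.asarray(actual, dtype=float)
    surrogate = np.asarray(surrogate, dtype=float)

    # Average across participants: one actual group spectrum and
    # one group spectrum per surrogate draw.
    group_actual = actual.mean(axis=0)        # (n_freqs,)
    group_surr = surrogate.mean(axis=0)       # (n_surrogates, n_freqs)

    # Count surrogate group averages at least as large as the actual one.
    n_surr = group_surr.shape[0]
    exceed = (group_surr >= group_actual).sum(axis=0)

    # Add-one correction (an assumption here) keeps p strictly above zero.
    return (exceed + 1) / (n_surr + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    surrogate = rng.normal(size=(5, 199, 3))   # synthetic surrogate spectra
    actual = np.zeros((5, 3))
    actual[:, 1] = 10.0                        # synthetic peak at one frequency
    print(group_percentile_pvalues(actual, surrogate))
```

      Unlike a t-test across participants, this comparison makes no distributional assumption, but the smallest attainable p-value is bounded by the number of surrogates.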

      (2) When results for the four experiments are reported, a reminder for the reader of how these experiments differ from each other would be useful.

      We have added this in the Results section.

      "considerable prevalence of differences around 4Hz, with dual‐task requirements leading to stronger rhythmicity in perceptual sensitivity". There is a striking similarity to recently published data (https://doi.org/10.1101/2024.08.10.607439 ) demonstrating a 4-Hz rhythm in auditory divided attention (rather than between modalities as in the present case). This could be a useful addition to the paragraph.

      We have added a reference to this preprint, and additional previous work pointing in the same direction mentioned in there.  

      (3) There are two typos in the Introduction: "related by different from the question", and below, there is one "presented" too much.

      These have been fixed.

      Reviewer #3 (Recommendations for the authors):

      My major suggestion is that these results must be replicated in a new sample. I understand this is not simple to do and not always possible, but at this point, no effect is replicated from one experiment to the next, despite very small changes in protocol (especially experiment 1 vs 2). It's therefore very difficult to justify explaining the different effects as real as opposed to random effects of this particular sample. While the bootstrapping effects show the level of consistency of the effect within the sample studied, it can not be a substitute for a true replication of the results in a new sample.

      We agree that only an independent replication can demonstrate the robustness of the results. We do consider experiment 1 a replication test of Ho et al. CurrBiol 2017, which yields different results than reported there. But more importantly, we consider the analysis of ‘reproducibility’ by simulating participant samples a key novelty of the present work, and want to emphasize this over the within-study replication of the same experiment. In fact, in light of the present interpretation of the data, even a within-study replication would most likely not offer a clear-cut answer.

      As I said in the public review, the interpretation of the results, and of why perceptual cycles in arhythmic stimuli could be a plausible theory to begin with, is lacking. A conceptual framework would vastly improve the impact and understanding of the results.

      We tried to strengthen the conceptual framework in the introduction. We believe that this is in large part provided by previous work, and the aim of the present study was to explore the robustness of effects and not to suggest and discover novel effects.

      Minor comments:

      (1) The authors adapt the difficulty as a function of performance, which seems to me a strange choice for an experiment that is analyzing the differences in performance across the experiment. Could you add a sentence to discuss the motivation for this choice?

      We now mention the rationale in the Methods section and in a new section of the Results. There we also provide additional analyses on this parameter.

      (2) The choice to plot the p-values as opposed to the values of the actual analysis feels ill-advised to me. It invites comparison across analyses that isn't necessarily fair. It would be more informative to plot the respective analysis outputs (spectral power, regression, or delta R2) and highlight the windows of significance and their overlap across analyses. In my opinion, this would be a fairer and more accurate depiction of the analyses as they are meant to be used.

      We do disagree. As explained in the Methods (l. 374ff): “(Showing p-values) … allows presenting the results on a scale that can be directly compared between analysis approaches, metrics, frequencies and analyses focusing on individual ears or the combined data. Each approach has a different statistical sensitivity, and the underlying effect sizes (e.g. spectral power) vary with frequency for both the actual data and null distribution. As a result, the effect size reaching statistical significance varies with frequency, metrics and analyses.” 

      The fact that the level of power (or R2 or whatever metric we consider) required to reach significance differs between analyses (one ear, both ears), metrics (d-prime, bias, RT) and between analysis approaches makes showing the results difficult, as we would need a separate panel for each of those. This would multiply the number of panels required e.g. for Figure 4 by 3, making it a figure with 81 axes. Also neither the original quantities of each analysis (e.g. spectral power) nor the p-values that we show constitute a proper measure of effect size in a statistical sense. In that sense, neither of these is truly ideal for comparing between analyses, metrics etc.

      We do agree, though, that many readers may want to see the original quantification and thresholds for statistical significance. We now show these in an exemplary manner for the Binned analysis of Experiment 1, which provides a positive result and also is an attempt to replicate the findings by Ho et al 2017. This is shown in new Figure 5.

      (3) Typo in line 555 (+ should be plus minus).

      (4) Typo in line 572: "Comparison of 572 blocks with minus dual task those without"

      (5) Typo in line 616: remove "one".

      (6) Line 666 refers to effects in alpha band activity, but it's unclear what the relationship is to the authors' findings, which peak around 6 Hz, lower than alpha (~10 Hz).

      (7) Line 688 typo, remove "amount of".

      These points have been addressed.  

      (8) Oculomotor effect that drives greater rhythmicity at 3-4 Hz. Did the authors analyze the eye movements to see if saccades were also occurring at this rate? It would be useful to know if the 3-4 Hz effect is driven by "internal circuitry" in the auditory system or by the typical rate of eye movement.

      A preliminary analysis of eye movement data was in previous Figure 8, which was removed on the recommendation of another reviewer. This showed that the average saccade rate is about 0.01 saccades per trial per time bin, amounting to on average less than one detected saccade per trial. Hence rhythmicity in saccades is unlikely to explain rhythmicity in behavioral data at the scale of 3-4 Hz. We now note this in the Results.

      Obleser J, Kayser C (2019) Neural Entrainment and Attentional Selection in the Listening Brain. Trends Cogn Sci 23:913-926.

      Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci 32:9-18.

      Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn Sci 12:106-113.

      Zoefel B, Heil P (2013) Detection of Near-Threshold Sounds is Independent of EEG Phase in Common Frequency Bands. Front Psychol 4:262.

    1. Although lying in a pool of his blood with brain matter emerging from his head, Gage was conscious and able to get up, walk, and speak. But in the months following his accident, people noticed that his personality had changed

      After the accident, Gage's personality changed. He became impulsive and had trouble controlling emotions.

    2. The temporal lobe is located on the side of the head (temporal means “near the temples”), and is associated with hearing, memory, emotion, and some aspects of language.

      Hearing, memory, and understanding language

    1. Reviewer #3 (Public review):

      Summary

      The authors set out to explore the potential relationship between adult neurogenesis of inhibitory granule cells in the olfactory bulb and cumulative changes over days in odor-evoked spiking activity (representational drift) in the olfactory stream. They developed a richly detailed spiking neuronal network model based on Izhikevich (2003), allowing them to capture the diversity of spiking behaviors of multiple neuron types within the olfactory system. This model recapitulates the circuit organization of both the main olfactory bulb (MOB) and the piriform cortex (PCx), including connections between the two (both feedforward and corticofugal). Adult neurogenesis was captured by shuffling the weights of the model's granule cells, preserving the distribution of synaptic weights. Shuffling of granule cell connectivity resulted in cumulative changes in stimulus-evoked spiking of the model's M/T cells. Individual M/T cell tuning changed with time, and ensemble correlations dropped sharply over the temporal interval examined (long enough that almost all granule cells in the model had shuffled their weights). Interestingly, these changes in responsiveness did not disrupt low-dimensional stability of olfactory representations: when projected into a low-dimensional subspace, population vector correlations in this subspace remained elevated across the temporal interval examined. Importantly, in the model's downstream piriform layer, this was not the case. There, shuffled GC connectivity in the bulb resulted in a complete shift in piriform odor coding, including for low-dimensional projections. This is in contrast to what the model exhibited in the M/T input layer. Interestingly, these changes in PCx extended to the geometrical structure of the odor representations themselves. Finally, the authors examined the effect of experience on representational drift. Using an STDP rule, they allowed the inputs to and outputs from adult-born granule cells to change during repeated presentations of the same odor. This stabilized stimulus-evoked activity in the model's piriform layer.

      Strengths

      This paper suggests a link between adult neurogenesis in the olfactory bulb and representational drift in the piriform cortex. Using an elegant spiking network that faithfully recapitulates the basic physiological properties of the olfactory stream, the authors tackle a question of longstanding interest in a creative and interesting manner. As a purely theoretical study of drift, this paper presents important insights: synaptic turnover of recurrent inhibitory input can destabilize stimulus-evoked activity, but only to a degree, as representations in the bulb (the model's recurrent input layer) retain their basic geometrical form. However, this destabilized input results in profound drift in the model's second (piriform) layer, where both the tuning of individual neurons and the layer's overall functional geometry are restructured. This is a useful and important idea in the drift field, and to my knowledge, it is novel. The bulb is not the only setting where inhibitory synapses exhibit turnover (whether through neurogenesis or synaptic dynamics), and so this exploration of the consequences of such plasticity on drift is valuable. The authors also elegantly explore a potential mechanism to stabilize representations through experience, using an STDP rule specific to the inhibitory neurons in the input layer. This has an interesting parallel with other recent theoretical work on drift in the piriform (Morales et al., 2025 PNAS), in which STDP in the piriform layer was also shown to stabilize stimulus representations there. It is fascinating to see that this same rule also stabilizes piriform representations when implemented in the bulb's granule cells.

      The authors also provide a thoughtful discussion regarding the differential roles of mitral and tufted cells in drift in piriform and AON and the potential roles of neurogenesis in archicortex.

      In general, this paper puts an important and much-needed spotlight on the role of neurogenesis and inhibitory plasticity in drift. In this light, it is a valuable and exciting contribution to the drift conversation.

      Weaknesses

      I have one major, general concern that I think must be addressed to permit proper interpretation of the results.

      I worry that the authors' model may confuse thinking on drift in the olfactory system, because of differences in the behavior of their model from known features of the olfactory bulb. In their model, the tuning of individual bulbar neurons drifts over time. This is inconsistent with the experimental literature on the stability of odor-evoked activity in the olfactory bulb.

      In a foundational paper, Bhalla & Bower (1997) recorded from mitral and tufted cells in the olfactory bulb of freely moving rats and measured the odor tuning of well-isolated single units across a five-day interval. They found that the tuning of a single cell was quite variable within a day, across trials, but that this variability did not increase with time. Indeed, their measure of response similarity was equivalent within and across days. In what now reads as a prescient anticipation of the drift phenomenon, Bhalla and Bower concluded: "it is clear, at least over five days, that the cell is bounded in how it can respond. If this were not the case, we would expect a continual increase in relative response variability over multiple days (the equivalent of response drift). Instead, the degree of variability in the responses of single cells is stable over the length of time we have recorded." Thus, even at the level of single cells, this early paper argues that the bulb is stable.

      This basic result has since been replicated by several groups. Kato et al. (2012) used chronic two-photon calcium imaging of mitral cells in awake, head-fixed mice and likewise found that, while odor responses could be modulated by recent experience (odor exposure leading to transient adaptation), the underlying tuning of individual cells remained stable. While experience altered mitral cell odor responses, those responses recovered to their original form at the level of the single neuron, maintaining tuning over extended periods (two months). More recently, the Mizrahi lab (Shani-Narkiss et al., 2023) extended chronic imaging to six months, reporting that single-cell odor tuning curves remained highly similar over this period. These studies reinforce Bhalla and Bower's original conclusion: despite trial-to-trial variability, olfactory bulb neurons maintain stable odor tuning across extended timescales, with plasticity emerging primarily in response to experience. (The Yamada et al., 2017 paper, which the authors here cite, is not an appropriate comparison. In Yamada, mice were exposed daily to odor. Therefore, the changes observed in Yamada are a function of odor experience, not of time alone. Yamada does not include data in which the tuning of bulb neurons is measured in the absence of intervening experience.)

      Therefore, a model that relies on instability in the tuning of bulbar neurons risks giving the incorrect impression that the bulb drifts over time. This difference should be explicitly addressed by the authors to avoid any potential confusion. Perhaps the best course of action would be to fit their model to Mizrahi's data, should this data be available, and see if, when constrained by empirical observation, the model still produces drift in piriform. If so, this would dramatically strengthen the paper. If this is not feasible, then I suggest being very explicit about this difference between the behavior of the model and what has been shown empirically. I appreciate that in the data there is modest drift (e.g., Shani-Narkiss' Figure 8C), but the changes reported there really are modest compared to what is exhibited by the model. A compromise would be to simply apply these metrics to the model and match the model's similarity to the Shani-Narkiss data. Then the authors could ask what effect this has on drift in piriform.

      The risk here is that people will conclude from this paper that drift in piriform may simply be inherited from instability in the bulb. This view is inconsistent with what has been documented empirically, and so great care is warranted to avoid conveying that impression to the community.

      Major comments (all related to the above point)

      (1) Lines 146-168: The authors find in their model that "individual M/T cells changed their responses to the same odor across days due to adult-neurogenesis, with some cells decreasing the firing rate responses (Fig.2A1 top) while other cells increased the magnitude of their responses (Fig. 2A2 bottom, Fig. S2)." They also report a significant decrease in the "full ensemble correlation" in their model over time. They claim that these changes in individual cell tuning are "similar to what has been observed by others using calcium imaging of M/T cell activity (Kato et al., 2012 and Yamada et al., 2017)" and that the decrease in full ensemble correlation is "consistent with experimental observations (Yamada et al., 2017)." However, the conditions of the Kato and Yamada experiments that demonstrate response change are not comparable here, as odors were presented daily to the animals in these experiments. Therefore, the changes in odor tuning found in the Kato and Yamada papers (Kato Figure 4D; Yamada Figure 3E) are a function of accumulated experience with odor. This distinction is crucial because experience-induced changes reflect an underlying learning process, whereas changes that simply accumulate over time are more consistent with drift. The conditions of their model are more similar to those employed in other experiments described in Kato et al. 2012 (Figure 6C) as well as Shani-Narkiss et al. (2023), in which bulb tuning is measured not as a function of intervening experience, but rather as a function of time (Kato's "recovery" experiment). What is found in Kato is that even across two months, the tuning of individual mitral cells is stable. What alters tuning is experience with odor, the core finding of both the Kato et al., 2012 paper and also Yamada et al., 2017. It is crucial that this is clarified in the text.

      (2) The authors show that in a reduced-space correlation metric, the correlation of low-dimensional trajectories "remained high across all days"..."consistent with a recent experimental study" (Shani-Narkiss et al., 2023). It is true that in the Shani-Narkiss paper, a consistent low-dimensional response is found across days (t-SNE analysis in Shani-Narkiss Figure 7B). However, the key difference between the Shani-Narkiss data and the results reported here is that Shani-Narkiss also observed relative stability in the native space (Shani-Narkiss Figure 8). They conclude that they "find a relatively stable response of single neurons to odors in either awake or anesthetized states and a relatively stable representation of odors by the MC population as a whole (Figures 6-8; Bhalla and Bower, 1997)." This should be better clarified in the text.

      (3) In the discussion, the authors state that "In the MOB, individual M/T cells exhibited variable odor responses akin to gain control, altering their firing rate magnitudes over time. This is consistent with earlier experimental studies using calcium-imaging." (L314-6). Again, I disagree that these data are consistent with what has been published thus far. Changes in gain would have resulted in increased variability across days in the Bhalla data. Moreover, changes in gain would be captured by Kato's change index ("To quantify the changes in mitral cell responses, we calculated the change index (CI) for each responsive mitral cell-odor pair on each trial (trial X) of a given day as (response on trial X - the initial response on day 1)/(response on trial X + the initial response on day 1). Thus, CI ranges from −1 to 1, where a value of −1 represents a complete loss of response, 1 represents the emergence of a new response, and 0 represents no change." Kato et al.). This index will capture changes in gain. However, as shown in Figure 4D (red traces), Figure 6C (Recovery and Odor set B during odor set A experience and vice versa), the change index is either zero or near zero. If the authors wish to claim that their model is consistent with these data, they should also compute Kato's change index for M/T odor-cell pairs in their model and show that it also remains at 0 over time, absent experience.
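
      For illustration, the change index quoted above can be written as a short function. This is a minimal sketch of the quoted formula only, not code from Kato et al. or from the manuscript under review; the function and argument names are hypothetical, and it assumes non-negative response magnitudes so that the index stays within −1 to 1.

```python
def change_index(response_trial_x, initial_response_day1):
    """Change index (CI) as quoted from Kato et al.:
    (response on trial X - initial response on day 1)
    / (response on trial X + initial response on day 1).

    Assumes non-negative response magnitudes (an assumption of this sketch).
    """
    if response_trial_x < 0 or initial_response_day1 < 0:
        raise ValueError("responses are assumed to be non-negative")
    denom = response_trial_x + initial_response_day1
    if denom == 0:
        # Both responses are zero, so the index is undefined.
        raise ValueError("change index is undefined when both responses are zero")
    return (response_trial_x - initial_response_day1) / denom


print(change_index(0.0, 2.0))  # -1.0, complete loss of response
print(change_index(2.0, 0.0))  # 1.0, emergence of a new response
print(change_index(3.0, 3.0))  # 0.0, no change
print(change_index(3.0, 1.0))  # 0.5, a threefold gain increase
```

      The last call shows why a pure change in gain would register in this index: scaling a response by a factor g moves the index from 0 to (g − 1)/(g + 1).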

    1. as seen with drug overdoses, head injuries, or neurological diseases.

      Chest trauma and pulmonary complications are not the only reasons for inadequate breathing.

    1. Panicked selling set in as thousands of positions were closed to cover margins. Stock values sank to sudden lows, and stunned investors crowded the New York Stock Exchange demanding answers.

      I think panic selling and buying is so common in the world today because of the fact that when we had COVID everyone went nutso in the head and bulk bought all the tp in stores and a bunch of other items.

    1. We send verbal and nonverbal feedback while another person is talking and after they are done. Back-channel cues are the verbal and nonverbal signals we send while someone is talking and can consist of verbal cues like “uh-huh,” “oh,” and “right,” and/or nonverbal cues like direct eye contact, head nods, and leaning forward. Back-channel cues are generally a form of positive feedback that indicates others are actively listening. People also send cues intentionally and unintentionally that indicate they aren’t listening. If another person is looking away, fidgeting, texting, or turned away, we will likely interpret those responses negatively.

      Some people who have certain neurodivergences can have issues recognizing the need for feedback during the listening stages of a conversation; I know I personally struggle with this often.

    1. Reviewer #2 (Public review):

      Summary:

      The manuscript reports all-atom molecular dynamics simulations on the outer membrane of Mycobacterium tuberculosis. This is the first all-atom MD simulation of the MTb outer membrane and complements the earlier studies, which used coarse-grained simulation.

      Strengths:

      The simulation of the outer membrane consisting of heterogeneous lipids is a challenging task, and the current work is technically very sound.

      The observation about membrane heterogeneity and ordered inner leaflets vs disordered outer leaflets is a novel result from the study. This work will also facilitate other groups to work on all-atom models of mycobacterial outer membrane for drug transport, etc.

      Weaknesses:

      Beyond a challenging simulation study, the current manuscript only provides qualitative explanations on the unusual membrane structure of MTb and does not demonstrate any practical utility of the all-atom membrane simulation. It will be difficult for the general biology community to appreciate the significance of the work, based on the manuscript in its current form, because of the high content of technical details and limited evidence on the utility of the work.

      Major Points:

      (1) The simulation by Basu et al (Phys Chem Chem Phys 2024) has studied drug transport through mycolic acid monolayers. Since the authors of the current study have all-atom models of the MTb outer membrane, they should carry out drug transport simulations and compare them to the outer membranes of other bacteria through which drugs can permeate. In the current manuscript, it is only discussed in lines 388-392. Can the disruption of MA cyclopropanation be simulated to show its effect on membrane structure?

      (2) In line 277, the authors mention 6 simulations which mimic lipid knockout strains. The results of these simulations, specifically the outcomes of in silico knockout of lipids, are not described in detail.

      (3) Figure 5 shows PDIM and PAT-driven lipid redistribution, which is a significant novel observation from the study. However, comparison of 3B and 3D shows that at 313K, the movement of the PDIM head group is much less. Since MD simulations are sensitive to random initial seeds, repeated simulations with different random seeds and initial structures may be necessary.

      (4) As per Figure 1, in the initial structure, the head group of PAT should be on the membrane surface, similar to TDM and TMM, while PDIM is placed towards the interior of the outer membrane. However, Figure 5 shows that at t=0, PAT has the same Z position as PDIM. It will be necessary to provide Z-position Figures for TMM and TDM to understand the difference. Is it really dependent on the chemical structure of the lipid moiety or the initial position of the lipid in the bilayer at the beginning of the simulation?

      Minor Point:

      In view of the complexity of the system undertaken for the study, the manuscript in its current form may not be informative for readers who are not experts in molecular simulations.

    1. Sometimes life’s gonna hit you in the head with a brick.

      This line makes me think of all the times I’ve faced unexpected challenges. It’s kind of comforting to know that even someone as successful as Jobs experienced moments that felt like a brick to the head.

  3. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. The paradox stems from the fact that the success of one generation depends at least partly on the success of their parents or guardians. People who succeed get to keep the fruits of their labor and use them as they see fit; if they buy a home in a place where the schools are better, or use their superior resources to make the schools in their neighborhood better, their children will have a head start and other children will fall behind through no fault of their own

      In my country, the government allocates school resources more centrally to reduce such gaps, so seeing this paradox in action makes me realize how deeply rooted it is in the U.S. system.

    2. This paradox reveals the inherent contradictions of the education system—it is expected to serve as an equalizer, yet it also becomes a tool for class reproduction. The author uses the metaphor of a "head start" to vividly illustrate the mechanism of intergenerational privilege transmission, echoing French sociologist Pierre Bourdieu's theory of "cultural capital" (Bourdieu, 1986). This structural analysis is particularly valuable for understanding educational inequality. This reading changed my understanding of educational equity, shifting it from a "resource allocation problem" to a "double dilemma of system and culture." I also realized that the "American Dream" ideology could be transformed into a "noble lie."

    1. Tools of the typewriter trade by [[Retrotype]]

      Excellent overview of many of the basic tools for typewriter repair. Didn't have the strongest grasp of all the tools' specific names, but good enough for describing their general use cases.

      Example of a typewriter toolset including a case made for telephone company repair, but which works with typewriters.

      • Shore A durometer gauge 2:22
      • nylon fishing/picture hanging wire spec to 25kg (for drawband replacement)
      • thick waxed string/yarn for repairing fishing nets (for drawbands)
      • nitrile gloves (to prevent staining, issues with mineral spirits, and other caustic chemicals)
      • XPower pressure blower for blowing out dust/dirt and mineral spirits. (smaller than an air compressor)
      • nail grooming set with tweezers, picks, etc. (not technically necessary, but sometimes useful)
      • dental tools (for use as spring hooks)
      • Renaissance micro-crystalline wax (non-corrosive, made for British Museum, good on marble, wood, leather, etc.). Good on bare metal for treating previously rusted metal. (It's recommended to use an abrasive polish for improving the shine of glossy paint, however.)
      • Pouch and set of precision screwdrivers (he only uses the flatheads, though the set includes others). Prefers hollow ground tips which are squared off rather than wedges.
      • Chapman bit set of screwdrivers (with hollow ground tips) He prefers these for hard to remove screws. Issue that it's a bit thicker at the tip.
      • Liquid wrench penetrating oil for helping to loosen screws (he likes this better than WD-40)
      • brass wire brushes
      • steel wire brushes (uses less frequently as they're more abrasive)
      • pouch of precision wrenches (imperial and metric) his are bladed, Moody tools wrenches (mfg.) prefer the thinnest tips
      • microfiber cloths
      • jig for soldering typeslugs on typearms
      • pouch with various typewriter specific pliers:
        • 3 prong pliers (total of 9 prongs) for making bends/forming typebars (especially making bends in the middle of bars rather than the end);
        • peening bend pliers;
        • bending pliers for sideways bends esp. with thinner typebars;
        • vertical adjustment pliers (with rollers) not good for making adjustments of 3mm or more;
        • forming pliers with screws on the end to rotate heads for bending, peening and cutting;
        • peening pliers (bending by metal displacement)
      • Magnetized screwdrivers
      • forceps
      • screw grabber (active capture)
      • spring hooks (push/pull)
      • nylon brushes for dusting
      • needle nose pliers
      • t-bender with slotted head for forming metal
      • small bottles for mineral spirits and sewing machine oil. They have small metal tips for precision application.
    1. Thursday 7th of June 1804 Set out early passed the head of the Isd from the Isd. N. 61° W. to the mouth of a Creek Called big monitu on St. Sd. 41/2 ms. psd. a Sand bar in the river, Som Buffalow Sign Sent out George Drewyer & Newmon to hunt Capt Lewis and 6 men went to a Lick up this Creek on the right Side over 2 mes. & 2 other not far above the water runs out of the bank & not verry Strong. 3 to 500 G for a bushell.

      Observation: They left early, went past Big Monitu Creek, saw buffalo tracks, and Lewis checked out a salt spring with some men.

      Interpretation: This shows they were looking at animals and natural things like salt while they traveled.

      Connection: It links to how the trip was about learning what the land had, not just moving through it.

      I learned that Lewis and his team studied everything around them, like animals and natural resources such as salt. This adds to my connection because it shows the expedition was about discovery and learning what the new land could offer. It’s important because their findings helped the U.S. understand the land’s value, resources, and how people could live there. It shows how exploration helped the country grow and use its new land wisely.

      Context: In 1804, the U.S. had just bought this land in the Louisiana Purchase. People didn’t know what was there, so the expedition was sent to study the land, animals, and resources.

    2. Capt. Lewis took meridean altd. of Suns U. L. with the octant above Split Rock C. &made the altitude 37° 6′ 00 error of octt. as useal 2° 0′ 0″ + The Countrey for Several miles below is good, on the top of the high land back is also tolerable land Some buffalow Sign to day I am Still verry unwell with a Sore throat & head ake

      Observation: Lewis measured the sun, noticed traces of buffalo, said the land was pretty good, and wrote that he was sick with a sore throat and headache.

      Interpretation: This shows they were still studying the land and sky, even when they felt sick.

      Connection: It ties to how the trip was about learning and exploring, not just traveling.

      I learned that Lewis kept studying the land and animals even when he was sick. This adds to my connection because it shows how hard he worked to help the U.S. learn about new places. It’s important because his notes and measurements taught people what the land was like and what could be found there.

      Context: In 1804, the U.S. had just bought this land. The trip was meant to find out what was there like animals, land, and resources that people back east didn’t know about yet.

    3. Set out early passed the head of the Island opposit which we Camped last night, and brackfast at the Mouth of a large Creek on the S. S. Of 30 yds wide Called big Monetou, from the pt. of the Isd. or Course of last night to the mouth of this Creek is N 61° W 41/2 ms. a Short distance above the mouth of this Creek, is Several Courious Paintings and Carveing in the projecting rock of Limestone inlade with white red & blue flint, of a verry good quallity, the Indians have taken of this flint great quantities

      I observe that the journal describes how Lewis and Clark and the people who accompanied them set out early and passed the head of the island opposite where they had camped the previous night; they then had breakfast at the mouth of a large creek on the south side. They set their course to the northwest, and a short distance above the mouth of the creek there were paintings and carvings in the limestone rock, inlaid with white, red, and blue flint of good quality, of which the Native Americans had taken a great deal. I interpret this as the author making a detailed record of their experiences: by giving specific landmarks and other details like their course, the author makes it easier for someone to retrace their steps. I can connect this to the tertiary source, which describes the main objectives of the journey as gaining geographic and scientific information on the land and its plant and animal life. A second goal served the diplomatic and commercial interests of the United States, which wanted to find a water route to the Pacific in order to gain a stronger position in the fur trade. This is seen in the highlighted section, where the author describes the land by noting the creek and the landmark of the paintings on the rock. Context: Jefferson had wanted to expand the United States and compete with Canada in the fur trade. He had wanted to do so earlier than the expedition of Lewis and Clark and attempted it before the Louisiana Purchase in 1803. However, the expedition came to fruition following the purchase, when Lewis and Clark along with their group were tasked with exploring the vast, recently purchased land.

  4. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. For Lola and Sofia, education was initially a rewarding experience. Their grandmother arranged for each of them to attend Head Start, and both girls have fond memories of elementary school. "It was really fun," Lola recalls. "I liked my first-grade teacher, Mrs. Garcia. She was really nice and caring. She was cool." Sofia recalls her experiences the same way. "The teachers actually cared," she says. "The schools I went to were good. I really did like school, to be honest with you." Sofia seems to have been a precocious student: smart, motivated, and selected for a gifted-and-talented program. "She was a weirdo," Lola says, teasing her. "She liked reading the dictionary." "I did," Sofia admits. "I enjoyed reading the dictionary. It was l "

      The Head Start program, a federal government early childhood education intervention for low-income families, demonstrated short-term success in these children—cultivating their interest in learning and enabling Sofia to demonstrate her talents (e.g., being selected for gifted programs). This confirms research showing the positive effects of early intervention on the cognitive and social-emotional development of disadvantaged children. However, the lack of continuity in this intervention and the absence of a follow-up support system made it difficult to maintain these early advantages.

    1. Externalize often. The more you express those ideas—in words, in sketches, in prototypes, in demos—the more visible those flaws will be to you and other people. There’s a reason that Leonardo da Vinci kept a notebook in which he sketched and wrote every idea he had: it allowed him to see those ideas, share those ideas, critique those ideas, and improve those ideas. Had he kept them all in his head, his limited capacity to see and reason about those ideas would have greatly limited his productivity.

      I agree because I believe that it is easy to think of a vague, rough idea in your mind, but to perfect, visualize, and externalize it is another thing. If we hesitate to record our idea right away, there is a good chance we will lose it. For instance, when I start an essay, I find that making an outline that lists all my thoughts is helpful. I am sure it will be helpful to all of us if even Leonardo did it.

    2. How do you figure out what’s wrong with those bad ideas? Externalize often. The more you express those ideas—in words, in sketches, in prototypes, in demos—the more visible those flaws will be to you and other people. There’s a reason that Leonardo da Vinci kept a notebook in which he sketched and wrote every idea he had: it allowed him to see those ideas, share those ideas, critique those ideas, and improve those ideas. Had he kept them all in his head, his limited capacity to see and reason about those ideas would have greatly limited his productivity.

      I really like this section and think the idea of externalizing your ideas is super useful. I've noticed that when I sketch something out or explain it to someone else, I can spot the flaws way more easily than if I just keep it in my head. The Leonardo da Vinci example makes a lot of sense too as it shows even really smart people need a way to organize their thoughts. It's making me realize I should probably write down or sketch my ideas more often instead of trying to remember everything.

    3. Externalize often. The more you express those ideas—in words, in sketches, in prototypes, in demos—the more visible those flaws will be to you and other people. There’s a reason that Leonardo da Vinci kept a notebook in which he sketched and wrote every idea he had: it allowed him to see those ideas, share those ideas, critique those ideas, and improve those ideas. Had he kept them all in his head, his limited capacity to see and reason about those ideas would have greatly limited his productivity.

      I really like how this section connects creativity to the act of expressing ideas instead of just thinking about them. I agree that externalizing thoughts makes it way easier to catch flaws. Whenever I try to hold everything in my head, I lose track of details or overestimate the quality of my idea. It’s also kind of motivating to think that even someone like da Vinci needed to write everything down to make sense of it.

    1. Hemostats snap over the arteries of the scalp. Blood spatters onto Dr. Ducker’s sterile paper booties.

      This sentence alone sends shivers down my spine. The concept of cutting into someone's head is enough to freak a reader out, but words like "snap" and "spatters" make it even more uncomfortable.

    1. ______________________________________

      I would definitely recommend getting in the right head space and limiting distractions! Some things that help would be to go into a quiet room, without your phone, and breathe.

    1. I love that we are discussing fanfiction in relation to literature and communication. Often, people diminish fanfiction's legitimacy because of the demographics it is written by and the fact that it is often beginner writers (so it is less 'sophisticated') but it has value in our culture and many others. It displays a lot of the development of language and communication. Sorry, I'm discussing this in another class so it just.. in my head rn.

    1. Reviewer #3 (Public review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good if not better than tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogenous in terms of their best frequency (BF), even when focusing on the low vs high frequency regions of primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses is outside of behavioral context or when measured at single timepoints, given the plasticity, context-dependence, and receptive field 'drift' that can occur in cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and report that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, less than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and not being as thorough or complete with the TR imaging?

      (3) The authors' definitions of neuronal response type in the methods need more quantitative detail. The authors state: "'Irregular' neurons exhibited spontaneous activity with highly variable responses to sound stimulation. 'Tuned' neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. 'Silent' neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels." The authors need to define what their thresholds are for 'highly variable', 'significant', and 'completely inactive'. Is best frequency the most significant response, the global max (even if another stimulus evokes a very close amplitude response), etc.?

      Comments on revisions:

      I think the authors misunderstood my point about sound level and characteristic frequency vs best frequency tonotopic maps. Yes, many studies of cortical responses present stimuli at higher intensities than the characteristic frequencies, but as tuning curves widen with sound level, the macroscopic tonotopic organization of primary auditory cortex breaks down at higher intensities. This is why most of the classic studies of tonotopy (e.g., from the Merzenich lab) generated maps of characteristic frequency. As I mentioned before, this isn't so much of an issue for the authors' comparisons of TR vs CT organization in their own study, but in general, this makes it difficult to compare aspects of spatially-organized tonotopy from imaging studies with the older electrophysiological 'truer' tonotopic maps. That said, this means that CT cells also might be tonotopically organized if the authors had been able to look at lower intensity tuning properties.
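
      The distinction between best frequency and characteristic frequency can be made concrete with a small sketch. The following is illustrative only and is not taken from the manuscript: the response matrix, the frequency and level grids, and the response threshold are all made-up assumptions. Best frequency follows the definition quoted in point (3), while characteristic frequency is taken here, as an assumption, to be the frequency that evokes a supra-threshold response at the lowest sound level.

```python
# Illustrative sketch: all values below are hypothetical, not data from the study.

def best_frequency(fra, freqs):
    """Frequency with the highest response averaged across all sound levels."""
    n_levels = len(fra)
    mean_by_freq = [sum(row[i] for row in fra) / n_levels for i in range(len(freqs))]
    best_index = max(range(len(freqs)), key=lambda i: mean_by_freq[i])
    return freqs[best_index]


def characteristic_frequency(fra, freqs, levels, threshold):
    """Frequency evoking a supra-threshold response at the lowest sound level.

    Returns (frequency, level), or None if no response reaches the threshold.
    The threshold is an assumed parameter; the review asks the authors to define theirs.
    """
    for level, row in sorted(zip(levels, fra), key=lambda pair: pair[0]):
        responsive = [i for i, r in enumerate(row) if r >= threshold]
        if responsive:
            strongest = max(responsive, key=lambda i: row[i])
            return freqs[strongest], level
    return None


# Hypothetical frequency-response area: rows are sound levels, columns are frequencies.
freqs = [4, 8, 16, 32]          # kHz (assumed grid)
levels = [30, 50, 70]           # dB SPL (assumed grid)
fra = [
    [0.0, 0.6, 0.1, 0.0],       # 30 dB: narrow tuning around 8 kHz
    [0.2, 0.9, 1.0, 0.3],       # 50 dB: tuning widens
    [0.5, 1.0, 1.6, 0.9],       # 70 dB: peak has shifted toward 16 kHz
]

bf = best_frequency(fra, freqs)
cf = characteristic_frequency(fra, freqs, levels, threshold=0.5)
print("BF:", bf, "kHz; CF:", cf)
```

      In this toy neuron the two measures disagree (16 kHz vs 8 kHz) because the tuning curve widens and its peak shifts at higher levels, which is the effect described above.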

    1. Besides showing insensitivity to their target, caricatures like Muhammad with a bomb instead of turban on his head contribute to entrenching the mindless Islamophobia that sees all Muslims as enemies of the West and its freedoms. Not a wise move if one is concerned either with integrating immigrants from the Maghreb in French society, or with avoiding the “clash of civilizations” which Islamophobes seem so eager to bring on.

      Muslims should not be branded as religious freaks, especially when trying to assimilate said Muslims

    2. In the French context as well as all over Europe, we have witnessed in the last 20 years an increasing political resistance against the practices of Islam and their visibility in public spaces: from the ban on hijab (head covering) and niqab (full face covering) to the limitations on mosque-building, halal slaughtering, and even circumcision. Muslims have the feeling that being or looking like a practitioner of the Islamic faith will ostracize them, not to mention that this hostility goes hand in hand with concrete discriminations against the practice of the religion: women barred from entering public buildings because they wear hijabs, discrimination on the job market, in the workplace, etc.

      This can fit within the Orientalist context of trying to modernize those who are stuck with traditions and not with the secularism of the modern West

    1. What would it take for you to move to the mountains? MountainBlog, Annina UZH, Tuesday, 28 January 2025. Written by Tamar Kutubidze, Nini Lagvilava, Sonja Lussi & Charlene Zehnder. A collaboration between students from Tbilisi State University and the University of Zurich. Imagine a serene village nestled in the Swiss Alps, with breathtaking views and quiet streets that seem straight out of a storybook. Now, imagine this village isn't just a fairytale, it is a place willing to pay you to call it home. Welcome to Albinen, a small village in the Valais mountains of Switzerland. Perched 1'300 meters above sea level, Albinen has only 240 residents (SWI swissinfo, 2017). In 2017, facing a bleak future, Albinen took a bold step. The plan? Offer monetary incentives to attract new residents. To qualify, applicants needed to be under 45, commit to staying at least 10 years, and invest 200'000 Swiss Francs in property development (Siebrecht, 2017). Fast forward to seven years later: has the plan worked? Albinen's goal was modest, to attract five families in five years, with the hope of ten families in ten years. By 2022, the initiative looked promising on paper. Albinen approved 17 applications, supported 31 adults and 16 children, and spent CHF 710'000. However, the head of the municipality remains unconvinced (Lynch 2023). Despite the program's success in applications, Albinen's population dropped from 273 to 262 between 2017-2023 (Metry 2024). Infrastructure challenges remain a significant issue, and integration has been slow. A local of Albinen reported that newly arrived residents are rarely seen in the village (Lynch 2023), sparking concerns that they might view Albinen as a second-home destination rather than a permanent community. This leads us to ask: are these newcomers committed to revitalizing Albinen, or are they simply seeking a picturesque retreat?
[Images: Svaneti, Georgia (source: https://www.caucasus-trekking.com/regions/svaneti); Albinen, Switzerland (source: https://www.borghisvizzera.ch/de/scheda/albinen)]
Depopulation of mountainous regions isn't unique to Albinen. It's also a challenge in Georgia's Caucasus Mountains, where issues like limited infrastructure, rural economies, and poor connectivity drive people to seek better opportunities in the lowlands (Telbisz, et al., 2020). The Georgian government addresses this by offering financial aid, agricultural subsidies, and housing support in remote areas. In regions like Svaneti and Tusheti, eco-tourism initiatives are combined with efforts to encourage permanent settlement. Mountain regions in both countries, Georgia, and Switzerland, therefore, face similar issues with depopulation. Almost a quarter of the population lives in the Alps, yet many mountain villages are seeing dwindling numbers (Alpenkonvention, 2015). While the approaches differ, both countries share the same goal: revitalization. Albinen's initiative drew international media attention and still receives up to 100 applications daily from Germany, Austria, Croatia, Sri Lanka, Mexico, and Brazil (Hess 2017). The problem: the press omitted key details, giving people from around the world false hope for a better life in Switzerland. Most applications fail to meet the requirements, creating unnecessary work for the municipality (Lynch 2023). While Albinen achieved its target of attracting families, its deeper goal of transforming into a thriving, cohesive community remains elusive. Research suggests that successful revitalization initiatives require more than financial incentives. They need robust infrastructure, opportunities for community engagement, and long-term planning (Telbisz et al., 2020). In Georgia, the stakes are high. Mountain villages are more than homes; they are living monuments to ancient traditions, music, and architecture.
Revitalizing these areas could preserve a unique cultural heritage while supporting ecological sustainability. However, achieving this requires a balanced approach that ensures both integration and sustainable development. With the right strategies, Georgia's mountain villages could thrive again as vibrant, self-sustaining communities. So, what would it take for you to move to the mountains? Would breathtaking views and monetary incentives be enough, or does it take something deeper, like a sense of belonging? The examples of Albinen, Svaneti and Tusheti offer no easy solutions but invite us to reflect on what truly makes a place feel like home.

      This blog discusses, and describes as accurately as possible, the most difficult and pressing problem facing high-mountain regions: depopulation. The blog's authors introduce us to Albinen, a small village in the Swiss Alps. To address the problem of its small population, the state was forced to develop a new financial assistance project which, in their view, would increase people's interest in and motivation for living in the high-mountain region and bring new life to it again. The results of the initiative (the village's population fell from 273 to 262) made clear that financial incentives alone are not enough to make people live in conditions where prospects for infrastructural, social, and cultural development and provision are scarce. The blog draws a parallel with Georgia's high-mountain regions, Svaneti and Tusheti, where similar problems have existed for a long time. Villages are emptying because of the lack of economic development. Georgia is likewise trying to strengthen the region through financial aid and the development of ecotourism, but the process is irreversible: Georgia's mountain regions are slowly being emptied of their population. Based on the blog, we can conclude that such problems cannot be eliminated through financial incentives alone. It is necessary to develop infrastructure and improve social life, if only so that new residents can integrate easily into their surroundings, and to ensure processes that increase the possibility of a culturally and socially active life.

    1. White leaders, all 83 percent of them as the statistic goes, are still refusing to defer to the leadership of people of color, even when their clients are predominantly people of color. Some might compare white nonprofit CEOs to slave masters who considered themselves “good,” only looking after the best interests of the plantation by overseeing labor and resources.

      This was thought provoking to say the least. When considering diversity at the workplace, I have learned to view cultural competency as a necessary standard. I struggle to wrap my head around not seeing the benefit of having people of color in leadership positions at organizations that serve that demographic.

    1. eLife Assessment

      This important study provides solid evidence for new insights into the role of Type-1 nNOS interneurons in driving neuronal network activity and controlling vascular network dynamics in awake, head-fixed mice. The authors use an original strategy based on the ablation of Type-1 nNOS interneurons with local injection of saporin conjugated to a substance P analogue into the somatosensory cortex. They show that ablation of type I nNOS neurons has surprisingly little effect on neurovascular coupling, although it alters neural activity and vascular dynamics.

    2. Reviewer #1 (Public review):

      Turner et al. present an original approach to investigate the role of Type-1 nNOS interneurons in driving neuronal network activity and in controlling vascular network dynamics in awake head-fixed mice. Selective activation or suppression of Type-1 nNOS interneurons has previously been achieved using either chemogenetic, optogenetic or local pharmacology. Here, the authors took advantage of the fact that Type-1 nNOS interneurons are the only cortical cells that express the tachykinin receptor 1 to ablate them with a local injection of saporin conjugated to substance P (SP-SAP). SP-SAP causes cell death in 90 % of type1 nNOS interneurons without affecting microglia, astrocytes and neurons. The authors report that the ablation has no major effects on sleep or behavior. Refining the analysis by scoring neural and hemodynamic signals with electrode recordings, calcium signal imaging and wide field optical imaging, they observe that Type-1 nNOS interneuron ablation does not change the various phases of the sleep/wake cycle. However, it does reduce low-frequency neural activity, irrespective of the classification of arousal state. Analyzing neurovascular coupling using multiple approaches, they report small changes in resting-state neural-hemodynamic correlations across arousal states, primarily mediated by changes in neural activity. Finally, they show that nNOS type 1 interneurons play a role in controlling interhemispheric coherence and vasomotion.

      In conclusion, these results are interesting, use state-of-the-art methods and are well supported by the data and their analysis. I have only a few comments on the stimulus-evoked haemodynamic responses that can be easily addressed:

      Comments on revisions:

      As I mentioned in my initial review, this study is important. In my opinion, it could be published as is. Nonetheless, I am still somewhat dissatisfied with the authors' responses to my earlier comments. I understand that the same animals were not used for both stimulation paradigms, which is unfortunate. Nonetheless, I would have appreciated it if the authors had provided a couple of experiments illustrating GCaMP7 signals during brief stimulation in their reply to the reviewers. I am still unconvinced by the authors' suggestion that the GCaMP7 signal would remain stable during removal of the vascular undershoot. Since the absence of the undershoot is notable, I anticipate that a significant part of the initial response to prolonged stimulation is influenced by processes that occur during the 0.1-second stimulation, processes that may involve a change in the bulk neuronal response.

      In short, the data could support or refute the following statement: "Loss of type-I nNOS neurons drove minimal changes in the vasodilation elicited by brief stimulation..."

    1. If your future boss asks you for some creative thinking off the top of your head, you’d look incompetent if you had to first ask your AI app; your boss would wonder why they hired you rather than some other random, less expensive, interchangeable person who also can operate an AI app.

      I agree that employers care a lot about creativity and being able to think quickly on your feet, but I also think there’s value in knowing how to use AI as a tool. For example, when I was on a supply call, the teacher showed me a lesson plan she had quickly generated with ChatGPT on her phone. As an elementary teacher with so many classes to prepare for, I could see how it saved her time. But I think the best use of AI is to spark ideas, putting your own creativity into the prompt and then tweaking the output for your students, rather than just taking the first thing it gives you.

    1. eLife Assessment

      Whole-brain imaging of neuronal activity in freely behaving animals holds great promise for neuroscience, but numerous technical challenges limit its use. In this important study, the authors describe a new set of deep learning-based tools to track and identify the activity of head neurons in freely moving nematodes (C. elegans) and jellyfish (Clytia hemisphaerica). While the tools convincingly enable high tracking speed and accuracy in the settings in which the authors have evaluated them, the claim that these tools should be easily generalizable to a wide variety of datasets is incompletely supported.

    2. Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4f) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without making clear that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.
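
      The reasoning behind a correction factor of the kind mentioned in weakness (2) can be sketched as follows. This is a hypothetical illustration, not the authors' actual formula: it assumes that a fraction `p_positive` of tracked neurons is eat-4 positive and that an identity swap pairs two neurons independently of their label, which is exactly the uniformity assumption questioned above. Under that assumption, a swap changes the GFP signal only when the two neurons differ in label, so only a fraction 2p(1-p) of errors is observable. All numbers are made up.

```python
# Hypothetical sketch of a label-mismatch correction; not the method used in the paper.

def detectable_fraction(p_positive):
    """Probability that a random swap pairs neurons with different eat-4 labels.

    Assumes swaps are independent of label (uniform mixing), so the chance that
    one neuron is positive and the other negative is 2 * p * (1 - p).
    """
    if not 0.0 < p_positive < 1.0:
        raise ValueError("p_positive must be strictly between 0 and 1")
    return 2.0 * p_positive * (1.0 - p_positive)


def corrected_error_rate(observed_rate, p_positive):
    """Scale the observed (label-flipping) error rate up to an estimated true rate."""
    return observed_rate / detectable_fraction(p_positive)


# Made-up example: 25% of neurons are eat-4 positive, 0.15% of time points show a flip.
p = 0.25
observed = 0.0015
print("detectable fraction:", detectable_fraction(p))
print("estimated true error rate:", corrected_error_rate(observed, p))
```

      If eat-4-positive neurons are spatially clustered and swaps occur mostly between neighbours, the real mismatch probability is lower than 2p(1-p), and a correction of this form would underestimate the true error rate.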

    3. Author response:

      Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLoader can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, only using the tagRFP and CyOFP channels was sufficient for performance that was close to full performance using all 4 NeuroPAL channels. This result indicates that the development of future lines with less fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal, but also open other fluorescent channels that could be used for tracking other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, they showed BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      We appreciate the reviewer’s point that expanding to additional animal models would be valuable. In the study, we have so far tested our approaches on C. elegans and jellyfish. Given that this is considered a ‘very, very minor weakness’ and that it does not directly affect the results or analyses in the paper, we think this might be better to address in future work.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      We agree that it would be valuable to benchmark other labs’ software pipelines on our datasets. We note that most papers in this area, which describe those pipelines, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ on our data might not represent those pipelines in their best light when compared to our pipeline that was developed with our data in mind. Data from different microscopy platforms can be surprisingly different and we wouldn’t want to perform an analysis that had this bias. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

      Indeed, some pre-processing was performed on images before registration and neuron identification -- understanding these nuances can be important. The pre-processing steps are described in the Results section and detailed in the Methods. They are also all available in our open-source software. For BrainAlignNet, the key steps were: (1) selecting image registration problems, (2) cropping, and (3) Euler alignment. Steps (1) and (3) were critically important and are extensively discussed in the Results and Discussion sections of our study (lines 142-144, 218-234, 318-323, 704-712). Step (2) is standard in image processing. For AutoCellLabeler and CellDiscoveryNet, the pre-processing was primarily to align the 4 NeuroPAL color channels to each other (i.e. make sure the blue/red/orange/etc channels for an animal are perfectly aligned). This is also just a standard image processing step to ensure channel alignment. Thus, the more “custom” pre-processing steps were extensively discussed in the study and the more “common” steps are still described in the Methods. The implementation of all steps is available in our open-source software.

      Reviewer #2 (Public review):

      Summary:

      The paper introduced the pipeline to analyze brain imaging of freely moving animals: registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks that are built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      We agree that it would be valuable to benchmark many labs’ software pipelines on some common datasets, ideally from several different research labs. We note that most papers in this area, which describe the other pipelines that have been developed, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ and comparing the results to our pipeline (where we have extensive expertise) might bias the performance metrics in favor of our software. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) Lack of novelty. All three models do not incorporate state-of-the-art advances from the respective fields. BrainAlignNet does not learn from the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced medNeXt3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      We appreciate that the machine learning field moves fast. Our goal was not to invent entirely novel machine learning tools, but rather to apply and optimize tools for a set of challenging, unsolved biological problems. We began with the somewhat simpler architectures described in our study and were largely satisfied with their performance. It is conceivable that newer approaches would perhaps lead to even greater accuracy, flexibility, and/or speed. But, oftentimes, simple or classical solutions can adequately resolve specific challenges in biological image processing.

      Regarding CellDiscoveryNet, our claim of unsupervised training is precise: CellDiscoveryNet is trained end-to-end only on raw images, with no human annotations, pseudo-labels, external classifiers, or metadata used for training, model selection, or early stopping. The loss is defined entirely from the input data (no label signal). By standard usage in machine learning, this constitutes unsupervised (often termed “self-supervised”) representation learning. Downstream clustering is likewise unsupervised, consuming only image pairs registered by CellDiscoveryNet and neuron segmentations produced by our previously-trained SegmentationNet (which provides no label information).
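
      As an illustration of what "a loss defined entirely from the input data" can look like, the sketch below computes one common label-free image-similarity loss, one minus the normalized cross-correlation between a fixed image and a warped moving image. This is a generic example and an assumption on our part: the response does not state which loss CellDiscoveryNet uses, and the function name `ncc_loss` and the flattened-list image representation are hypothetical.

```python
import math


def ncc_loss(fixed, warped):
    """Return 1 - normalized cross-correlation of two equal-length intensity lists.

    Hypothetical helper for illustration only; no labels are involved,
    only the two images' pixel intensities (flattened to 1-D lists).
    """
    n = len(fixed)
    if n == 0 or n != len(warped):
        raise ValueError("images must be non-empty and the same size")
    mean_f = sum(fixed) / n
    mean_w = sum(warped) / n
    # Covariance and variances of the mean-centered intensities.
    cov = sum((f - mean_f) * (w - mean_w) for f, w in zip(fixed, warped))
    var_f = sum((f - mean_f) ** 2 for f in fixed)
    var_w = sum((w - mean_w) ** 2 for w in warped)
    denom = math.sqrt(var_f * var_w)
    if denom == 0.0:
        # A constant image carries no structure to correlate against.
        return 1.0
    return 1.0 - cov / denom
```

      The loss approaches 0 when the warped image matches the fixed image up to brightness and contrast, and grows as they diverge, so it can be minimized without any human annotation.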

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Regarding BrainAlignNet: we agree that we trained on each species’ own data (worm, jellyfish), and we would suggest that other labs working on new organisms do the same based on our current state of knowledge. It would be fantastic if there were an alignment approach that generalized to all possible cases of non-rigid registration in all animals – an important area for future study. We also agree that pre-alignment was critical in worms and jellyfish, which we discuss extensively in our study (lines 142-144, 318-321, 704-712).

      Regarding AutoCellLabeler: the animals were not recorded in any standardized pose and were not aligned to each other beforehand – they were basically in a haphazard mix of poses and we used image augmentation to allow the network to generalize to other poses, as described in our study. It is still possible that AutoCellLabeler is somehow brittle to pose changes (e.g. perhaps extremely curved worms) – while we did not detect this in our analyses, we did not systematically evaluate performance across all possible poses. However, we do note that this network was able to label images taken from freely-moving worms, which by definition exhibit many poses (Figure 5D, lines 500-525); aggregating the network’s performance across freely-moving data points allowed it to nearly match its performance on high-SNR immobilized data. This suggests a degree of robustness of the AutoCellLabeler network to pose changes.

      Regarding ANTSUN 2.0: we agree that there are some hyperparameters (described in our study) that affect ANTSUN performance. We agree that it would be worthwhile to fully automate setting these in future iterations of the software.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Please see our response to your point (1) under Weaknesses above.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      There are other efforts in the literature to solve the neuron tracking and neuron identification problems in C. elegans (please see paragraphs 4 and 5 of our Introduction, which are devoted to describing these). However, they are quite different in the approaches that they use, compared to our study. For example, for neuron tracking they use t->t+1 methods, or model neurons as point clouds, etc (a variety of approaches have been tried). For neuron identification, they work on extracted features from images, or use statistical approaches rather than deep neural networks, etc (a variety of approaches have been tried). Our assessment is that each of these diverse approaches has strengths and drawbacks; we agree that a meta-analysis of the design choices used across studies could be valuable.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      We agree that there are some ANTSUN 2.0 hyperparameters (described in our Methods section) that could affect the quality of neuron tracking. It would be worthwhile to fully automate setting these in future iterations of the software, ensuring that the hyperparameter settings are robust to variation in data/experiments.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      We wish to remind the reviewer that we developed BrainAlignNet for use in worms and jellyfish. These two animals have different distributions of neurons and radically different anatomy and movement patterns. Data from the two organisms was collected in different labs (Flavell lab, Weissbourd lab) on different types of microscopes (spinning disk, epifluorescence). We believe that this is a good initial demonstration that the approach has robustness across different settings.

      Regarding comparisons to other labs’ C. elegans data processing pipelines, we agree that it will be extremely valuable to compare performance on common datasets, ideally collected in multiple different research labs. But we believe this should be performed collaboratively so that each software package can be shown in its best light with input from each lab, as described above. We agree that such a comparison would be very valuable.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

      Regarding worms vs jellyfish pre-processing: we actually had the exact opposite reaction to that of the reviewer. We were surprised at how similar the pre-processing was for these two very different organisms. In both cases, it was essential to (1) select appropriate registration problems to be solved; and (2) perform initialization with Euler alignment. Provided that these two challenges were solved, BrainAlignNet mostly took care of the rest. This suggests a clear path for researchers who wish to use this approach in another animal. Nevertheless, we also agree with the reviewer’s caution that a totally different use case could require some re-thinking or re-strategizing. For example, the strategy of how to select good registration problems could depend on the form of the animal’s movement.

      Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      We performed dataset-specific training for image registration and neuron identification, and we would encourage new users to do the same based on our current state of knowledge. This highlights how standardization of whole-brain imaging data across labs is an important issue for our field to address and that, without it, variations in imaging conditions could impact software utility. We refer the reviewer to an excellent study by Sprague et al. (2025) on this topic, which is cited in our study.

      However, at the same time, we wish to note that it was actually reasonably straightforward to take the BrainAlignNet approach that we initially developed in C. elegans and apply it to jellyfish. Some of the key lessons that we learned in C. elegans generalized: in both cases, it was critical to select the right registration problems to solve and to preprocess with Euler registration for good initialization. Provided that those problems were solved, BrainAlignNet could be applied to obtain high-quality registration and trace extraction. Thus, our study provides clear suggestions on how to use these tools across multiple contexts.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      We respectfully disagree with this critique. We considered the alternative suggested by the reviewer (in their private comments to the authors) of comparing against a manually annotated dataset. But this annotation would require manually linking ~150 neurons across ~1600 timepoints, which would require humans to manually link neurons across timepoints >200,000 times for a single dataset. These datasets consist of densely packed neurons rapidly deforming over time in all 3 dimensions. Moreover, a single error in linking would propagate across timepoints, so the error tolerance of such annotation would be extremely low. Any such manually labeled dataset would be fraught with errors and should not be trusted. Instead, our approach relies on a simple, accurate assumption: GFP expression in a neuron should be roughly constant over a 16 min recording (after bleach correction) and the levels will be different in different neurons when it is sparsely expressed. Because all image alignment is done in the red channel, the pipeline never “peeks” at the GFP until it is finished with neuron alignment and tracking. The eat-4 promoter was chosen for GFP expression because (a) the nuclei labeled by it are scattered across the neuropil in a roughly salt-and-pepper fashion – a mixture of eat-4-positive and eat-4-negative neurons is found throughout the head; and (b) it is in roughly 40% of the neurons, giving very good overall coverage. Our view is that this approach of labeling subsets of neurons with GFP should become the standard in the field for assessing tracking accuracy – it has a simple, accurate premise; is not susceptible to human labeling error; is straightforward to implement; and, since it does not require manual labeling, is easy to scale to multiple datasets. We do note that it could be further strengthened by using multiple strains each with different ‘salt-and-pepper’ GFP expression patterns.
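
      The arithmetic behind this argument can be sketched as follows. The first function reproduces the manual-annotation workload quoted above. The second and third show one plausible form of a correction from observed GFP switches to total identity swaps; this form is our assumption for illustration (it treats GFP-positive and GFP-negative neurons as well mixed and swaps as random pairings) and is not necessarily the correction used in the study. All function names are hypothetical.

```python
def manual_links_required(n_neurons: int, n_timepoints: int) -> int:
    """Number of neuron-to-timepoint assignments a human annotator would make."""
    return n_neurons * n_timepoints


def detectable_swap_fraction(p_positive: float) -> float:
    """Chance that a swap between two random neurons is visible in GFP.

    Assumption: a swap is only visible when it pairs a GFP-positive neuron
    with a GFP-negative one, and the two populations are well mixed.
    """
    return 2.0 * p_positive * (1.0 - p_positive)


def corrected_error_rate(observed_rate: float, p_positive: float) -> float:
    """Scale the observed GFP-switch rate up to an estimated total swap rate."""
    return observed_rate / detectable_swap_fraction(p_positive)
```

      With the figures quoted above (~150 neurons, ~1600 timepoints, GFP in roughly 40% of neurons), the first function gives 240,000 assignments and the second gives 0.48, so under this assumption roughly half of all random swaps would show up as a GFP switch.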

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      Our tracking accuracy requires (a) a careful selection of registration problems, (b) highly accurate registration of the selected registration problems, and (c) effective clustering. We extensively discussed the importance of choosing the registration problems in the Results section (lines 218-234 and 318-321), Discussion section (lines 704-708), and Methods section (lines 955-970 and 1246-1250) of our paper. We also discussed the clustering aspect in the Results section (lines 247-259), Discussion section (lines 708-712), and Methods section (lines 1162-1206). In addition, our abstract states that BrainAlignNet needs to be “incorporated into an image analysis pipeline,” to inform readers that other aspects of image analysis need to occur (beyond BrainAlignNet) to perform tracking.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4f) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without clarifying that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.

      The reviewer raises two points here: (1) whether AutoCellLabeler accuracy is impacted by ease of human labeling; and (2) what fraction of total neurons are identified. We address them one at a time.

      Regarding (1), we believe that the reviewer overlooked an important analysis in our study. Indeed, to assess its performance, one can only compare AutoCellLabeler’s output against accurate human labels – there is simply no way around it. However, we noted that AutoCellLabeler was identifying some neurons with high confidence even when humans had low confidence or had not even tried to label the neurons (Fig. 4F). To test whether these were in fact accurate labels, we asked additional human labelers to spend extra time trying to label a random subset of these neurons (they were of course blinded to the AutoCellLabeler label). We then assessed the accuracy of AutoCellLabeler against these new human labels and found that they were highly accurate (Fig. 4H). This suggests that AutoCellLabeler has strong performance even when some human labelers find it challenging to label a neuron. However, we agree that we have not yet been able to quantify AutoCellLabeler performance on the small set of neuron classes that humans are unable to identify across datasets.

      Regarding (2), we agree that knowing how many neurons are labeled by AutoCellLabeler is critical. For example, labeling only 3 neurons per animal with 100% accuracy isn’t very helpful. We wish to emphasize that we did not omit this information: we reported the number of neurons labeled for every network that we characterized in the study, alongside the accuracy of those labels (please see Figures 4I, 5A, and 6G; Figure 4I also shows the number of human labels per dataset, which the reviewer requested). We also showed curves depicting the tradeoff between accuracy and number of neurons labeled, which fully captures how we balanced accuracy and number of neurons labeled (Figures 5D and S4A). It sounds like the reviewer also wanted to know the total number of recorded neurons. The typical number of recorded neurons per dataset can also be found in the paper in Fig. 2E.
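
      The tradeoff described here can be illustrated with a minimal sketch: sweep a confidence threshold, keep only labels at or above it, and record how many neurons remain labeled and how accurate those labels are. The function name, inputs, and thresholds are hypothetical and chosen for illustration; this is not the analysis code from the study.

```python
def accuracy_coverage_curve(confidences, is_correct, thresholds):
    """Return (threshold, n_labeled, accuracy) for each confidence threshold.

    confidences: per-neuron network confidence values.
    is_correct:  per-neuron booleans, True if the network label matches
                 the reference (human) label.
    """
    curve = []
    for t in thresholds:
        # Keep only the predictions the network is confident enough about.
        kept = [ok for c, ok in zip(confidences, is_correct) if c >= t]
        n_labeled = len(kept)
        accuracy = sum(kept) / n_labeled if n_labeled else float("nan")
        curve.append((t, n_labeled, accuracy))
    return curve
```

      Raising the threshold typically lowers the number of labeled neurons while raising accuracy, which is why both quantities need to be reported together.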

  5. inst-fs-iad-prod.inscloudgate.net
    1. Thankfully, poor children may have access to the federally funded Head Start program, but children of the wealthy have a different kind of head start.

      This passage poignantly reveals the hidden "starting point inequality" within the American education system. On the surface, the federal government, through the Head Start program, provides preschool support for children from poor families, seemingly bridging the gap. But in reality, children from wealthy families enjoy a completely different starting advantage: expensive private preschools, premium housing in school districts, private tutoring, and a rich array of extracurricular activities. Together, these hidden resources constitute a parallel education system. As the author ironically points out, the "head start" for poor children is a government relief program, while the "head start" for wealthy children is a legacy of class privilege passed down from generation to generation. This fundamental resource gap undermines the concept of "equal educational opportunity" from the very beginning and helps explain the continued decline in social mobility in the United States: while poor children are still learning to recognize letters, their wealthy counterparts are already learning programming and debate.

    1. with charm bracelets jingling on their thin wrists; they would lean together to whisper and laugh secretly if someone passed who amused or interested them. Connie had long dark blond hair that drew anyone's eye to it, and she wore part of it pulled up on her head and puffed out and the rest of it she let fall down her back. She wore a pull-over jersey blouse that looked one way when she was at home and another way when she was away from home

      This sentence is interesting because the author writes out the scene in which Connie is free and happy with her friends.

  6. Sep 2025
    1. I call this a challenge because producing eloquently written work with the goal to inform/entertain/persuade the reader is not an easy task.

      It can be hard to write exactly how I'm feeling in a clear way. Thoughts that make sense in my head don't come out the same way on paper.

    1. Author response:

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the ORNs' endogenous membrane potential oscillations. This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco receptor neuron (ORN) activity is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions that underlie their data analysis and model builds, as well as insufficient biological data, including critical controls to validate and/or fully justify the model the authors are proposing.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript will be much stronger once we incorporate the requested changes.

      Please note that we used ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs have to associate with the olfactory receptor co-receptor (Orco) in the cilium of the neuron to form functional OR-Orco complexes for odorant detection. Besides this chaperone function, Orco can form homomers with the potential to act as ionic pacemaker channels, a role which we explore in this study.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of (a single dose) of the Orco antagonist, OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series in addition to the administration of blockers for the other channels that are embedded in their model, but which were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would (ideally) also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel open time probability. In M. sexta hawkmoths, we have already published that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2016). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco Ligand Candidates (OLC) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). In that study, we could also demonstrate that OLC15 antagonizes the VUAA1 activation of Orco.

      Furthermore, we tested other published Orco antagonists in in vivo assays in intact hawkmoths, focusing on amiloride-derived antagonists, because we previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific but instead affected different targets depending on time-of-day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best in hawkmoth ORNs to target Orco pharmacologically. Following these comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments.

      We will clarify the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of the timeless mRNA control from another study is not acceptable. Furthermore, qPCR analysis measures transcript abundance, not transcription, as the authors repeatedly state. Transcriptional studies would require nuclear run-off or, more recently, can be done with snRNAseq analysis. Taken together, these concerns undermine the authors' desire to rule out TTFL-based control that directly led them to implicate a PTTF-based model.

      We agree with the referees that more time points and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We will include these additional qPCR experiments and edit the manuscript to make clear that we measure transcript abundance, but we will not perform snRNAseq analysis due to time and financial constraints. We are currently working on the transcriptional control of Orco, both during ontogeny and throughout the day, but this work in progress is beyond the scope of this manuscript.

      (3) The modelling presented is based on Orco as a ZT-dependent conductance tied to the cAMP oscillations that were reported by this group in the cockroach and from the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While these complexes have been generated in cell culture and other heterologous expression systems, as well as presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that these complexes exist in wild-type Manduca ORNs. While this doesn't necessarily undermine every aspect of their models, the authors should note the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012), but also in hawkmoth antennae (Schendzielorz et al., 2015). We will add the 2015 citation to the Modeling chapter in the Methods section to clarify this.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs. Thus, as the referee suggests, we will add text regarding the presence and localization of OR-Orco heteromers. However, we have indications that Orco homomers could indeed be present in the hawkmoth ORNs. In a heterologous expression system, MsexOrco expression alone was sufficient to increase intracellular Ca<sup>2+</sup> levels in response to VUAA1 application (Nolte et al., 2013). In differentiating primary cell cultures of hawkmoth antennae, Orco expression started during a developmental time window where ORNs did not yet express pheromone receptors, and Orco affected spontaneous activity (Nolte et al., 2016). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window where ORNs already express spontaneous activity but cannot heteromerize with pheromone receptors. However, we do not know whether and in what ratio homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990).

      We will clarify our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms, corresponding with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during a very challenging long-term recording experiment over several days. In addition, we observed in our animal-rearing facility that under LD 17:7 light-dark cycles the originally nocturnal hawkmoth M. sexta distributes its activity over the course of the day, so that we find nocturnal as well as diurnal individuals. Thus, light-dark cycles were not enough to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Here, we used isolated males that were never exposed to the female pheromones, so that their circadian activity patterns readily disperse. Therefore, it became necessary in free-running conditions to first determine the respective behavioral rhythm for each animal, and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a free-running population. As requested by the referees in point (7), we will use additional tests for rhythmicity in each of our recordings and revise the manuscript accordingly.
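      The averaging-out argument can be illustrated with a minimal sketch on purely synthetic data. It is not the analysis pipeline of the manuscript: the cosine traces, the chosen phases, and all function names (`simulate`, `acrophase`, `align`) are assumptions made only for illustration.

```python
import math

def simulate(phase_h, period_h=24.0, hours=72, mean=5.0, amp=2.0):
    """Synthetic hourly activity trace (arbitrary units) peaking at phase_h."""
    return [mean + amp * math.cos(2 * math.pi * (t - phase_h) / period_h)
            for t in range(hours)]

def peak_to_trough(trace):
    """Difference between the largest and smallest value of a trace."""
    return max(trace) - min(trace)

def average(traces):
    """Point-by-point mean across animals."""
    return [sum(vals) / len(vals) for vals in zip(*traces)]

def acrophase(trace, period_h=24.0):
    """Peak time (h) estimated from the Fourier component at period_h."""
    w = 2 * math.pi / period_h
    c = sum(x * math.cos(w * t) for t, x in enumerate(trace))
    s = sum(x * math.sin(w * t) for t, x in enumerate(trace))
    return (math.atan2(s, c) / w) % period_h

def align(trace, period_h=24.0):
    """Circularly shift a trace so that its estimated peak sits at t = 0.

    A circular shift is only appropriate here because the synthetic
    traces span a whole number of periods.
    """
    shift = int(round(acrophase(trace, period_h))) % len(trace)
    return trace[shift:] + trace[:shift]

# Six hypothetical free-running animals with dispersed phases (hours).
traces = [simulate(phase) for phase in (0, 4, 8, 12, 16, 20)]
aligned = [align(trace) for trace in traces]

print("population mean, unaligned:", peak_to_trough(average(traces)))
print("population mean, aligned:  ", peak_to_trough(average(aligned)))
```

      In this idealized case every animal is strongly rhythmic, yet the unaligned population mean is essentially flat, whereas the phase-aligned mean recovers the individual amplitude. Real recordings differ in period and contain noise, so alignment is less clean in practice.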

      Assuming that hawkmoths need pheromone presence as additional Zeitgeber, we are currently working on a new set of experiments where we attempt to improve synchronization by exposure to LD cycles and pheromone before DD and OLC15 recordings. We will add these experiments to the manuscript.

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers might have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end, i.e., the recording site is located at 2/3 – 3/4 of the antenna length as seen from the head of the animal. We will make this clearer in the Methods section.

      In addition, our lab showed with antibody staining against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs. This would indicate that all ORNs, whether they express pheromone or general odorant receptors, could potentially share the same Orco-dependent spontaneous activity rhythms. In our lab, different experimenters who recorded in different years from long trichoid sensilla on different annuli did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum.

      (5.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after previous phosphorylation via protein kinase C (PKC; reviewed in Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that these PKC- and cGMP/cAMP-dependent regulations are present in other insect species. We are currently running thorough tip-recording experiments on the regulation of Orco gating, which are beyond the scope of this manuscript. However, we will add a set of experiments to this manuscript that demonstrates cAMP gating of Orco.

      (5.2)… and the PTTF model proposed is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper (Stengl and Schneider, 2024).

      (5.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by HEK cell heterologous expression studies of Orco but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on hawkmoth primary cell cultures of olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays all support a metabotropic signal transduction cascade in insect olfaction (reviews: Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In contrast, the hypothesis suggesting a solely ionotropic pheromone- and general odor-dependent transduction cascade for all insect species rests on very sparse experimental evidence, derived primarily from heterologous expression systems such as HEK cells that lack the insect’s WT molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis relies heavily on the argument that an inverse 7TM receptor cannot couple to G-proteins, which lacks careful backup via biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that pay attention to the distinct kinetic components of the ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication, Schneider et al., 2025).

      (5.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for conserved known cyclic nucleotide binding sites on Orco, this does not exclude the presence of so far unknown binding sites, or of sites that become accessible only after a specific sequence of phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after previous PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a PKC- and cAMP-dependent modulation of Orco. These studies will be published in a follow-up publication.

      (6) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, with averages lacking clear daily modulation, the methods are not sufficiently reliable enough to reveal novel underlying mechanisms of circadian rhythm generation. The reported results are therefore not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity using methods such as RAIN, JTK CYCLE, MetaCycle, or Echo. The use of FFT and Wavelet is applauded, but these methods do not have tests of significance for rhythms and can be biased when analyzing data in which there could only be 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet perfected to provide strong evidence for circadian regulation of spontaneous firing of ORNs. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes and with similar periods and phases.

      The long-term tip-recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N=34). Since 5/11 LD recordings and 7/10 DD recordings revealed daily/circadian rhythmicity, and since many other physiological recordings at different ZTs by different members of our laboratory all revealed ZT-dependent pheromone transduction, we can be certain that the physiology of hawkmoth antennae is under strict circadian control. Please see also our response to (4) above, commenting on the phase dispersal of activity rhythms observed in our experiments, as well as in the behavior of hawkmoth males in the mating cage.

      Nevertheless, we will follow the advice of the referees to apply additional tests for significance of rhythms in spontaneous activity, and we are thankful for the tests suggested that we were not aware of.
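      As a rough illustration of what a significance test for rhythmicity in a single recording involves, the following sketch fits a single-component cosinor and obtains a p-value by permutation. This is a deliberately simpler, swapped-in technique and not an implementation of RAIN, JTK_CYCLE, MetaCycle, or ECHO; the data are synthetic, and the function names (`cosinor_amplitude`, `permutation_p`) are hypothetical.

```python
import math
import random

def cosinor_amplitude(times_h, values, period_h=24.0):
    """Least-squares amplitude of y = M + a*cos(w*t) + b*sin(w*t)."""
    w = 2 * math.pi / period_h
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    # Normal equations (X^T X) beta = X^T y as an augmented 3x4 matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, values))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * p for x, p in zip(A[r], A[col])]
    a = A[1][3] / A[1][1]
    b = A[2][3] / A[2][2]
    return math.hypot(a, b)

def permutation_p(times_h, values, n_perm=999, seed=0):
    """Fraction of shuffled series whose fitted amplitude reaches the observed one.

    Shuffling assumes exchangeable samples, which autocorrelated
    spike-rate series violate; treat the result as illustrative only.
    """
    rng = random.Random(seed)
    observed = cosinor_amplitude(times_h, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cosinor_amplitude(times_h, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic example: 48 hourly bins, one rhythmic and one arrhythmic series.
noise = random.Random(1)
times = list(range(48))
rhythmic = [5 + 2 * math.cos(2 * math.pi * (t - 6) / 24) + noise.gauss(0, 0.5)
            for t in times]
flat = [5 + noise.gauss(0, 0.5) for _ in times]

amp_rhythmic = cosinor_amplitude(times, rhythmic)
p_rhythmic = permutation_p(times, rhythmic)
p_flat = permutation_p(times, flat)
print("rhythmic: amplitude", amp_rhythmic, "p", p_rhythmic)
print("flat:     p", p_flat)
```

      The fixed 24 h period is an assumption of the sketch; free-running recordings would require estimating the period first, which is one reason the dedicated tools suggested by the referees are preferable.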

      (7) The statement that circadian patterns of ORN firing are lost with the Orco antagonist (OLC15) is not strongly supported. The manuscript should be revised to quantify how Orco changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing/vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blocks circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 µM OLC15).

      We will revise our data analysis, according to the valuable suggestions of the referees.

      However, based upon our previous studies with other Orco antagonists and different doses of OLC15 (Nolte et al., 2016) we found that 50 µM OLC15 is the best Orco antagonist dose in M. sexta to target Orco-dependent modulation of spontaneous action potential activity of hawkmoth olfactory receptor neurons. Please see also our response to (1).

      (8) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We will revise the discussion accordingly and clarify which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (9.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that there are multiple layers of interconnected feedback cycles that control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, not strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, including all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript.

      (9.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments that test circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and are part of another publication. We will revise our discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrate that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells, which build the transepithelial potential gradient via V-ATPases that generate the high K<sup>+</sup> concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of afferent neurons originating from somata in the brain that maintain central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We will revise the discussion accordingly.

      b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We will add those experiments to the revised version of the manuscript (see our response to (2)).

      c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We will revise the manuscript accordingly.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORN neurons to the bursts, clusters, and packets of retinal efferent spikes reported in Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORN neurons likely a conserved feature of circadian control of sensory sensitivity?

      We will revise the discussion accordingly.

      (2) The methods need further details. For example, it is unclear if or how single neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

      We will clarify the Methods section.

      References

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. doi:10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. J Exp Biol 206:1575–1588. doi:10.1242/jeb.00302

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Front Cell Neurosci 12:218. doi:10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. doi:10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Curr Biol 34:1414-1425.e5. doi:10.1016/j.cub.2024.02.042

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. doi:10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proc Natl Acad Sci 108:8821–8825. doi:10.1073/pnas.1102425108

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. J Exp Biol 172:345–354. doi:10.1242/jeb.172.1.345

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. doi:10.1038/22566

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. J Biol Rhythms 22:502–514. doi:10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. doi:10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. doi:10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. J Biol Rhythms 22:43–57. doi:10.1177/0748730406295462

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. J Biol Rhythms 29:318–331. doi:10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. J Biol Rhythms 27:388–397. doi:10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. doi:10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. doi:10.1523/ENEURO.0376-24.2024

      Stengl M. 2010. Pheromone Transduction in Moths. Front Cell Neurosci 4:133. doi:10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. J Comp Physiol A 174:187–194. doi:10.1007/BF00193785

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. J Comp Physiol A 199:897–909. doi:10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. J Neurosci 10:837–847. doi:10.1523/JNEUROSCI.10-03-00837.1990

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Front Physiol 14:1243455. doi:10.3389/fphys.2023.1243455

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Curr Biol 14:638–649. doi:10.1016/j.cub.2004.04.009

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS, editors. Insect Biology in the Future. Academic Press. pp. 735–763. doi:10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell Tissue Res 383:7–19. doi:10.1007/s00441-020-03363-x

    1. interest in political organizing might be reawakened. She might reach out to other politically-minded friends or join a political

      I can see how some people who come from a place of privilege in society might not enjoy philosophy if it awakens them to social injustice. If the problems don't affect you, it is easiest and probably less stressful to stick your head in the sand.

    2. but it can alienate some students

      I like the point that pushing this profound method on kids would alienate at least a few of them: if they aren't going into something like math, medicine, or law, why would they care about anything other than work? So I like how the author shifts the perspective on how to teach it, focusing less on intelligence and more on accepting ourselves for what we are, so that no one thinks, "I'm not good enough for this."

    3. Given that few of my students will ultimately find their way into the academy and that, within that already small cohort, only a fraction will choose to do so in the field of philosophy, the question of why study philosophy has a particular resonance for them, and for me as their teacher. One answer to this question is pragmatic – philosophy teaches you to think and write logically and clearly. This, we tell our students, will be of use to them no matter what path they pursue.

      As someone who is taking a philosophy class for the first time in my academic career, this section of the reading stood out to me because I agree that it is very useful and important to be able to understand things logically in your own head, and it's another skill to be able to communicate those thoughts out loud. Even if you don't end up pursuing a career in philosophy, taking a philosophy class can still teach you how to draw connections and share ideas with others. This is something that can absolutely be applied to any class, job, or even personal relationship, I imagine. This made me feel excited about the reading material to come!

    4. Why, these students might ask, is the knowledge that philosophy aims at any deeper than that of more practical fields such as medicine, science, or the law?

      I think philosophy is always asking yourself "why?" Practical fields like medicine and law require you to have a certain amount of knowledge to be deemed qualified, and typically it's knowledge that is either true or false, or factual. However, philosophy is very different in that it requires strong critical thinking and the ability to clearly explain those thoughts to be deemed great and skillful. You don't necessarily need to have certain terminology and facts memorized.

    5. Now, ask yourself: what could philosophy do for you?

      Opening the essay with a hypothetical is a powerful way to evoke an image in the reader's head, allowing them to really ponder the situation that the writer wishes to emphasize. However, this closing question feels out of place; the concept of philosophy feels completely unrelated to what was being explained earlier. At what moment of this person's daily routine would they be able to think about the positive aspects of philosophy? I don't know whether this was the intention, but I believe the article should have started with a hypothetical more relevant to its topic of discussion: philosophy.

    1. • Use body language (such as giving eye contact, leaning forward, and nodding) to indicate their engagement in the conversation
       • Pause to paraphrase, ask questions, and summarize the conversation in order to avoid miscommunication
       • Resist judging the comments that a beginning teacher makes
       • Respond in a way that communicates respect and appreciation for what the beginning teacher shares (such as “I hear what you’re saying,” “It sounds like you really feel frustrated,” or “Thank you for sharing that. How can I help?”)

       In addition to using active listening during conversations, mentors should pay attention to the nonverbal cues a beginning teacher uses. Look for signs of fatigue (such as slow movements or difficulty concentrating), frustration (such as eye-rolling or crossed arms), or despair (such as puffy eyes or other indicators of crying). By paying attention to both verbal and nonverbal communications, a mentor can see indications of distress before they come to a head and show the beginning teacher that he or she cares.

       Conduct Daily Check-Ins

       Daily check-ins are short conversations between mentors and mentees about how a mentee is feeling and performing, both inside and outside the classroom. Mentors can send emails and text messages to mentees or call them on the phone, even outside school hours. Do not feel obligated to make these check-ins formal or extensive; even a simple “How’s it going?” followed by active listening can make a world of difference. Staying in communication with mentees helps them feel supported but also helps a mentor notice when something is amiss. This easy strategy can facilitate the growth of the mentor-mentee relationship throughout the school year.

       Validate the Teacher's Feelings

       Once it becomes clear how a mentee feels, provide reassurance that his or her feelings are normal and will not last forever. Relate the mentee’s experience to the different phases of first-year teaching (Moir, 1999; see figure 1.2 on page 9) to validate his or her feelings and show that many beginning teachers feel the same way. Giving new teachers a chance to relate to these phases can help them feel a sense of normalcy regarding their emotions and concerns. Some also feel a sense of relief that they are not alone in their journey, particularly during the survival and disillusionment phases. Be sure to point out that teachers do not stay in these phases forever and that the job becomes easier and easier with each passing year. Additionally, share personal reflections and anecdotes from your own first years as a teacher to help the mentee feel a sense of camaraderie. Use the essays and reflection questions in appendix B (page 79), which provide a window into the life of a beginning teacher, or reflect individually on the first-year teaching phases (see figure 1.2 on page 9) to recall the unique challenges and emotions that a new teacher faces. Consider difficult experiences from recent years, as well, and describe the different challenges and rewards that each year brings. Alternatively, collect and share stories from other colleagues in the school building. Point out that even the most seasoned teachers began as novices. These shared experiences can stimulate a comfortable and reflective dialogue between a mentor and a mentee.

       Send Encouraging Messages

       Periodically send positive notes, emails, and text messages to beginning teachers to remind them of your availability and support. Include positive, behavior-specific feedback in letters to mentees to keep their spirits high and to encourage them to press on. For example, write something such as, “I noticed that instead of correcting Jerrod in front of the class today, you spoke privately with him about his behavior—that was very effective!” Choosing cards that contain reflective quotes or heartening messages can also provide support for beginning teachers. Robert J. Marzano and Debra J. Pickering (2011) pointed out that inspirational quotations that demonstrate examples of self-efficacy can be encouraging. As Dale H. Schunk and Frank Pajares (2009) explained, self-efficacy “refers to the perceived capabilities for learning or performing actions at designated levels” (p. 35). In other words, teachers who have a strong sense of self-efficacy believe that they can execute their duties successfully or learn to execute them successfully. Because a beginning teacher may also be struggling to cultivate self-efficacy, inspirational quotations can serve as powerful reminders of the importance of persevering, striving for goals, and staying optimistic. When providing examples of motivating quotations, mentors can refer to this list of selected BrainyMedia (2014) quotations, as cited in Marzano and Pickering (2011), involving three categories: (1) perseverance, (2) greatness and following hopes and dreams, and (3) optimism.

       Perseverance

       • “Genius is eternal patience.” —Michelangelo
       • “Without struggle, there can be no progress.” —Frederick Douglass
       • “In the middle of difficulty lies opportunity.” —Albert Einstein
       • “Don’t fear mistakes, there are none.” —Miles Davis
       • “I’ve got to keep breathing. It’ll be my worst business mistake if I don’t.” —Steve Martin
       • “If you’re going through hell, keep going.” —Winston Churchill
       • “It’s not whether you get knocked down; it’s whether you get up.” —Vince Lombardi

      Using positive body language is so important. This is an area I want to grow in, and I also want to ask for feedback, as I might not be aware of how I'm coming across.

  7. learn-us-east-1-prod-fleet01-beaker-xythos.content.blackboardcdn.com learn-us-east-1-prod-fleet01-beaker-xythos.content.blackboardcdn.com
    1. The debt contract puts a gun to the head of managers. They must repay the debt on schedule, or they will lose control of

      Does this kind of pressure lead to bad decision-making and, in turn, to negative outcomes?

    1. In the United States, for example, if we nod our head up and down, we mean yes, and if we shake it back and forth, we mean no. In Bulgaria, however, nodding means no, while shaking our head back and forth means yes!

      this is interesting

    1. She thinks, What if you hid a man in there?

      During her day to day life she sees things that intrigue her and she gets stories from them and they build in her head. When the end says "We save our lives in such unlikely ways" they are talking about how she uses creativity and her daily life to make stories just to keep her head.

    1. I remember Those are pearls that were his eyes.

      When I read the line “Those are pearls that were his eyes” in tonight’s reading, I was shocked. I was immediately taken back to our conversation about Ariel’s character in The Tempest and to the identical line in “The Burial of the Dead.” The difference between the two uses of “Those are pearls that were his eyes” interested me a lot. In “The Burial of the Dead,” the line is a parenthetical reference to the “drowned Phoenician Sailor” in the tarot card reading by “Madame Sosostris,” and the full line reads “Those are pearls that were his eyes. Look!” This is a direct reference to when Ariel sings to Ferdinand, whom they have shipwrecked on Prospero’s island. They sing: “Full fathom five thy father lies; / Of his bones are coral made; / Those are pearls that were his eyes: / Nothing of him that doth fade / But doth suffer a sea-change / Into something rich and strange,” lying to Ferdinand that his father died in the shipwreck, leaving him, as the heir to the throne, the new King. This line, full of rhymes and interesting imagery, acts as a spell, with Ariel using magical rhetoric to convince Ferdinand of an untruth. Thus, the line’s use in “The Burial of the Dead” can be seen as a diversion, with the “Look!” making it all the more convincing. In “A Game of Chess,” the line reads “I remember/Those are pearls that were his eyes,” to which the (a?) speaker responds, “‘Are you alive, or not? Is there nothing in your head?’” The reintroduction of this line in this context perhaps suggests the memory of humanity’s connection to nature, a time when beautiful and rare pearls equated to the beauty and importance of one’s vision. The response, “‘Are you alive, or not?’” confused me, especially after the preceding line, but maybe that will be parsed during our discussion or in a further source. Additionally, the placement of this line, in response to the questions “Do/you know nothing? Do you see nothing? Do you remember/’Nothing?’” felt really intentional to me. It seemed as if it was testing the reader’s memory; in some way, this quoted speaker is Eliot, reaching out to the audience, asking us if we remember the first use of the line.

    2. Here is the man with three staves

      The Tarot section of The Waste Land seems at first a random assortment of cards and ideas mashed together in a list. Of these ideas, the “man with three staves” struck me, especially because of Eliot’s description of him in the footnote. Eliot writes “The Man with Three Staves I associate, quite arbitrarily, with the Fisher King himself.” The tarot cards, including this one, are associated with Arthurian legends in a less “arbitrary” fashion by Weston, who argues that four objects related with Arthurian legends “exist today as the four suits of the Tarot,” thereby establishing a direct connection between the ancient legend and modern day Tarot cards.

      The card of the man with three staves is of particular interest beyond simply Eliot mentioning it, primarily because of the religious imagery present in its composition. The card consists of a man with three sticks, two on one side of him and one on the other. These three wooden sticks can be seen as representations of the Holy Trinity of God the Father, the Son, and the Holy Spirit. Further religious imagery comes from the outfit of the man. He is dressed in a red cape, with an armored arm poking out, and a thin golden circle resting on his head. This is reminiscent of the signature look of a Roman soldier during the time of Jesus’ death, with the crown potentially symbolizing Jesus’ holiness or status as “King of the Jews.”

      The religious composition of this card connects Christian themes to the otherwise non-Christian practice of spiritual divination through Tarot readings. This contradiction, which is also apparent in other Tarot cards, is similar to the Christian/pagan contradiction within the Arthurian legends. The legends, as Weston argues, pull from both folk and Christian stories, and both stories must be understood together to understand the true background of the final legend. Similarly, the listing of many different Tarot figures in The Waste Land emphasizes contradiction, and is meant to lose the reader in a mess, both on the surface and underneath, that can only be understood when multiple conflicting parts are held together.

    3. 'They called me the hyacinth girl.'

      The use of these lines from Wagner's "Tristan und Isolde" - where the sailor misses and yearns for his Irish girl - sets a tone of grief and longing for the poem. Eliot's use of the Hyacinthia festival from ancient Sparta is interesting here. Hyacinthus was a beautiful youth beloved by Apollo who was accidentally killed when Apollo's discus struck him in the head, and Apollo then transformed him into the hyacinth flower - so Hyacinthus represents grief and transformation through death. In The Waste Land it seems the hyacinth girl has almost turned into the flower, as she isn't dead but can't speak, trapped between life and death like the mythological transformation. The hyacinth girl describes someone else with "your arms full" of hyacinths - so this person is actually holding onto grief itself, the physical embodiment of death and loss. The hyacinth girl becomes speechless from being overwhelmed. The hyacinths being picked creates an important image because to have picked them is to have killed them, cutting them from their life source. This leads to the "heart of light," usually a moment of inner peace and spiritual illumination, being silenced by overwhelming sorrow. This progression of loss is then followed by "Oed' und leer das Meer" from Wagner's opera, meaning "desolate and empty the sea." This reinforces the theme of grief and emptiness from loss; the barren landscape reflects internal emptiness. Then, following Eliot’s theme of longing and grief, the reader is answered by Madame Sosostris, who, as Huxley showed us in "Crome Yellow" with his character Sesostris, isn't even a real prophet but a fraud, offering only fake spiritual comfort to the overwhelming grief.

    1. When he was in close proximity to the chief of the village, he would put his pistol to the chief’s head, demanding a ransom of food in exchange for the chief’s release.

      What a tweaker. I keep having this same question: was this really the most optimal strategy? Since we know that the Powhatan wanted to include them then certainly not.

    Annotators

    1. Author response:

      General Statements

      In this paper we demonstrate that the lipid packing of the plasma membrane has a huge impact on the stability of caveolae. Using interdisciplinary techniques, we show that the widely used dynamin inhibitor Dyngo-4a adsorbs to and inserts into lipid bilayers, leading to decreased lipid packing and hence reduced caveolae dynamics and internalization, even in cells lacking dynamin. We have added experiments that validate that Dyngo-4a treatment does not result in fragmentation or disassembly of the caveolae. A FRAP assay of cytosolic caveolae has been employed to address questions concerning scission. Moreover, as suggested by the reviewers, we have also included new simulation data that show and expand on the fact that Dyngo-4a positions itself in the lipid leaflet similarly to cholesterol and preferentially associates with cholesterol clusters, affecting the spatial distribution of cholesterol in the membrane. We believe that these added data have greatly improved the paper and strengthened our conclusion that lipid packing is a critical determinant in the balance between internalization and stable plasma membrane association of membrane vesicles.

      As requested, we have expanded the introduction to provide more detailed information about previous findings in the field. Changes and additions to the text have been highlighted in red for easier tracking.

      Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The authors use Dyngo-4a, a known dynamin inhibitor, to test its influence on caveolar assembly and surface mobility. They investigate whether it incorporates into membranes using Quartz-Crystal Microbalance, and how it is organized in membranes using simulations. Finally, they use lipid-packing sensitive dyes to investigate lipid packing in the presence of Dyngo-4a, membrane stiffness using AFM and membrane undulation using fluorescence microscopy. They also use a measure they call "caveola duration time" to claim that something happens to caveolae after Dyngo-4a addition and, using this parameter, they do indeed see an increase in it in response to Dyngo-4a, which is reduced back to the baseline after addition of cholesterol.

      Overall, the authors claim: 1) Dyngo-4a inserts into the membrane and this 2) results in "a dramatic dynamin-independent inhibition of caveola scission". 3) Dyngo-4a was inserted and positioned at the level of cholesterol in the bilayer and 4) Dyngo-4a-treatment resulted in decreased lipid packing in the outer leaflet of the plasma membrane 5) but Dyngo-4a did not affect caveola morphology, caveolae-associated proteins, or the overall membrane stiffness 6) acute addition of cholesterol counteracts the block in caveola scission caused by Dyngo-4a.

      Overall, in this reviewer's opinion, claims 1, 3, 4, 5 are well-supported by the presented data from electron and live cell microscopy, QCM-D and AFM.

      However, there is no convincing assay for caveolar endocytosis presented besides the "caveola duration", which, although unclearly described, seems to be the time it takes in imaging until a caveola is no longer picked up by the tracking software in TIRF microscopy.

      Since the main claim of the paper is a mechanism of caveolar endocytosis being blocked by Dyngo-4a, a true caveolar internalization assay is required to make this claim. This means either the intracellular detection of caveolar cargo that is not surface-connected or the quantification of caveolar movement from TIRF into epifluorescence detection in the fluorescence microscope. Otherwise, the authors could remove the claim and just claim that caveolar mobility is influenced.

      We thank the reviewer for the constructive comments, and we very much appreciate the positive critique. We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. In addition, we are currently performing CTxB-HRP experiments to quantify the number of caveolae at the PM using EM, but for reasons outside our control we have not managed to finish these on time. They will be included in the manuscript once they are ready, hopefully soon.

      Reviewer #1 (Significance):

      A number of small molecule inhibitors for the GTPase dynamin exist that are commonly used tools in the investigation of endocytosis. This goes so far that the use of some of these inhibitors alone is considered in some publications as sufficient to declare a process to be dynamin-dependent. However, this is not correct, as there are considerable off-target effects, including the inhibition of caveolar internalization by a dynamin-independent mechanism. This is important, as for example the influence of dynamin small molecule inhibitors on chemotherapy resistance is currently being investigated (see for example Tremblay et al., Nature Communications, 2020).

      The investigation of the true effect of small molecules discovered as and used as specific inhibitors, and of their off-target effects, is extremely important, and this reviewer applauds the effort. It is important that inhibitors are not used alone, but that other means of targeting a mechanism are exploited as well in functional studies. The audience here is thus, besides membrane biophysicists interested in the immediate effect of the small molecule Dyngo-4a, also cell biologists and everyone using dynamin inhibitors to investigate cellular function.

      Reviewer #2 (Evidence, reproducibility and clarity):

      This manuscript uses the small molecule dynamin inhibitors dynasore and dyngo in dynamin triple knockout cells to show that these inhibitors impact lipid packing and organization in the plasma membrane. Data showing that dyngo affects caveolin dynamics using TIRF microscopy are also shown and are interpreted to reflect inhibition of caveolae scission from the membrane.

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane. Study of caveolae endocytosis is based on a TIRF assay that has inherent limitations:

      - Caveolae are defined as bright cav1-positive spots in diffraction limited TIRF and their disappearance presumed to be endocytic events. Cav1 spots are presumed to be caveolae but the authors do not consider that they may be flat non-caveolar oligomers. The diffraction limited TIRF approach interprets the large structures as caveolae but evidence to that effect is lacking.

      This is a valid comment and to address this we have now included data showing colocalization of cavin1 and EHD2 to the Cav1-GFP spots. We can however not determine if they are flat or invaginated. We do have extensive experience imaging caveolae using TIRF microscopy and carefully chose cells that display low expression of fluorescently labelled caveolin to avoid non-caveolar structures.

      - The analysis (and the diagram presented in figure 4) considers that caveolae can either diffuse laterally in the membrane or internalize and does not consider that caveolae can flatten and possibly fragment in the membrane. Is it not possible that loss of Cav1 spots is a fragmentation event and not necessarily a scission event?

      This is a good question. However, fragmentation and disassembly would result in shorter track durations, and this is not what is observed in the data. We have now also included data showing that cavin1 is persistently associated with the Cav1 spots identified as caveolae during Dyngo-4a treatment, indicating that these are caveolae. Furthermore, IF stainings showing colocalization of Cav1-GFP with cavin1 or EHD2 after Dyngo-4a treatment have also been added. We have now also expanded on the different interpretations of the data in the results section.

      - The analysis is based on overexpression of Cav1-GFP that may alter the stoichiometry between Cav1 and cavin1 such that while caveolae may be expressed, larger non-caveolar structures may accumulate.

      Yes, this is correct. We have specifically imaged cells expressing low levels of Cav1-GFP to avoid the accumulated non-caveolar structures that can be spotted in cells with high expression.

      - Cav1 has been shown to be internalized via the CLIC pathway (Chaudary et al, 2014) and if dyngo is impacting clathrin then maybe it is also impacting CLIC endocytosis and thereby Cav1 endocytosis via this pathway?

      Dyngo-4a has been shown to not affect CLIC endocytosis (McCluskey et al., 2013) and in our data we do not see internalization following Dyngo-4a treatment.

      - The longer Cav1 TIRF track time and shorter displacement with dyngo is consistent with inhibition of caveolae scission. However, as the authors discuss, could not reduced membrane undulations due to dyngo's impact on membrane order be responsible for the longer tracks? Alternatively, perhaps the altered lipid packing is corralling Cav1 movement and reducing non-caveolar Cav1 endocytosis, resulting in shorter tracks of longer duration? The proposed interaction of dyngo with cholesterol could prevent scission but also stabilize large (flat?) Cav1 oligomers in the membrane, perhaps reducing Cav1 oligomer fragmentation.

      We completely agree that membrane undulations contribute to instability of the TIRF-field and therefore to disruption of Cav1-GFP tracks, as we discuss in the results section and as has been described in previous work (Larsson et al., 2023). Yet, we have also shown that internalization of caveolae results in shorter tracks (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015). Furthermore, the tracked Cav1-GFP spots are persistently positive for cavin1 both with and without Dyngo-4a treatment, showing that the majority do not disassemble or become internalized by other pathways. Additionally, the added IF stainings after 30 min Dyngo-4a treatment also show that the Cav1-GFP spots remain positive for cavin1 and EHD2, just as in ctrl-treated cells.

      My point here is not to discredit the data but only to suggest that the TIRF approach used is an indirect measure of caveolae scission from the membrane that requires substantiation using other approaches.

      We appreciate these comments and have tried to address these by adding new data and discussions on the interpretation of the tracking data in the results section.

      Dyngo is certainly generally affecting lipid packing via cholesterol and thereby affecting Cav1 dynamics in the plasma membrane. The claim of caveolae scission should be qualified and alternative possibilities considered and discussed. If the authors persist in arguing that dyngo is affecting caveolae scission then the effect should be substantiated by accumulation of caveolae by quantitative EM and high spatial and temporal resolution imaging of Cav1 and cavin1 to define the endocytic events. As the latter represents a new, and potentially very challenging, line of experimentation, I would suggest that it is beyond the scope of the current study. As indicated above the additional experiments are not necessary and qualification of the claims would be sufficient.

      We have now included a FRAP experiment of endocytic Cav1-GFP supporting the effect on internalization. We are also currently performing CTxB-HRP experiments to quantify the number of caveolae at the PM using EM, but for reasons outside our control we have not managed to finish these on time. They will be included in the manuscript once they are ready, hopefully soon.

      Other points

      Figure 1C - Cav1 positive spots cannot be interpreted to be caveolae from diffraction limited confocal images. Same comment applies to Fig 4G - caveola? duration.

      We completely agree with this and that the claims should be qualified. We have added IF stainings showing that the Cav1-GFP structures are also positive for cavin1. We have now clarified that we cannot distinguish between flat or different curved states of caveolae using this methodology. We have also changed the labelling of Fig. 4G.

      Figure 4C - it is not clear why this EM data is not quantified - for both the number of caveolae and clathrin coated pits - as this would help clarify the interpretation of the effect reported.

      We are currently performing CTxB-HRP experiments to quantify the number of caveolae using EM, but for reasons outside our control we have not managed to finish these on time. They will be included in the manuscript once they are ready, hopefully soon.

      Figure 4D - the AFM experiments should perhaps be repeated as the non-significant effect of dyngo on the Young's modulus may be a result of insufficient n values.

      We would like to clarify that, to ensure the robustness of our AFM measurements, we performed the experiments with sufficient biological and technical replicates. Specifically, each data point shown in Figure 4D represents a Young’s modulus value averaged from approximately sixty force-distance curves per cell. For each condition, we collected force-distance maps on eight to nine individual cells, obtained from two separate petri dishes per day. We repeated this process on two independent days. In total, we analysed thirty-one cells for the DMSO control and thirty-three cells for the Dyngo-4a treatment. We performed Student’s t-test with Welch’s correction to assess the statistical significance of the difference between the two conditions, as described in the main text. We believe that the sample size and statistical approach are sufficient to support the conclusions presented. Furthermore, we also analysed cell stiffness by calculating the slope of the linear portion of the force-distance curves. This analysis also did not reveal any statistically significant differences between the conditions (data not shown), further supporting our conclusion that Dyngo-4a treatment does not significantly alter the Young’s modulus under our experimental conditions.
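As an illustration of the statistical approach named in this response, the following is a minimal sketch of Welch's unequal-variance t-test using only the Python standard library. The function name `welch_t` and the sample values are hypothetical and are not data from the study; the sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (unbiased sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; no equal-variance assumption is made.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical per-cell Young's modulus values (kPa), NOT data from the study.
control = [2.1, 2.4, 1.9, 2.6, 2.2]
treated = [2.0, 2.5, 2.3, 1.8, 2.4]
t_stat, dof = welch_t(control, treated)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

With per-cell averages as the unit of analysis (one value per cell, as described above), the two lists would hold thirty-one and thirty-three entries respectively.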

      Reviewer #2 (Significance):

      This data showing that dyngo and dynasore target membrane order is quite compelling and argues that the effects of these inhibitors are not dynamin specific and that inhibition of endocytosis by these small molecule inhibitors is dynamin-independent. The in vitro and in vivo data they provide is convincing.

      Similarly, the data showing that dynasore and dyngo affect caveolin dynamics and clathrin endocytosis (transferrin) is quite convincing and argues that altered lipid packing is impacting membrane dynamics at the plasma membrane.

      What is less convincing is the conclusion that dyngo is preventing caveolae scission from the membrane.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Larsson et al present experimental and computational data on the role of Dyngo4a (a compound that was developed to inhibit dynamin) on the dynamics of caveolae. The manuscript mostly documents effects of Dyngo on caveolae, with one experiment to suggest a mechanism for this result. This one rather unconvincing result forms the focus of the manuscript contributing to a disconnect between the data and the presentation. Additionally, there are concerns with data interpretation. The writing could also benefit from revision to address grammar mistakes, strengthen referencing, and increase precision. Overall, the manuscript requires substantial revisions before being considered for publication. The central claim, in particular, needs stronger evidence to support the proposed mechanism.

      We thank the reviewer for the thorough review and for the experimental suggestions, which we believe have strengthened our data further.

      Significant issues (in approximate order of importance):

      (1) The data supporting the central mechanistic explanation appears limited. There is no evidence that Dyngo remains in one leaflet

      The simulations show that the energy barrier for moving between the leaflets of the bilayer is very high. Furthermore, simulations of C-Laurdan have shown that it does not readily flip between membrane leaflets (Barucha-Kraszewska et al., 2013), supporting that it reports on the outer lipid leaflet when added to cells. However, we have now changed this and state that Dyngo-4a decreased the lipid order in the plasma membrane.

      - the GP of the PM is very low compared to previous measurements,

      The absolute GP-values will vary between setups depending on which filters are used, so they are not comparable between laboratories. What is of importance is that we found a significant difference in the relative GP-values between cells treated with Dyngo-4a and control cells. It is this change that we report. We have not performed any GP-measurements on this cell type earlier, so it is unclear which previous measurements reviewer #3 is referring to.
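To make the point about absolute versus relative GP values concrete, here is a small sketch of the standard generalized polarization formula, GP = (I_ordered - g·I_disordered) / (I_ordered + g·I_disordered). The factor `g` is used here as a hypothetical stand-in for the relative channel sensitivity of a given setup, and the intensities are invented for illustration only.

```python
def generalized_polarization(i_ordered, i_disordered, g=1.0):
    """GP from the ordered- and disordered-channel intensities.

    g is an assumed setup-dependent weighting of the disordered channel
    (for example from filter choice); g = 1.0 gives the textbook formula.
    """
    total = i_ordered + g * i_disordered
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_ordered - g * i_disordered) / total


# Hypothetical channel intensities (arbitrary units), NOT measured values.
control = (120.0, 80.0)
treated = (100.0, 100.0)

for g in (1.0, 1.5):  # two imagined setups with different channel weighting
    gp_control = generalized_polarization(*control, g=g)
    gp_treated = generalized_polarization(*treated, g=g)
    print(f"g={g}: control={gp_control:+.2f}, treated={gp_treated:+.2f}")
```

In this toy case the absolute GP values shift with `g`, while the treated sample stays below the control in both setups, which is the kind of relative change the response refers to.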

      - effects on other membranes are not explored,

      The order of the intracellular membranes is, as expected, lower than that of the plasma membrane. Differentiating intracellular membranes of interest, such as endocytic vesicles, from other intracellular membranes would be very difficult. More importantly, our study is focused on what is happening in the plasma membrane, where caveolae reside, so the intracellular membranes would be of minor interest for plasma membrane dynamics.

      - dynamin-directed effects of Dyngo are not considered,

      In the discussion section we discuss the difficulties with disentangling dynamin-direct and indirect effects.

      (2) The QCM-D measurements and claims require explanation as several aspects remain unclear. In Fig S2, the 'softness' (what does this mean?) changes by 4-fold with DMSO alone (what does this mean?), then fractionally more with Dyngo. Then fractionally more again when Dyngo is removed (why?). Then it remains somewhat higher when both Dyngo and DMSO are removed, which is somehow interpreted as Dyngo remaining in the bilayer, but not DMSO.

      We understand the reviewer's confusion and hope our explanations provide clarity. QCM-D measurements are based on an oscillating quartz crystal sensor. Specifically, what is measured are the alterations in oscillation frequency (ΔF) and in the rate of energy dissipation from the sensor surface (ΔD). This allows the measurement of: 1) materials adsorbing to the sensor surface, 2) changes in the viscoelastic properties of a solution in contact with the sensor surface, and 3) changes in the material adsorbed to the sensor surface upon exposure to different solutions. The ratio ΔD/-ΔF reports the mechanical softness or rigidity of an adsorbed material, in this case the SLB.

      A “buffer shift” is the term used when there is no adsorption to the sensor surface, but rather an effect from altering the solution above the sensor surface. One reason is that different solutions can have different densities (e.g., a DMSO-buffer mixture vs buffer alone), which impacts the oscillations of the sensor. It was observed that the DMSO-buffer mixture alone gave a large buffer shift in comparison to the adsorption of Dyngo-4a into the SLB, thereby muddling the data interpretation. Thus, in Fig. S2 the system was first equilibrated with the DMSO-buffer mixture prior to addition of the Dyngo-4a solution, to allow for clearer visualization of the two events. In QCM-D, to assess whether something has made a permanent change to the system, you change back to the solutions used before the addition; thus, we first washed with a DMSO-buffer mixture, followed by buffer alone. Control experiments were carried out in which no Dyngo-4a was added (also shown in Fig. S2). The control shows that the same “buffer shift” from the DMSO-buffer mixture occurs in both systems and that, upon returning to a buffer-only condition, there is no permanent change to the system caused by exposure to the DMSO. In contrast, once the system that received Dyngo-4a is changed back to a buffer-only system, we see that mass has been added to the system (ΔF) with little change to the dissipation (ΔD), thereby resulting in a lower ratio of ΔD/-ΔF, which is to say that the SLB after the adsorption of Dyngo-4a was more rigid than the SLB without Dyngo-4a.
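The ΔD/-ΔF reasoning above can be sketched numerically. The helper below is a hypothetical illustration, and the frequency and dissipation shifts are invented placeholder values rather than readings from Fig. S2; it only shows why added mass (a more negative ΔF) at roughly unchanged ΔD lowers the ratio.

```python
def softness_ratio(delta_d, delta_f):
    """Return dD / (-dF); a larger value indicates a softer adsorbed layer.

    delta_d: dissipation shift (in units of 1e-6)
    delta_f: frequency shift in Hz (negative when mass is added)
    """
    if delta_f >= 0:
        raise ValueError("expected a negative frequency shift (mass uptake)")
    return delta_d / (-delta_f)


# Hypothetical shifts measured in buffer only, NOT values from the study.
slb_only = softness_ratio(delta_d=0.5, delta_f=-25.0)
slb_after_compound = softness_ratio(delta_d=0.5, delta_f=-30.0)

# More mass with the same dissipation gives a lower ratio (a more rigid layer).
print(slb_only, slb_after_compound)
```

Comparing both states in the same buffer-only condition, as the response describes, keeps the solution-dependent buffer shift out of the comparison.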

      These interpretations are difficult to grasp, as the authors seem to be implying simple amphiphilic partitioning into the membrane, which should all be removable by efficient washing.

      Amphiphilic partitioning is not fully reversible by “efficient washing”; it depends on the partitioning coefficients.

      I do not doubt that this compound interacts with membranes, but the quantifications appear ambiguous. A bilayer with 16 mol% (or worse, 30% if all in one leaflet) Dyngo is very unlikely (to remain a bilayer). Even if such a bilayer was conceivable, the authors are claiming an ADDITION of Dyngo that would INCREASE the area of one leaflet by 30%, which needs explanation as it appears unlikely.

      We understand that, in our attempt to provide numbers in the results section for the amount of binding observed in QCM-D, these can easily be interpreted as the amount that is observed to insert into the PM. However, as discussed in the discussion, we also see aggregates of Dyngo-4a that associate with the membrane in the simulations, which could likely contribute to the binding observed in QCM-D prior to washing. The precise amount of membrane-inserted Dyngo-4a is difficult to measure, as we discuss in the text. In order to make this clearer, we have now moved all these details to the discussion section, where we elaborate on this. Furthermore, since Dyngo-4a, like cholesterol, intercalates between the head groups of the lipids, the area would not increase in direct proportion to the mol%.

      Also, there are no replicates shown, so unclear how reproducible these effects are?

      For clarity, only single experiments are shown. However, multiple experiments were performed and the range in measured values for 3 technical repeats can be observed in the standard deviations found in the main text (e.g., 6 ± 2 mol%).

      (3) The simulations are insufficiently described and difficult to interpret. How big are these systems? Why do the figures show the aqueous system with lateral boundaries?

      There are no explicit boundaries used in the simulations; periodic boundary conditions are applied in all three dimensions. The lateral boundaries observed in the figures correspond to the simulation box edges and are a visual artifact of 2D projections with the QuickSurf representation. No artificial walls or constraints were introduced laterally. Additional technical details, including the system size and periodic boundary conditions, have now been added to the methods section.

      It seems quite important that multiple Dyngo molecules aggregate rather than partition into membranes - is this likely to occur in experiment?

      Yes, this is important, and with the additional simulation experiments suggested by Reviewer #3 it has been clarified that the aggregates contribute a great deal to the change in lipid packing of lipid bilayers containing cholesterol. However, it is hard to test aggregation in the cellular system, but we believe that this happens and contributes to the effect on membranes. We have now emphasized the effect of the aggregates in the text.

      PMF simulations are strongly suggesting that Dyngo does not spontaneously cross membranes, which is inconsistent with its drug-like amphiphilicity (cLogP~2.5 is optimally suited for membrane permeation) and known effects on intracellular proteins. This suggests an artefact in these PMFs.

      As stated in the submitted version of the manuscript, logP was used to validate the topology, and the observed value was in very good agreement with cLogP. Moreover, this validation complemented the standard procedure of CHARMM-GUI ligand modelling, which provided a reasonable penalty score (around 20) for the Dyngo-4a topology. POPC and cholesterol molecules are standard in the force field and validated by numerous studies. The parameters used for the membrane simulations, and for AWH in particular, are very common for this type of study. Thus, we do not see what may cause any artifacts in the free energy profile construction. In fact, the amphiphilicity of the molecule may be one of the key reasons that Dyngo-4a remains at the aqueous interface of the membrane and does not cross the membrane spontaneously. Also, we believe that the energy barrier of 40-60 kJ/mol is not prohibitively high and Dyngo-4a molecules may still overcome the barrier eventually, though we expect the majority to reside in the upper leaflet.

      The authors should experimentally measure the permeation of Dyngo through bilayers (or lack thereof), to more robustly support their finding that Dyngo does not cross membranes spontaneously.

      We thank the reviewer for the suggestion; however, this is very technically challenging and would require the establishment of precise systems, which is beyond the scope of this manuscript.

      (4) Why not measure effect of Dyngo on lipid packing directly and more broadly in model membranes?

      With the added modelling experiments supporting the previous simulations and the calculated GP values from the C-Laurdan experiments on the cellular plasma membrane, we do not find it necessary to include more model membrane experiments than the existing ones on lipid monolayers and supported lipid bilayers.

      (5) Statistics should not be done on individual cells (n>26), but rather on independent experiment (N=3?)

      We have performed the statistics on live cell particle tracking according to previous literature on similar systems (Boucrot et al., 2011; Larsson et al., 2023; Shvets et al., 2015; Stoeber et al., 2012).
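      To make the distinction raised in point (5) concrete, the following is a minimal sketch, not the authors' analysis, contrasting the two choices of statistical unit: pooling every tracked cell (n) versus reducing each independent experiment to a single mean (N). The experiment labels and all values are hypothetical placeholders, not data from the study.

```python
from statistics import mean, stdev

# Hypothetical per-cell measurements (arbitrary units), grouped by
# independent experiment. These numbers are illustrative only.
cells_by_experiment = {
    "exp1": [1.0, 1.2, 0.8],
    "exp2": [1.5, 1.7, 1.3],
    "exp3": [0.9, 1.1, 1.0],
}

def pooled_values(groups):
    """Flatten all cells into one list (statistical unit = cell, n)."""
    return [v for values in groups.values() for v in values]

def experiment_means(groups):
    """Reduce each experiment to its mean (statistical unit = experiment, N)."""
    return [mean(values) for values in groups.values()]

pooled = pooled_values(cells_by_experiment)
per_exp = experiment_means(cells_by_experiment)

print(f"n = {len(pooled)} cells, mean = {mean(pooled):.3f}, SD = {stdev(pooled):.3f}")
print(f"N = {len(per_exp)} experiments, mean = {mean(per_exp):.3f}, SD = {stdev(per_exp):.3f}")
```

      With equal numbers of cells per experiment the two means coincide; what differs is the number of units entering any subsequent test, and therefore the degrees of freedom.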

      (6) Fig 1G is important but rather unclear. Firstly, these kymographs are an odd way to show that the caveolae are not moving. More importantly, caveolae in normal cells have been shown to be quite stable and immobile (eg doi: 10.1074/jbc.M117.791400), yet here they are claimed to be very mobile.

      Although this might be an odd and unconventional way to depict dynamic processes, we believe that it is a very illustrative way to show track stability over time in bulk, rather than just a kymograph over a few structures in a cell. Furthermore, we are not claiming that caveolae are very mobile but rather the opposite: that they are very stable, in agreement with previous work (Boucrot et al., 2011; Larsson et al., 2023; Mohan et al., 2015). We have now edited the text to make this even clearer.

      Also, if Dyngo prevents caveolae scission, there should be more of them at the membrane - why no quantification like Fig 1C to show accumulation of caveolae upon Dyngo treatment? Or directly counting caveolae via EM, as in Fig 4C?

      We are currently performing CTxB HRP experiments using EM, but for reasons beyond our control we have not managed to finish these in time. They will be included in the manuscript once they are ready, hopefully before long. However, Dynasore has previously been shown, by EM, to increase the number of caveolae at the PM (Moren et al., 2012; Sinha et al., 2011).

      (7) The writing can be made more precise and referencing could be strengthened.

      The introduction was written in a short format, and we have now extended this and made it more precise.

      Some examples:

      (a) 'scissoned' is not a word in English,

      Thanks, we have now changed this.

      (b) what is meant by "Cav1 assembly is driven by high chol content"? There are many types of caveolin assemblies.

      We agree that this can be made more precise and have now clarified this in the introduction.

      (c) "This generates a unique membrane domain with distinct lipid packing and a very high curvature." Unclear what 'this' refers to and there is no reference here, so what is the evidence for either of these claims? Caveolin-8S oligomers are not curved. Perhaps 'this' is caveolae, but they are relatively large and also not very highly curved and I am unaware of measurements of lipid packing therein.

      Caveolae are around 50 nm which in biology is a very high curvature of a membrane. It has been extensively proven that caveolae have a distinct lipid composition highly enriched in cholesterol and sphingolipids, which thereby also will generate a unique lipid packing as compared to the surrounding membrane. Yet, the reviewer is correct that lipid packing has not been measured in a caveola for obvious technical challenges. Thus, we have now changed the text to “special lipid composition”.

      The sentence following that one again makes a specific, but unreferenced, claim.

      (d) intro claims that lipid packing is critical for fission, but it is unclear quite what is meant by this claim. The references do not help, as they are often about the basic biophysics of lipids, rather than how packing affects fission.

      We have now edited the text.  

      (e) intro strongly implies that caveolae remain membrane attached because of stalled scission. How strong is the evidence for this? The fact that EHD2 is at the neck is not definitive,

      We used the term stalled scission to describe that not all omega-shaped membrane invaginations undergo scission in the same automatic way as clathrin-coated vesicles. We have now changed this in the text. Caveolae are shown to be released (undergo scission) and to be detected as internal caveolae if the protein EHD2 is removed. Hence this must be interpreted as EHD2 stalling scission. The evidence includes data compiled over the last 12 years by others and us, for example: 1) caveolae with EHD2 have a longer duration time (Larsson et al., 2023; Mohan et al., 2015; Moren et al., 2012; Stoeber et al., 2012), and knockdown of EHD2 results in more internalized caveolae as measured by CTxB HRP using EM (Moren et al., 2012) and a shorter duration time at the PM (Hubert et al., 2020; Larsson et al., 2023; Mohan et al., 2015; Stoeber et al., 2012); 2) EHD2 overexpression results in fewer internalized caveolae as measured by CTxB HRP using EM (Stoeber et al., 2012); and 3) overexpression or acute addition of purified EHD2 via microinjection counteracts lipid-induced scission of caveolae and hence results in caveolae stabilization at the PM (Hubert et al., 2020). It is very hard to see how the release and internalization of caveolae could result from anything other than scission. EHD2 has been found around the rim of caveolae (Matthaeus et al., 2022), and overexpression of EHD2 oligomerizing mutants has been shown to expand the caveola neck (Hoernke et al., 2017; Larsson et al., 2023).

      (f) unclear what is meant by 'lipid packing frustration' and how Dyngo supposedly induces it.

      Lipid packing frustration refers to what is usually called a lipid packing defect, but since lipid membranes are described as a fluid system, they should not have defects; we therefore believe that “lipid packing frustration” is more accurate. However, we have now changed the text and use “decreased lipid packing” or “decreased lipid order” more consistently to describe the effect on the plasma membrane.

      (8) IF of Cav1 is insufficient to claim puncta as caveolae. Co-stained puncta of caveolin with cavin are much stronger evidence. Same issue for Cav1-GFP puncta.

      We agree and have now provided IF showing cavin1 and EHD2 colocalization with Cav1-GFP in untreated and Dyngo-4a-treated cells.

      (9) Fig 3E claims that "preferred position of Dyngo-4a was closer to the head groups" but the minimum looks to be in similar place as Fig 3B without cholesterol.

      We appreciate the reviewer’s observation. The PMF minima in the POPC and POPC:Chol membranes are indeed close in absolute position (~1.1–1.2 nm from the bilayer center). However, as clarified in the revised text, the presence of cholesterol leads to a slight shift of Dyngo-4a closer to the headgroup region and broadens the positional distribution. This is also evident from the added density profiles (Fig. S3A) and is now described more precisely in the manuscript.

      Critically, these results do not support the notion that Dyngo affects lipid packing sufficiently, which is not measured in the simulations (though could be).

      We thank the reviewer for the excellent suggestion. In response, we have now included a detailed analysis of Dyngo-4a’s effect on lipid packing in the simulations. As described in the revised manuscript, we measured deuterium order parameters, area per lipid (APL), and lipid–Dyngo–cholesterol spatial distributions (Figs. 3-H, S3C-E). The results demonstrate that Dyngo-4a decreases lipid order in POPC:Chol membranes. Both single molecules and clusters reduce the order parameter by up to 0.04 units, particularly in the upper leaflet, where Dyngo-4a resides. The reduction is most pronounced in the midchain region of the sn1 tail and around the double bond of the sn2 tail. These effects were accompanied by increased APL in POPC:Chol membranes and by colocalization of Dyngo-4a near cholesterol-rich regions. Together, these data confirm that Dyngo-4a perturbs membrane organization and lipid packing in a composition-dependent manner. We believe these additions directly address the concern and demonstrate that the simulations indeed support the conclusion that Dyngo-4a modulates lipid packing.
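      For readers unfamiliar with the quantity, the deuterium order parameter mentioned above is conventionally defined per carbon as S_CD = <(3 cos^2(theta) - 1)/2>, where theta is the angle between a C-H (C-D) bond and the bilayer normal. The sketch below applies this textbook definition to a list of bond vectors. The function name, the choice of the z axis as the normal, and the example vectors are assumptions made for illustration; this is not the analysis routine used in the manuscript.

```python
import math

def order_parameter(bond_vectors, normal=(0.0, 0.0, 1.0)):
    """Average of (3*cos^2(theta) - 1) / 2 over bond vectors.

    theta is the angle between each bond vector and the bilayer normal
    (assumed here to be the z axis unless another normal is passed).
    """
    nx, ny, nz = normal
    n_len = math.sqrt(nx * nx + ny * ny + nz * nz)
    total = 0.0
    for x, y, z in bond_vectors:
        length = math.sqrt(x * x + y * y + z * z)
        cos_theta = (x * nx + y * ny + z * nz) / (length * n_len)
        total += 0.5 * (3.0 * cos_theta ** 2 - 1.0)
    return total / len(bond_vectors)

# Limiting cases: a bond parallel to the normal and one perpendicular to it.
parallel = order_parameter([(0.0, 0.0, 1.0)])
perpendicular = order_parameter([(1.0, 0.0, 0.0)])
print(parallel, perpendicular)
```

      Simulation studies commonly report the magnitude |S_CD| per carbon along each acyl chain, with lower values indicating more disordered chains.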

      Finally, the simulation data do not show "that Dyngo-4a is competing with cholesterol"; it is unclear what 'competition' means in this context, but regardless, the data only shows that Dyngo sits at a similar location as cholesterol.

      We agree with the reviewer that “competition” was an imprecise term. We have rephrased the relevant sections to clarify that Dyngo-4a and cholesterol localize to overlapping regions and exhibit spatial coordination. As now stated in the manuscript, cholesterol appears to partially displace Dyngo-4a from its preferred depth seen in pure POPC, broadens its membrane distribution, and alters lipid packing. According to the order parameters there is an interplay between chol and Dyngo-4a and the heatmaps show that the distribution of chol in the membrane gets less uniform in the presence of Dyngo-4a. These interactions suggest that Dyngo-4a perturbs cholesterol-rich domains.

      As new analysis routines were added to the study, we have now also added the details on those to the Methods section of the text.

      (10) AFM measures the stiffness of the cell (as correctly explained in Results section) not "overall stiffness of the PM" as stated in the Discussion.

      We thank the reviewer for pointing this out; we have now altered this in the discussion section.

      (11) Fig2A: what was the starting lipid surface pressure? How does Dyngo insertion depend on initial lipid packing?

      The starting lipid pressure was 20 mN m<sup>-1</sup>, which we have now incorporated in the figure legend. We performed several such experiments with a starting pressure ranging from 20-23 mN m<sup>-1</sup>, showing consistent results, which we describe in the materials and methods section. Given that we also performed QCM-D analysis and simulations on bilayers showing that Dyngo-4a adsorbed and inserted, respectively, we have not performed a titration of starting pressures to determine a MIP for Dyngo-4a.

      (12) Fig 4B is a strange approach to measure membrane motion. Why not RMSD or some other displacement based method? As its shown, it implies that the area of the cell changes.

      The method that we used quantifies the area of the cell that is attached to (or close to) the glass and is thereby visible in TIRF microscopy. This area indeed changes over time, which has been frequently observed and used to describe and quantify mobility and lamellipodia and filopodia formation, among other things. We agree that RMSD can also be used to analyze the data before and after treatments, and we have now included RMSD analysis in the manuscript.
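      As an illustration of the displacement-based measure the reviewer proposes, the sketch below computes a root-mean-square deviation between matched 2D points at two time points. How the manuscript defines the points entering its RMSD analysis is not stated here, so the matched-outline-point setup, the function name, and the coordinates are all hypothetical.

```python
import math

def rmsd(reference, current):
    """Root-mean-square deviation between two equal-length lists of (x, y) points."""
    if len(reference) != len(current) or not reference:
        raise ValueError("point lists must be non-empty and of equal length")
    squared = [
        (x - x0) ** 2 + (y - y0) ** 2
        for (x0, y0), (x, y) in zip(reference, current)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical outline points at two time points (arbitrary units).
frame_0 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
frame_1 = [(3.0, 4.0), (4.0, 4.0), (4.0, 5.0)]  # every point shifted by (3, 4)
print(rmsd(frame_0, frame_1))
```

      Unlike a change in adhered area, this measure responds to movement of the outline even when the total area stays constant.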

      Reviewer #3 (Significance):

      The title, abstract, and introduction of the manuscript are largely framed around lipid packing, but most of the data investigate other unexpected effects of treating cells with Dyngo4a. The only measurement for lipid packing (or any other membrane properties) is Fig 4E-F. Therefore, this paper is effectively an investigation of an artefact of a common reagent, which itself could be a valuable contribution. However, the mechanism to explain its effect requires stronger evidence, and its broad biological significance needs further exploration.

      Overall, the impact of documenting the effects of Dyngo4a on membranes appears modest but may be valuable to the membrane trafficking community.

      Barucha-Kraszewska, J., S. Kraszewski, and C. Ramseyer. 2013. Will C-Laurdan dethrone Laurdan in fluorescent solvent relaxation techniques for lipid membrane studies? Langmuir. 29:1174-1182.

      Boucrot, E., M.T. Howes, T. Kirchhausen, and R.G. Parton. 2011. Redistribution of caveolae during mitosis. J Cell Sci. 124:1965-1972.

      Hoernke, M., J. Mohan, E. Larsson, J. Blomberg, D. Kahra, S. Westenhoff, C. Schwieger, and R. Lundmark. 2017. EHD2 restrains dynamics of caveolae by an ATP-dependent, membrane-bound, open conformation. Proc Natl Acad Sci U S A. 114:E4360-E4369.

      Hubert, M., E. Larsson, N.V.G. Vegesna, M. Ahnlund, A.I. Johansson, L.W. Moodie, and R. Lundmark. 2020. Lipid accumulation controls the balance between surface connection and scission of caveolae. Elife. 9.

      Larsson, E., B. Moren, K.A. McMahon, R.G. Parton, and R. Lundmark. 2023. Dynamin2 functions as an accessory protein to reduce the rate of caveola internalization. J Cell Biol. 222.

      Matthaeus, C., K.A. Sochacki, A.M. Dickey, D. Puchkov, V. Haucke, M. Lehmann, and J.W. Taraska. 2022. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 13:7234.

      McCluskey, A., J.A. Daniel, G. Hadzic, N. Chau, E.L. Clayton, A. Mariana, A. Whiting, N.N. Gorgani, J. Lloyd, A. Quan, L. Moshkanbaryans, S. Krishnan, S. Perera, M. Chircop, L. von Kleist, A.B. McGeachie, M.T. Howes, R.G. Parton, M. Campbell, J.A. Sakoff, X. Wang, J.Y. Sun, M.J. Robertson, F.M. Deane, T.H. Nguyen, F.A. Meunier, M.A. Cousin, and P.J. Robinson. 2013. Building a better dynasore: the dyngo compounds potently inhibit dynamin and endocytosis. Traffic. 14:1272-1289.

      Mohan, J., B. Moren, E. Larsson, M.R. Holst, and R. Lundmark. 2015. Cavin3 interacts with cavin1 and caveolin1 to increase surface dynamics of caveolae. J Cell Sci. 128:979-991.

      Moren, B., C. Shah, M.T. Howes, N.L. Schieber, H.T. McMahon, R.G. Parton, O. Daumke, and R. Lundmark. 2012. EHD2 regulates caveolar dynamics via ATP-driven targeting and oligomerization. Mol Biol Cell. 23:1316-1329.

      Shvets, E., V. Bitsikas, G. Howard, C.G. Hansen, and B.J. Nichols. 2015. Dynamic caveolae exclude bulk membrane proteins and are required for sorting of excess glycosphingolipids. Nat Commun. 6:6867.

      Sinha, B., D. Koster, R. Ruez, P. Gonnord, M. Bastiani, D. Abankwa, R.V. Stan, G. Butler-Browne, B. Vedie, L. Johannes, N. Morone, R.G. Parton, G. Raposo, P. Sens, C. Lamaze, and P. Nassoy. 2011. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 144:402-413.

      Stoeber, M., I.K. Stoeck, C. Hanni, C.K. Bleck, G. Balistreri, and A. Helenius. 2012. Oligomers of the ATPase EHD2 confine caveolae to the plasma membrane through association with actin. EMBO J. 31:2350-2364.

    1. She hits herself in the head, hard with her knuckles, until her forehead has these indentations and bruises. That happens a lot. She goes to get the docbot to clean her up.

      could be her stimming

    2. She hits herself in the head, hard with her knuckles, until her forehead has these indentations and bruises. That happens a lot. She goes to get the docbot to clean her up.

      This also happened. Maybe some kind of tic to stop thinking?

    1. shook my head. "Their restraints, their disease, the ward, their bodies .. .

      The people with this disease struggle with the thought of being in their own body

    2. My mother started to drift when I was three," he said. "My father only lasted a few months longer. I heard he died a couple of years after he went into the hospital. If the two of them had had any sense, they would have had me aborted the minute my mother realized she was pregnant. But she wanted a kid no matter what. And she was Catholic." He shook his head. "Hell, they should pass a law to sterilize the lot of us."

      The narrator knows now that his mother and father were both disabled. He even brings up the thought of sterilizing people with the same disability

    1. Joseph leans over me, places his hand on the crown of my head and presses his lips to the spot above my eyes

      This shows a deep and emotional connection between the hound and Joseph. It's a sense of affection.

    1. While at first this way of being appears radically different from our own, the idea “We are all lichens” echoes through enlichenment thought.5 Recognizing that there are no individuals is a critical ecological insight made glaring by our global environmental crisis, and it must become central to our understanding of who we are in the world if we have any hopes of shifting the dangerous individualism that has led us here.

      re: d. griffiths "queer theory for lichens"; we are all lichens. symbiotic, mosaic, slow-growing. on a body level, we are tapestries of bacteria from head to toe. on a social level, the humans are the bacterium.

    1. Reviewer #2 (Public review):

      Summary:

      This paper presents a novel transformer-based neural network model, termed the epistatic transformer, designed to isolate and quantify higher-order epistasis in protein sequence-function relationships. By modifying the multi-head attention architecture, the authors claim they can precisely control the order of specific epistatic interactions captured by the model. The approach is applied to both simulated data and ten diverse experimental deep mutational scanning (DMS) datasets, including full-length proteins. The authors argue that higher-order epistasis, although often modest in global contribution, plays critical roles in extrapolation and capturing distant genotypic effects, especially in multi-peak fitness landscapes.

      Strengths:

      (1) The study tackles a long-standing question in molecular evolution and protein engineering: "how significant are epistatic interactions beyond pairwise effects?" The question is relevant given the growing availability of large-scale DMS datasets and increasing reliance on machine learning in protein design.

      (2) The manuscript includes both simulation and real-data experiments, as well as extrapolation tasks (e.g., predicting distant genotypes, cross-ortholog transfer). These well-rounded evaluations demonstrate robustness and applicability.

      (3) The code is made available for reproducibility.

      Weaknesses:

      (1) The paper mainly compares its transformer models to additive models and occasionally to linear pairwise interaction models. However, other strong baselines exist. For example, the authors should compare baseline methods such as "DANGO: Predicting higher-order genetic interactions". There are many works related to pairwise interaction detection, such as: "Detecting statistical interactions from neural network weights", "shapiq: Shapley interactions for machine learning", and "Error-controlled non-additive interaction discovery in machine learning models".

      (2) While the transformer architecture is cleverly adapted, the claim that it allows for "explicit control" and "interpretability" over interaction order may be overstated. Although the 2^M scaling with MHA layers is shown empirically, the actual biological interactions captured by the attention mechanism remain opaque. A deeper analysis of learned attention maps or embedding similarities (e.g., visualizations, site-specific interaction clusters) could substantiate claims about interpretability.

      (3) The distinction between nonspecific (global) and specific epistasis is central to the modeling framework, yet it remains conceptually underdeveloped. While a sigmoid function is used to model global effects, it's unclear to what extent this functional form suffices. The authors should justify this choice more rigorously or at least acknowledge its limitations and potential implications.

      (4) The manuscript refers to "pairwise", "3-4-way", and ">4-way" interactions without always clearly defining the boundaries of these groupings or how exactly the order is inferred from transformer layer depth. This can be confusing to readers unfamiliar with the architecture or with statistical definitions of interaction order. The authors should clarify terminology consistently. Including a visual mapping or table linking a number of layers to the maximum modeled interaction order could be helpful.
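      The table requested in weakness (4) could be generated along the following lines. This is a sketch that assumes the 2^M relation quoted in weakness (2), namely that M attention layers bound the interaction order at 2^M; the group boundaries used for "pairwise", "3-4-way", and ">4-way" are also assumptions, since the review notes that the manuscript does not define them precisely. The function names are hypothetical.

```python
def max_interaction_order(num_mha_layers: int) -> int:
    """Maximum interaction order under the assumed 2**M relation."""
    if num_mha_layers < 1:
        raise ValueError("need at least one attention layer")
    return 2 ** num_mha_layers

def order_group(order: int) -> str:
    """Map an order to the informal groupings (boundaries assumed)."""
    if order < 2:
        return "additive"
    if order == 2:
        return "pairwise"
    if order <= 4:
        return "3-4-way"
    return ">4-way"

print("layers (M)  max order  group")
for m in range(1, 5):
    order = max_interaction_order(m)
    print(f"{m:<11} {order:<10} {order_group(order)}")
```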

    1. axial (muscles of the trunk and head) and appendicular (muscles of the arms and legs) categories.

      Skeletal muscles divided into 1. Axial muscles 2. Appendicular muscles

    1. So our concepts operate as a system of representation, but we haven’t finished the circle yet because, supposed we all shared the same conceptual map, that’s to say we made sense of the world in roughly the same system of classification in our head

      I agree with Hall, just having shared categories doesn’t mean we all see things the same way. Media can use the same labels but twist the meaning depending on context. That is why representation is more than just naming, it is about how those names are used.

    2. Okay, one could say then that the conceptual maps in our heads, which allow us to come to a sense of what is going on in the world

      The one problem I find with this section is that every head holds a different map of ideas and understandings, or, as the popular saying goes, "every head is a world".

    1. Therefore, differentiated instruction is often referred to as responsive teaching that adjusts instruction based on ongoing assessment of students’ needs.

      This annotation is mainly for the video. I can understand how having students write about their favorite football team in an English class and drawing about the specific topic that is being discussed, however, how would you be able to assess if a student grasps the content knowledge in a social studies classroom? If I'm teaching about the American Civil War, but Tommy likes to talk about basketball and he is disengaged with his head in his arms on the desk. How can I tell him to write an essay about something that isn't related to the topic? Wouldn't that be reflected during the summative assessments?

    1. Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays. This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Weakness:

      All prior concerns have been addressed in the revised manuscript. The added 'Limitations of the study' section is a welcome and important clarification. Despite these limitations, the study provides valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

    2. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This work addresses an important question in the field of Drosophila aggression and mating. Prior social isolation is known to increase aggression in males, manifesting as increased lunging, which is suppressed by group housing (GH). However, it is also known that single housed (SH) males, despite their higher attempts to court females, are less successful. Here, Gao et al., develop a modified aggression assay to address this issue by recording aggression in Drosophila males for 2 hours, with a virgin female immobilized by burying its head in the food. They found that while SH males frequently lunge in this assay, GH males switch to higher intensity but very low frequency tussling. Constitutive neuronal silencing and activation experiments implicate cVA sensing Or67d neurons in promoting high frequency lunging, similar to earlier studies, whereas Or47b neurons promote low frequency but higher intensity tussling. Optogenetic activation revealed that three pairs of pC1SS2 neurons increase tussling. Cell-type-specific DsxM manipulations combined with morphological analysis of pC1SS2 neurons and side-by-side tussling quantification link the developmental role of DsxM to the functional output of these aggression-promoting cells. In contrast, although optogenetic activation of P1a neurons in the dark did not increase tussling, thermogenetic activation under visible light drove aggressive tussling. Using a further modified aggression assay, GH males exhibit increased tussling and maintain territorial control, which could contribute to a mating advantage over SH males, although direct measures of reproductive success are still needed.

      Strengths:

      Through a series of clever neurogenetic and behavioral approaches, the authors implicate specific subsets of ORNs and pC1 neurons in promoting distinct forms of aggressive behavior, particularly tussling. They have devised a refined territorial control paradigm, which appears more robust than earlier assays using a food cup (Chen et al., 2002). This new setup is relatively clutter-free and could be amenable to future automation using computer vision approaches. The updated Figure 5, which combines cell-type-specific developmental manipulation of pC1SS2 neurons with behavioral output, provides a link between developmental mechanisms and functional aggression circuits. The manuscript is generally well written, and the claims are largely supported by the data.

      Thank you for the precise summary of the manuscript and acknowledgment of the novelty and significance of the study.

      Weakness:

      Although most concerns have been addressed, the manuscript still lacks a rigorous, objective method for quantifying lunging and tussling. Because scoring appears to have been done manually and a single lunge in a 30 fps video spans only 2-3 frames, the 0.2 s cutoff seems arbitrary, and there are no objective criteria distinguishing reciprocal lunging from tussling. Despite this, the study offers valuable insights into the neural and behavioral mechanisms of Drosophila aggression.

      Thank you for this comment. The duration of each lunge was measured by analyzing the videos frame by frame—from the frame before the initiation of the lunge to the frame after its completion—resulting in an average span of 3–5 frames. Given a frame rate of 30 fps, this corresponds to approximately 0.1–0.17 seconds. We acknowledge that there are certain limitations for manually quantifying the two types of aggressive behaviors, which has now been stated in the newly added “Limitations of the Study” section in the revised manuscript.
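      The arithmetic behind the durations quoted in this exchange can be checked with a short sketch. The 30 fps frame rate, the 3 to 5 frame span, and the 0.2 s cutoff come from the text above; the function name is an assumption.

```python
def frames_to_seconds(n_frames: int, fps: float = 30.0) -> float:
    """Convert a frame count to a duration in seconds at the given frame rate."""
    return n_frames / fps

CUTOFF_S = 0.2  # duration cutoff mentioned by the reviewer

durations = {n: round(frames_to_seconds(n), 3) for n in (3, 4, 5)}
for n, seconds in durations.items():
    print(f"{n} frames at 30 fps = {seconds} s, below cutoff: {seconds < CUTOFF_S}")
```

      At 30 fps the 0.2 s cutoff corresponds to 6 frames, so a lunge spanning 3 to 5 frames falls one to three frames short of it.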

      Reviewer #2 (Public review):

      Summary:

      Gao et al. investigated the change of aggression strategies by the social experience and its biological significance by using Drosophila. Two modes of inter-male aggression in Drosophila are known: lunging, high-frequency but weak mode, and tussling, low-frequency but more vigorous mode. Previous studies have mainly focused on the lunging. In this paper, the authors developed a new behavioral experiment system for observing tussling behavior and found that tussling is enhanced by group rearing, while lunging is suppressed. They then searched for neurons involved in the generation of tussling. Although olfactory receptors named Or67d and Or65a have previously been reported to function in the control of lunging, the authors found that these neurons do not function in the execution of tussling and another olfactory receptor, Or47b, is required for tussling, as shown by the inhibition of neuronal activity and the gene knockdown experiments. Further optogenetic experiments identified a small number of central neurons pC1[SS2] that induce the tussling specifically. These neurons express doublesex (dsx), a sex-determination factor, and knockdown of dsx strongly suppresses the induction of tussling. In order to further explore the ecological significance of the aggression mode change in group-rearing, a new behavioral experiment was performed to examine the territorial control and the mating competition. And finally, the authors found that differences in the social experience (group vs. solitary rearing) and the associated change in aggression strategy are important in these biologically significant competitions. These results add a new perspective to the study of aggression behavior in Drosophila. Furthermore, this study proposes an interesting general model in which the social experience modified behavioral changes play a role in reproductive success.

      Strengths:

      A behavioral experiment system that allows stable observation of tussling, which could not be easily analyzed due to its low-frequency, would be very useful. The experimental setup itself is relatively simple, just the addition of a female to the platform, so it should be applicable to future research. The finding about the relationship between the social experience and the aggression mode change is quite novel. Although the intensity of aggression changes with the social experience was already reported in several papers (Liu et al., 2011 etc), the fact that the behavioral mode itself changes significantly has rarely been addressed, and is extremely interesting. The identification of sensory and central neurons required for the tussling makes appropriate use of the genetic tools and the results are clear. A major strength of this study in neurobiology is the finding that another group of neurons (Or47b-expressing olfactory neurons and pC1[SS2] neurons), distinct from the group of neurons previously thought to be involved in low-intensity aggression (i.e. lunging), function in the tussling behavior. Furthermore, the results showing that the regulation of aggression by pC1[SS2] neurons is based on the function of the dsx gene will bring a new perspective to the field. Further investigation of the detailed circuit analysis is expected to elucidate the neural substrate of the conflict between the two aggression modes. The experimental systems examining the territory control and the reproductive competition in Fig. 6 are novel and have advantages in exploring their biological significance. It is important to note that in addition to showing the effects of age and social experience on territorial and mating behaviors, the authors experimentally demonstrated that altered fighting strategy has effects with respect to these behaviors.

      Thank you for your precise summary of our study and being very positive on the novelty and significance of the study.

      Reviewer #3 (Public review):

      In this revised manuscript, Gao et al. presented a series of well-controlled behavioral data showing that tussling, a form of high-intensity fighting among male fruit flies (Drosophila melanogaster) is enhanced specifically among socially experienced and relatively old males. Moreover, results of behavioral assays led authors to suggest that increased tussling among socially experienced males may increase mating success. They also concluded that tussling is controlled by a class of olfactory sensory neurons and sexually dimorphic central neurons that are distinct from pathways known to control lunges, a common male-type attack behavior.

      A major strength of this work is that it is the first attempt to characterize the behavioral function and neural circuit associated with Drosophila tussling. Many animal species use both low-intensity and high-intensity tactics to resolve conflicts. High-intensity tactics are mostly reserved for escalated fights, which are relatively rare. Because of this, tussling in flies, like high-intensity fights in other animal species, has not been systematically investigated. Previous studies on fly aggressive behavior have often used socially isolated, relatively young flies within a short observation duration. Their discoveries that 1) older (14-day-old) flies tend to tussle more often than younger (2- to 7-day-old) flies, 2) group-reared flies tend to tussle more often than socially isolated flies, and 3) flies tend to tussle at a later stage (mostly ~15 minutes after the onset of fighting) are the result of their creativity in looking outside of conventional experimental settings. These new findings are key for quantitatively characterizing this interesting yet under-studied behavior.

      Newly presented data have made several conclusions convincing. Detailed descriptions of methods to quantify behaviors help understand the basis of their claims by improving transparency. However, I remain concerned about the authors' persistent attempt to link high-intensity aggression to reproductive success. The authors' effort to "tone down" the link between the two phenomena remains insufficient. They are purely correlational. I reiterate this issue because the overall value of the manuscript would not change with or without this claim.

      Thank you for acknowledging the novelty and significance of the study. Regarding the relationship you mentioned between high-intensity aggression and reproductive success, we have further toned down the statements linking them throughout the revised manuscript. We also modified the title to “Social Experience Shapes Fighting Strategies in Drosophila”. In addition, we have added a ‘Limitations of the Study’ section to clearly state that the relationship between tussling and reproductive success is correlational.

      Reviewer #1 (Recommendations for the authors):

      If possible, mention the EM-connectome data showing the minimal interneuronal path from Or47b ORNs to pC1SS2 neurons (even if derived from the female connectome), which can strengthen the model of parallel sensory-central pathways.

      Thank you for this comment. According to data from the EM connectome, connecting Or47b ORNs to pC1d neurons requires at least two intermediate neurons. An example minimal pathway is: ORN_VA1v (L) → AL-AST1 (L) → PLP245 (L) → pC1d (R). We have added this point in the Discussion section of the revised manuscript.

      I'm not convinced that labeling lunges as "gentle" combat behavior works, either in the abstract or elsewhere. While lunging is indeed a lower-intensity form of aggression compared to tussling, applying anthropomorphic descriptors risks misleading readers.

      Thank you for this comment. We now use “low-intensity” instead of “gentle” to describe lunging.

      In Materials & Methods, please cross-check all figure-panel references after the recent re-numbering (e.g. "Figure 5A6A" etc.).

      Thank you for this comment. We have thoroughly verified the figure panel references in the Materials & Methods section.

      Ensure that Table S1 is clearly cited in the main text where you first describe fly genotypes.

      Thank you for this comment. We have now cited Table S1 in the main text.

      There are multiple grammatical errors and typos throughout the manuscript. Please correct them. Some examples are below, but this is not an exhaustive list:

      Line 98-102 requires rephrasing as the results are already published and not being observed by the authors.

      Thank you for this comment. We have revised the manuscript to “we occasionally observed the high-intensity boxing and tussling behavior in male flies as previously reported (Chen et al., 2002; Nilsen et al., 2004), which….”

      line 116- lower not 'lowed'.

      Corrected.

      line 942 & 945- knock-down males not 'knocking down males'.

      Corrected. Thank you very much for these comments.

      Reviewer #2 (Recommendations for the authors):

      The authors have almost completely answered the major comments I have noted on the ver.1 manuscript: (1) They clearly show changes in fighting strategy in the territory control behavior experiment in Fig. 6-figure supplements. (2) A detailed description of how aggressive behavior is measured. Thus, I am convinced by this revision.

      Thank you for these comments that make the manuscript a better version.

      Furthermore, Fig. 5, which examined the relationship of pC1[SS2] characteristics with the function of dsx, presents novel and very interesting data. I look forward to further developments.

      Thank you. We will continue to explore this part in our future study.

      However, one point still concerns me.

      Line 192: Although the authors describe it as "usage-dependent," the trans-Tango technique is essentially a postsynaptic cell-labeling technique. It is possible that the labeling intensity in postsynaptic cells increases from the change in expression levels of the Or47b gene due to GH. However, there is no difference in the expression level of the Or47b gene labeled by GFP between SH and GH. Therefore, we cannot conclude that the expression of the Or47b gene is increased by rearing conditions.

      The original paper on trans-TANGO (Talay et al., 2017) does not discuss the usage-dependency. A review of trans-synaptic labeling techniques (Ni, Front Neural Circuits. 2021) discusses that the increase in trans-TANGO signaling with aging may be related to synaptic strength, but there is no experimental evidence for this. In my opinion, the results in Figure 3-figure supplement 2 only weakly suggest that the increase in trans-TANGO signaling may be explained by an increase in synaptic strength due to group rearing.

      We appreciate the reviewer’s insightful comment regarding the interpretation of the trans-Tango signal. Indeed, the original trans-Tango study (Talay et al., 2017) does not claim that the method is usage-dependent. The observed increase in trans-Tango labeling with age, as reported in their supplemental figures, may reflect accumulation over time, potentially influenced by synaptic maturation or increased component expression. To avoid overstating our results, we have revised the relevant statement in the manuscript to remove the term "usage-dependent" and now describe the change in trans-Tango signal more cautiously.  

      Reviewer #3 (Recommendations for the authors):

      Below are the cases where their professed attempts to "tone down the statement" appear ignored:

      Lines 27-29:

      "Our findings... suggest how social experience shapes fighting strategies to optimize reproductive success".

      We have now revised the manuscript to “Our findings… suggest that social experience may shape fighting strategies to optimize reproductive success.”

      Lines 85-86:

      "... discover that this infrequent yet intense form of combat is... crucial for territory dominance and mating competition".

      We have now revised the manuscript to “…discover that this infrequent yet intense form of combat is enhanced by social enrichment, while the low-intensity lunging is suppressed by social enrichment.” 

      Lines 335-339:

      "Here, we found that... GH males tend to... increase the high-intensity tussling, which enhances their territorial and mating competition."

      We have removed “which enhances their territorial and mating competition” in the revised manuscript.

      Lines 343-344:

      "... presenting a paradox between social experience, aggression and reproductive success. Our result resolved this paradox..."

      We have now revised the manuscript to “...Our results provide an explanation for this paradox…”

      Lines 355-358:

      "Interestingly, we found that the mating advantage gained through social enrichment can even offset the mating disadvantage associated with aging, further supporting the vital role of shifting fighting strategies in experienced, aged males."

      We have removed “further supporting the vital role of shifting fighting strategies in experienced, aged males” in the revised manuscript.

      Lines 361-362:

      "These results separate the function of the two fighting forms and rectify out understanding of how social experiences regulate aggression and reproductive success."

      We have removed this sentence in the revised manuscript.

      Some may say that a speculative statement is harmless, but I think it indeed is harmful unless it is clearly indicated as a speculation. It is regrettable that authors remain reluctant to change their claim without providing any new supporting evidence. All three reviewers raised the same concern in the first round of review.

      We apologize for not making the speculative nature of the statement clearer in the previous version. In the revised manuscript, we have now explicitly rephrased sentences to only suggest a correlation but not a causal link between tussling and reproductive success.

      I have no choice but to keep my evaluation of the manuscript as "Incomplete" unless the authors thoroughly eliminate any attempt to link these two. This must go beyond changing a few words in the lines listed above.

      Thank you for this comment. In addition to the lines listed above, we carefully checked all statements regarding the correlation between fighting strategies and reproductive success throughout the full text. Furthermore, we have also added a “Limitations of the Study” section to address the shortcomings of this study in the revised manuscript.

      I do not have the same level of concern over the interpretation of Fig. 6A-C, because this is directly linked to aggressive interactions. Even if the socially isolated males do not engage in tussling, it is not a leap to assume that a different fighting tactic of socially experienced males can give them an advantage in defending a territory. To me, this is a sufficient ethological link with the observed behavioral change.

      Thank you for this insightful comment.

      The following are relatively minor, although important, concerns.

      I beg to differ over the authors' definition of "tussling". Supplemental movies S1 and S2 appear to include "tussling" bouts in which 2 flies lunge at each other in rapid succession, and supplemental movie S3 appears to include bouts of "holding", in which one fly holds the opponent's wings and shakes vigorously. These cases suggest that the definition of "tussling" as opposed to "lunging" has a subjective element. However, I will not dwell on this matter further because it is impossible to be completely objective over behavioral classification, even by using a computational method. An important point is that the definition is applied consistently within the publication. I have no reason to doubt that this was the case.

      Thank you for this comment. Since the analysis of tussling behavior was conducted manually, it is challenging to achieve complete objectivity. However, we made every effort to apply consistent criteria throughout the analysis. We have added a “Limitations of the Study” section in the revised manuscript to clearly state this caveat. We appreciate your understanding.

      The authors now state that "all tester flies were loaded by cold anesthesia" (lines 432-433). I would like to draw attention to the fact that anesthesia, whether by ice or by CO2, has long been known to affect flies' subsequent behaviors (for aggression, see Trannoy S. et al., Learn. Mem. 2015. 22: 64-68). It would be prudent to acknowledge the possibility that this handling method could have contributed to unusually high levels of spontaneous tussling, which has not been reported elsewhere before.

      Thank you for this comment. The increased tussling behavior observed in our study is unlikely to be due to cold anesthesia: as noted by Trannoy S. et al. (2015), cold anesthesia profoundly reduces locomotion and general aggressiveness in flies. We acknowledge that the use of cold anesthesia in behavioral experiments may have potential effects on aggression. To minimize this influence, we allowed the flies to recover and adapt for at least 30 minutes before behavioral recording. Moreover, both control and experimental groups were treated in exactly the same manner to ensure consistency.

      It is intriguing that pC1SS2 neurons are dsx+ but fru-. Authors convincingly demonstrated that these neurons are clearly distinct from the P1a neurons, a well-characterized hub for male social behaviors. It is possible that pC1SS2 neurons overlap with previously characterized dsx+ neurons that are important for male aggressions (measured by lunges), such as in Koganezawa et al., Curr. Biol. 2016 and Chiu et al., Cell 2020, a point authors could have explicitly raised.

      Thank you for this comment. We have added this point into the Discussion section of the revised manuscript, as follows: “That tussling-promoting… aggression (Koganezawa et al., 2016). Moreover, the anatomical features of pC1[SS2] neurons are highly similar to the male-specific aggression-promoting (MAP) neurons identified by another previous study (Chiu et al., 2021).”

      I acknowledge the authors' courage in initiating an investigation into a less characterized, high-intensity fighting behavior. Tussling requires the simultaneous engagement of two flies. Even if there is confusion over the distinction between lunges and tussling, the authors' conclusion that socially experienced flies and socially isolated flies employ distinct fighting strategies is convincing. The concern I raised above is about the interpretation of the data, not about the quality of the data.

      Thank you for your constructive comments to make this manuscript better.

    1. It also takes the idea of constructing in the head more seriously by recognizing more than one kind of construction (some of them as far removed from simple building as cultivating a garden), and by asking questions about the methods and the materials used.

      The material needs to be questioned sometimes because information is often being modified or changed to instruct a new way and the knowledge can get lost.

    1. "Perhaps now it would be better to give up seeking for the truth, and receiving on one’s head an avalanche of opinion hot as lava, discoloured as dish-water." The use of these similes paints an angry and vivid image.

    1. Our Mission: We provide comprehensive transfer pricing benchmarking solutions that help multinational enterprises ensure compliance with international tax regulations while optimizing their global tax strategies. Our platform combines advanced statistical analysis with regulatory expertise to deliver reliable, defensible benchmarking studies that meet the highest standards of tax authorities worldwide. Key Features: OECD Compliant Analysis, Multiple Transfer Pricing Methods, Comprehensive Documentation, Expert Support Team.

      Specialized Transfer Pricing Expertise At Equvira Hungary Kft., transfer pricing is not just one of many services — it is our core focus. Our team has extensive experience in preparing OECD-compliant documentation, benchmarking studies, and complete TP analyses for businesses operating across industries.

      Unique Experience Our Head of Transfer Pricing brings 15 years of hands-on audit experience at the Hungarian tax authority as a transfer pricing inspector. This insider knowledge gives us a unique perspective on how tax authorities evaluate compliance, allowing us to prepare documentation that stands up to real-world scrutiny.

      Efficient & Reliable Delivery Whether you need a focused benchmarking study or a full TP documentation package, we adapt to your timeline and ensure that all deliverables are accurate, compliant, and practical for your business.

      Comprehensive Approach We draw on multiple global databases and robust statistical methods, ensuring that every analysis is grounded in reliable data and internationally accepted methodology.

      Quality You Can Trust Every report undergoes expert review and strict quality control. At Equvira Hungary Kft., we take responsibility for the accuracy and completeness of our work, because our clients’ compliance and peace of mind come first. This, too, could be given some small card-style layout or something like that.

    1. Why Choose Our Database Service? Expert Database Specialists: Our team includes certified database specialists with years of experience in transfer pricing research. We know how to construct complex queries that deliver the most relevant results for your analysis. Fast Turnaround Times: Standard delivery within 24 hours, with rush options available for urgent requests. Our streamlined process ensures you get your data when you need it most. Comprehensive Data Access: Access to the complete Orbis database with 400+ million companies worldwide. We don't limit your search - get all the data you need for robust benchmarking studies. Quality Guarantee: Every search is reviewed for accuracy and completeness before delivery. We guarantee the quality of our results or you don't pay - it's that simple.

      Why Choose Our Services?

      Unique Expertise

      Our Head of Transfer Pricing has 15 years of audit experience at the Hungarian tax authority as a transfer pricing inspector. This unparalleled background ensures that every analysis and documentation we deliver is fully aligned with how tax authorities evaluate transfer pricing compliance in practice.

      Efficient Turnaround From targeted benchmarking to full documentation, our streamlined process ensures efficient delivery tailored to your project’s scope.

      Comprehensive Data & Analysis We leverage multiple trusted global databases and robust economic methods to ensure accurate and comparable results.

      Quality Assurance Every deliverable undergoes expert review and compliance checks. We stand behind the quality of our documentation with a clear commitment to reliability.

    1. Students are often encouraged to annotate while reading.

      Even though it is encouraged, I prefer not to annotate because I am not a person who finds joy in explaining or writing things. I despise writing by hand because I get cramps in my hand so often; therefore, I prefer to just read something and remember the key points in my head. Also, I am very protective of my time, and going out of my way while reading to highlight a point is time-consuming for me.

    1. For years now I have been in love with a language other than the English in which I write, and it is a rough affair. Every day I try to learn a little more Ojibwe. I have taken to carrying verb conjugation charts in my purse, along with the tiny notebook I've always kept for jotting down book ideas, overheard conversations, language detritus, phrases that pop into my head. Now that little notebook includes an increasing volume of Ojibwe words.

      I think this is a great introduction to how Erdrich studied in her spare time: day by day, she would pick up new words.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the study's main conclusion regarding experience-driven changes in functional connectivity profiles between visual and frontal regions.

      In general, the findings in sighted adult and congenitally blind groups replicate previous studies and enhance the confidence in the reliability and robustness of the current results.

      Split-half analysis provides a good measure of robustness in the infant data.

      Weaknesses:

      There is some ambiguity in determining which aspects of these networks are shaped by experience.

      This uncertainty is compounded by notable differences in data acquisition and preprocessing methods, which could result in varying signal quality across groups. Variations in signal quality may, in turn, have an impact on the observed correlation patterns.

      The study's findings could benefit from being situated within a broader debate surrounding the instructive versus permissive roles of experience in the development of visual circuits.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. In this paper, Tian et al. ask: how does this organization arise over development? Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated; some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The question raised in this paper is extremely important: what is the starting state in development for visual cortical regions, and how is this organization shaped by experience? This paper is among the first to examine this question, particularly by comparing infants not only with sighted adults but also blind adults, which sheds new light on the role of visual (and cross-modal) experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data. 

      Weaknesses:

      A central claim is that "infant secondary visual cortices functionally resemble those of blind more than sighted adults" (abstract, last paragraph of intro). I see two potential issues with this claim. First, a minor change: given the approaches used here, no claims should be made about the "function" of these regions, but rather their "functional correlations". Second (and more importantly), the claim that the secondary visual cortex in general resembles blind more than sighted adults is still not fully supported by the data. In fact, this claim is only true for one aspect of secondary visual area functional correlations (i.e., their connectivity to A1/M1/S1 vs. PFC). In other analyses, the infant secondary visual cortex looks more like sighted adults than blind adults (i.e., in within vs. across hemisphere correlations), or shows a different pattern from both sighted and blind adults (i.e., in occipito-frontal subregion functional connectivity). It is not clear from the manuscript why the comparison to PFC vs. non-visual sensory cortex is more theoretically important than hemispheric changes or within-PFC correlations (in fact, if anything, the within-PFC correlations strike me as the most important for understanding the development and reorganization of these secondary visual regions). It seems then that a more accurate conclusion is that the secondary visual cortex shows a mix of instructive effects of vision and reorganizing effects of blindness, albeit to a different extent than the primary visual cortex.

      Relatedly, group differences in overall secondary visual cortex connectivity are particularly striking as visualized in the connectivity matrices shown in Figure S1. In the results (lines 105-112), it is noted that while the infant FC matrix is strongly correlated with both adult groups, the infant group is nonetheless more strongly correlated with the blind than sighted adults. I am concerned that these results might be at least partially explained by distance (i.e., local spread of the BOLD signal), since a huge portion of the variance in these FC matrices is driven by stronger correlations between regions within the same system (e.g., secondary-secondary visual cortex, frontal-frontal cortex), which are inherently closer together, relative to those between different systems (e.g., visual to frontal cortex). How do results change if only comparisons between secondary visual regions and non-visual regions are included (i.e., just the pairs of regions within the bold black rectangle on the figure), which limits the analysis to long-range connections only? Indeed, looking at the off-diagonal comparisons, it seems that in fact there are three altogether different patterns here in the three groups. Even if the correlation between the infant pattern and blind adult pattern survives, it might be more accurate to claim that infants are different from both adult groups, suggesting both instructive effects of vision and reorganizing effects of blindness. It might help to show the correlation between each group and itself (across independent sets of subjects) to better contextualize the relative strength of correlations between the groups.

      It is not clear that differences between groups should be attributed to visual experience only. For example, despite the title of the paper, the authors note elsewhere that cross-modal experience might also drive changes between groups. Another factor, which I do not see discussed, is possible ongoing experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. Although no effects of age are detected, it is possible that cortex is still undergoing experience-independent maturation at this very early stage of development. For example, consider Figure 2; perhaps V1 connectivity is not established at 2 weeks, but eventually achieves the adult pattern later in infancy or childhood. Further, consider the possibility that this same developmental progression would be found in infants and children born blind. In that case, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). To deal with these issues, the authors should add a discussion of the role of maturation vs. experience and temper claims about the role of visual experience specifically (particularly in the title). 

      The authors measure functional correlations in three very different groups of participants and find three different patterns of functional correlations. Although these three groups differ in critical, theoretically interesting ways (i.e., in age and visual/cross-modal experience), they also differ in many uninteresting ways, including at least the following: sampling rate (TR), scan duration, multi-band acceleration, denoising procedures (CompCor vs. ICA), head motion, ROI registration accuracy, and wakefulness (I assume the infants are asleep).

      Addressing all of these issues is beyond the scope of this paper, but I do feel the authors should acknowledge these confounds and discuss the extent to which they are likely (or not) to explain their results. The authors would strengthen their conclusions with analyses directly comparing data quality between groups (e.g., measures of head motion and split-half reliability would be particularly effective).

      Response #1: We appreciate the reviewer’s comments. In response, we have revised the paper to provide a more balanced summary of the data and clarified in the introduction which signatures the paper focuses on and why. Additionally, we have included several control analyses to account for other plausible explanations for the observed group differences. Specifically, we randomly split the infant dataset into two halves and performed split-half cross-validation. Across all comparisons, the results from the two halves were highly similar, suggesting that the effects are robust (see Supplementary Figures S3 and S4).

      Furthermore, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults) and found no significant differences between them (details in response #6). Finally, we repeated our analysis after excluding infants with a radiology score of 4 or 5, and the results remained consistent, indicating that our findings are not confounded by potential brain anomalies (details in response #2).

      We hope these control analyses help strengthen our conclusions.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in sighted infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of sighted infants lies between that of sighted adults (stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of sighted infants resembled those of sighted adults more than those of blind adults, but sighted infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths:

      The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects, including neonates/infants and blind adults, is highly original.

      Overall, the analyses considered are solid and well-detailed. The results are quite convincing, even if the interpretation might need to be revised downwards, as factors other than visual experience may play a role in the development of functional connections with the visual system.

      Weaknesses:

      While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating when experience-dependent mechanisms are important for the establishment of multiple functional connections within the visual system. This could be achieved by analyzing different developmental periods in the same way, using open databases such as the Baby Connectome Project. Given the early, "condensed" maturation of the visual system after birth, we might expect sighted infants to show connectivity patterns similar to those of adults a few months after birth.

      The rationale for mixing full-term neonates and preterm infants (scanned at term-equivalent age) from the dHCP 3rd release is not understandable since preterms might have a very different development related to prematurity and to post-natal (including visual) experience. Although the authors show that the difference between the connectivity of visual and other sensory regions, and the one of visual and PFC regions, do not depend on age at birth, they do not show that each connectivity pattern is not influenced by prematurity. Simply not considering the preterm infants would have made the analysis much more robust, and the full-term group in itself is already quite large compared with the two adult groups. The current study setting and the analyses performed do not seem to be an adequate and sufficient model to ascertain that "a few weeks of vision after birth is ... insufficient to influence connectivity".

      In a similar way, excluding the few infants with detected brain anomalies (radiological scores higher or equal to 4) would strengthen the group homogeneity by focusing on infants supposed to have a rather typical neurodevelopment. The authors quote all infants as "sighted" but this is not guaranteed as no follow-up is provided.

      Response #2: We appreciate the reviewer’s suggestion. We re-analyzed the infant cohort after excluding all cases with radiological scores ≥4 (n = 39 infants excluded). The revised analysis confirmed that the connectivity patterns reported in the main text remain statistically unchanged (see Supplementary Fig. S11). This demonstrates that our findings are robust to confounding effects from potential brain anomalies. We have explicitly clarified this in the revised Methods section (page 14, line 391 in the manuscript).

      In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      The post-menstrual age (PMA) at scan of the infants is also not described. The methods indicate that all were scanned at "term-equivalent age" but does this mean that there is some PMA variability between 37 and 41 weeks? Connectivity measures might be influenced by such inter-individual variability in PMA, and this could be evaluated.

      The rationale for presenting results on the connectivity of secondary visual cortices before one of the primary cortices (V1) was not clear to understand. Also, it might be relevant to better justify why only the connectivity of visual regions to non-visual sensory regions (S1-M1, A1) and prefrontal cortex (PFC) was considered in the analyses, and not the ones to other brain regions.

      In relation to the question explored, it might be informative to reposition the study in relation to what others have shown about the developmental chronology of structural and functional long-distance and short-distance connections during pregnancy and the first postnatal months.

      The authors acknowledge the methodological difficulties in defining regions of interest (ROIs) in infants in a similar way as adults. The reliability and the comparability of the ROIs positioning in infants is definitely an issue. Given that brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing delayed growth), the newborn brain is not homothetic to the adult brain, which poses major problems for registration. The functional specialization of cortical regions is incomplete at birth. This raises the question of whether the findings of this study would be stable/robust if slightly larger or displaced regions had been considered, to cover with greater certainty the same areas as those considered in adults. And have other cortical parcellation approaches been considered to assess the ROIs robustness (e.g. MCRIB-S for full-terms)?

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Further consideration should be given to the underlying changes in network architecture that may account for differences in functional correlations across groups. An increase (or decrease) in correlation between two regions could signify an increase (decrease) in connection or communication between those regions. Alternatively, it might reflect an increase in communication or connection with a third region, while the physical connections/interactions between the two original regions remain unchanged. These possibilities lead to distinct mechanistic interpretations. For example, there are substantial changes in connectivity during early visual (e.g. Burkhalter A. 1993, Cerebral Cortex) and visuo-motor development (e.g., Csibra et al. 2000 Neuroreport). It's not clear whether increases in communication within the visual network and improvements in visuo-motor behavior (e.g., Yizhar et al. 2023 Frontiers in Neuroscience) wouldn't produce a qualitatively similar pattern of results.

      Relatedly, the within-network correlation patterns between visual ROIs and frontal ROIs appear markedly different between sighted adults and infants (Supplementary Figure S1). To what extent do the differences in long-range correlations between visual and frontal regions reflect these within-network differences in functional organization?

      Response #3: The reviewer is raising some interesting questions about possible mechanisms and network changes. Resting state studies are indeed always subject to the possibility that some effects are mediated by a third, unobserved region. Prior whole-cortex connectivity analyses have observed primarily changes in occipito-frontal connectivity in blindness, so there is not a clear cortical ‘third region’ candidate (Deen et al., 2015). However, some thalamic effects have also been observed and could contribute to the phenomenon (Bedny et al., 2011). Resting state changes in correlation between two areas do not imply changes in strength of long-range anatomical connectivity. Indeed, in the current case they may well reflect differential functional coupling, rather than strengthening or weakening of anatomical connections. We now discuss this in the Discussion section on page 12, line 301 as follows:

      “Despite these insights, many questions remain regarding the neurobiological mechanisms underlying experience-based functional connectivity changes and their relationship to anatomical development. Long-range anatomical connections between brain regions are already present in infants—even prenatally—though they remain immature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017). Functional connectivity changes may stem from local synaptic modifications within these stable structural pathways, consistent with findings that functional connectivity can vary independently of structural connection strength (Fotiadis et al., 2024). Moreover, functional connectivity has been shown to outperform structural connectivity in predicting individual behavioral differences, suggesting that experience-based functional changes may reflect finer-scale synaptic or network-level modulations not captured by macrostructural measures (Ooi et al., 2022). Prior studies also suggest that, even in adults, coordinated sensory-motor experience can lead to enhancement of functional connectivity across sensory-motor systems, indicating that large-scale changes in functional connectivity do not necessarily require corresponding changes in anatomical connectivity (Guerra-Carrillo et al., 2014; Li et al., 2018).”

      It is not clear how changes in correlation patterns among visual areas would produce the connectivity between visual areas and prefrontal areas reported in the current study. Activity in visual areas drives correlations both among visual areas and between visual and prefrontal areas, and the same is true of prefrontal cortices.

      The findings from this study should be more closely linked to the extensive literature surrounding the debate on whether experience plays an instructive or permissive role in visual development (e.g., Crair 1999 Current Opin Neurobiol; Sur et al. 1999 J Neurobiol; Kiorpes 2016 J Neurosci; Stellwagen & Shatz 2002 Neuron; Roy et al. 2020 Nature Communications).

      Response #4: The instructive role suggests that specific experiences or patterns of neural activity directly shape and organize neural circuitry, while the permissive role indicates that such experiences or activity merely enable other factors, such as molecular signals, to influence neural circuit formation (Crair, 1999; Sur et al., 1999). To distinguish whether experience plays an instructive or permissive role, it is essential to manipulate the pattern or information content of neural activity while maintaining a constant overall activity level (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002). However, both the sighted and blind adult groups have had extensive experience and neural activity in the visual cortices. For the sighted group, activity in the visual cortex is partly driven by bottom-up input from the external environment, through the retina, LGN, and ultimately to the cortex. In contrast, the blind group’s visual cortex activity is partially driven by top-down input from non-visual networks. The precise role of this activity in shaping the observed connectivity patterns remains unclear. Although our study cannot speak to this issue directly, we now link to the relevant literature on page 12, line 320 of the manuscript in the Discussion section as follows:

      “The current findings reveal both effects of vision and effects of blindness on the functional connectivity patterns of the visual cortex. A further open question is whether visual experience plays an instructive or permissive role in shaping neural connectivity patterns. An instructive role suggests that specific sensory experiences or patterns of neural activity directly shape and organize neural circuitry. In contrast, a permissive role implies that sensory experience or neural activity merely facilitates the influence of other factors—such as molecular signals—on the formation and organization of neural circuits (Crair, 1999; Sur et al., 1999). Studies with animals that manipulate the pattern or informational content of neural activity while keeping overall activity levels constant could distinguish between these hypotheses (Crair, 1999; Roy et al., 2020; Stellwagen & Shatz, 2002).”

      The assertion that a few weeks of vision after birth is insufficient to influence connectivity is provocative. Though supported by the study's results, it would benefit from integration with research in animal models showing considerable malleability of networks from early experience (e.g., Akerman et al. 2002 Neuron; Li et al. 2006 Nature Neuroscience; Stacy et al. 2023 J Neuroscience).

      Response #5: We thank the reviewer for their suggestion. The present study found that several weeks of postnatal visual experience is insufficient to significantly alter the long-term connectivity patterns of the visual cortices. In contrast, animal studies have shown that acute visual experience, or even exposure to visual stimuli through unopened eyelids, can robustly influence visual system development (Akerman et al., 2002; Li et al., 2008; Van Hooser et al., 2012). We think this discrepancy may be attributed to the substantial differences in developmental timelines between species. The human lifespan is much longer, and so is the human critical period, making it unclear how to map duration from one species to another. We briefly touched upon the time course issue on page 11, line 289 in the Discussion section as follows:

      “The present results reveal the effects of experience on development of functional connectivity between infancy and adulthood, but do not speak to the precise time course of these effects. Infants in the current sample had between 0 and 20 weeks of visual experience. Comparisons across these infants suggest that several weeks of postnatal visual experience is insufficient to produce a sighted-adult connectivity profile. The time course of development could be anywhere between a few months and years and could be tested by examining data from children of different ages.”

      Substantial differences between the groups are evident in several key aspects of the study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To clarify how these differences might have impacted correlation differences between groups, it would be essential to include information on the noise ceilings for each correlation analysis within each group.

      Response #6: We thank the reviewer for their suggestion. We now report the split-half noise ceiling for adult and infant groups. For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (one-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, we believe that overall signal quality is unlikely to impact our results. We have also added the relevant context to the Methods section on page 16, line 447 as follows:

      “Substantial differences between the groups exist in this study, including the number of subjects, brain sizes, imaging parameters, and data preprocessing, all of which are likely to have an impact on the overall signal quality. To address this concern, we compared the split-half noise ceiling across the groups (infants, sighted adults, and blind adults). For each participant, we first split the rs-fMRI time series into two halves, then calculated the ROI-wise rsFC pattern from the two splits. The split-half noise ceiling was estimated according to Lage-Castellanos et al. (2019). The noise ceilings of the three groups (infants: 0.90 ± 0.056, blind adults: 0.88 ± 0.041, sighted adults: 0.90 ± 0.055) showed no significant difference (One-way ANOVA, F(2,552) = 2.348, p = 0.097). Therefore, overall signal quality is unlikely to impact our results.”
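
      The split-half comparison described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' pipeline: it uses a plain split-half correlation of ROI-wise connectivity patterns as a stand-in and does not reproduce the estimator of Lage-Castellanos et al. (2019). The function names, the toy data generator, and the group sizes are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_pattern(roi_ts):
    """Upper triangle of the ROI-by-ROI correlation matrix, as a flat list."""
    n_roi = len(roi_ts)
    return [pearson(roi_ts[i], roi_ts[j])
            for i in range(n_roi) for j in range(i + 1, n_roi)]


def split_half_reliability(roi_ts):
    """Correlate the connectivity patterns of the first and second scan halves."""
    half = len(roi_ts[0]) // 2
    first = fc_pattern([ts[:half] for ts in roi_ts])
    second = fc_pattern([ts[half:2 * half] for ts in roi_ts])
    return pearson(first, second)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def simulate_subject(rng, n_roi=6, n_vol=400):
    """Toy subject (assumption): ROIs load differently on one shared signal."""
    shared = [rng.gauss(0, 1) for _ in range(n_vol)]
    weights = [0.2 + 0.3 * k for k in range(n_roi)]
    return [[w * s + rng.gauss(0, 1) for s in shared] for w in weights]


rng = random.Random(0)
# Three hypothetical groups of five simulated subjects each.
groups = [[split_half_reliability(simulate_subject(rng)) for _ in range(5)]
          for _ in range(3)]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b},{df_w}) = {f_stat:.3f}")
```

      With real data, each subject's ROI time series would replace the simulated ones, and the F statistic would be referred to the F distribution with the reported degrees of freedom to obtain a p-value.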

      In general, it appears that the infant correlations are stronger compared to the other groups. While this could reflect increased coherence or lack of differentiation, it is also possible that it is simply due to the presence of a non-neuronal global signal. Such a signal has the potential to substantially limit the effective range of functional correlations and comparisons with adults. To address this, it is advisable to conduct control analyses aimed at assessing and potentially removing global signals.

      Response #7: We agree with the reviewer that global signal regression (GSR) may help reduce non-neuronal artifacts, such as motion, cardiac, and respiratory signals, which are known to correlate with the global signal. However, the global signal also contains neural signals from gray matter, and removing it can introduce unwanted artifacts, especially for the current study. First, GSR can reduce the physiological accuracy of functional connectivity (FC); second, GSR may have differential effects across groups, potentially introducing additional artifacts in between-group comparisons, as noted by Murphy and Fox (2017). The CompCor method (Behzadi et al., 2007; Whitfield-Gabrieli & Nieto-Castanon, 2012) can estimate global non-neuronal artifacts much as GSR does. Because it estimates these artifacts from signals within the white matter (WM) and cerebrospinal fluid (CSF) masks, but not the gray matter (GM), CompCor should introduce minimal unwanted bias to the GM signal.

      Was there a difference in correlations for preterm vs term neonates? Recent research has suggested that preterm births can have an impact on functional networks, particularly in frontal cortices, e.g., Tokariev et al. 2019; Li et al. 2021 eLife; Zhang et al. 2022 Frontiers in Neuroscience.

      Response #8: We have compared preterm and term neonates for all the main results, including the connectivity from the secondary visual cortex/V1 to non-visual sensory cortices versus prefrontal cortices, the laterality of occipito-frontal connectivity, and the specialization across different fronto-occipital networks. This information is reported on page 6, line 169 and in Supplementary Figure S7. Connectivity is generally higher in full-term infants than in preterm infants. However, the connectivity patterns of term and preterm infants are very similar.

      The consistency between the current results and prior work (e.g., Burton et al. 2014) is notable, particularly in the observed greater correlations in prefrontal regions and weaker correlations in somato-motor regions for early blind individuals compared to sighted. However, almost all visual-frontal correlations in both groups were negative in that prior study. Some discussion on why positive correlations were found in the current study could help to clarify.

      Response #9: Many other papers have reported positive correlations similar to those found in our study (e.g., Deen et al., 2015; Kanjlia et al., 2021). In contrast, Burton's study identified predominantly negative visual-frontal correlations. We think this is likely because the global signal was regressed out during preprocessing. This methodological choice can lead to an increase in negative connections (Murphy & Fox, 2017).
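
      The point about global signal regression can be illustrated with a toy simulation. The sketch below is not the preprocessing used in either study; it assumes a simple form of GSR in which the mean across all simulated ROI time series is regressed out of each series, and all names are hypothetical. Because the cleaned series then sum to zero at every time point, their off-diagonal covariances must sum to a negative value, even when every pair was positively correlated beforehand.

```python
import random


def demean(x):
    """Subtract the mean from a sequence."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def cov(x, y):
    """Sample covariance of two demeaned, equal-length sequences."""
    return sum(a * b for a, b in zip(x, y)) / (len(x) - 1)


def global_signal_regression(series):
    """Regress the across-series mean out of each series (simplified GSR)."""
    series = [demean(s) for s in series]
    n = len(series)
    g = [sum(vals) / n for vals in zip(*series)]  # global signal
    var_g = cov(g, g)
    cleaned = []
    for s in series:
        beta = cov(s, g) / var_g  # least-squares slope on the global signal
        cleaned.append([v - beta * gv for v, gv in zip(s, g)])
    return cleaned


def offdiag_cov_sum(series):
    """Sum of covariances over all ordered pairs of distinct series."""
    series = [demean(s) for s in series]
    n = len(series)
    return sum(cov(series[i], series[j])
               for i in range(n) for j in range(n) if i != j)


rng = random.Random(0)
common = [rng.gauss(0, 1) for _ in range(300)]
# Five toy ROIs that all share one common component (assumption).
rois = [[c + rng.gauss(0, 1) for c in common] for _ in range(5)]
cleaned = global_signal_regression(rois)
before = offdiag_cov_sum(rois)
after = offdiag_cov_sum(cleaned)
print(f"off-diagonal covariance sum before GSR: {before:.2f}, after: {after:.2f}")
```

      The sign flip here is a mathematical consequence of the regression itself rather than a property of the simulated signals, which is one reason correlations computed with and without GSR are hard to compare directly.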

      The term "secondary visual areas" used throughout the paper lacks specificity, and its usage in terms of underlying anatomical and functional areas has been inconsistent in the literature. It would be advisable to adopt a more precise characterization based on functional and/or anatomical criteria.

      Response #10: We specified in the article that the occipital ROIs used in the current study are functional areas identified in prior studies of people born blind: each responds to one of three non-visual tasks (language, math, or executive function) in the congenitally blind population and shows functional connectivity changes in blind adults (Kanjlia et al., 2016, 2021; Lane et al., 2015) (see Figure 1). They are referred to collectively as ‘secondary visual areas’ to distinguish them from V1. Anatomically, these three regions cover the majority of the lateral occipital cortex and part of the ventral occipital cortex, providing a good sample of the connectivity profile of higher-order visual areas. In blind individuals, although these regions respond to non-visual tasks, their exact functions are unknown.

      The inclusion of the ventral temporal cortex in the visual ROIs is currently only depicted in Supplementary Figure S7. To enhance the clarity of the areas of interest analyzed, it would be advisable to illustrate the ventral temporal areas in the main text. Were there notable differences in the frontal correlations between the lateral occipital visual areas and ventral temporal areas?

      Response #11: We thank the reviewer for pointing out this issue. We added a statement about the ventral visual cortex when describing the location of the ROIs and added a ventral view of the ROIs to Figure 1. The language-responsive and math-responsive ROIs cover both the lateral and ventral visual cortex, whereas executive function (response-conflict) regions cover only the lateral visual cortex. We compared the connectivity patterns of these three regions and found no differences (see Supplementary Fig. S2).

      The blind group results are characterized as reflecting a reorganization in comparison to sighted adults while the results for sighted adults compared to infants are discussed more as a maturation ("adult pattern isn't default but requires experience to establish"). Both the sighted and blind adult groups showed differences from the infant group, and these differences are attributed to the role of experience. Why use "reorganization" for one result and maturation for another?

      Response #12: We agree with the reviewer that both of the adult groups should be thought of as equal in relation to the infants. In other words, the brain develops under one set of experiential conditions or another. We do not think that the adult sighted pattern reflects maturation. Rather, the sighted adult pattern reflects the combined influence of maturation and visual experience. The adult blind pattern reflects the combined influence of maturation and blindness. We use the term ‘reorganization’ to label differences in the blind adults relative to sighted infants. We do so for the purpose of clarity and to remain consistent with terminology in prior literature. However, we agree with the reviewer that the blind group does not reflect ‘reorganization’ intrinsically any more than the sighted adult group.

      The statement that "visual experience is required to set up long-range functional connectivity" is unclear, especially since the infant and blind groups showed stronger long-range functional correlations with PFC.

      Response #13: We revised this sentence to state specifically that “visual experience establishes elements of the sighted-adult long-range connectivity” (Abstract, line 17).

      The statement that the visual ROIs roughly correspond to "the anatomical location of areas such as V5/MT+, LO, V3a, and V4v" appears imprecise. From Supplementary Figure S7, these areas cover anterior portions of ventral temporal cortex (do these span the anatomical location of putative category-selective areas?) and into the intraparietal sulcus.

      Response #14: Thanks to the reviewer for the clarification. The ventral ROIs cover the middle and part of the anterior portion of the ventral temporal lobe, including the putative category-selective areas. Additionally, the dorsal ROIs extend beyond the occipital lobe to the intraparietal sulcus and superior parietal lobule. We have added a more detailed description of the anatomical location of the ROI in the Methods section Page 17 line 489 as follows:

      “Each functional ROI spans multiple anatomical regions and together the secondary visual ROIs tile large portions of lateral occipital, occipito-temporal, dorsal occipital and occipito-parietal cortices. In sighted people, the secondary visual occipital ROIs include the anatomical locations of functional regions such as motion area V5/MT+, the lateral occipital complex (LO), category specific ventral occipitotemporal cortices and dorsally, V3a and V4v.  The occipital ROI also covers the middle of the ventral temporal lobe. Dorsally, it extended to the intraparietal sulcus and superior parietal lobule.”

      The motivation for assessing correlations with motor and frontal regions was briefly discussed in the introduction. It would be helpful to reiterate this motivation when first introducing the analyses in the results.

      Response #15: Thank you for the thoughtful suggestion. Upon reflection, we chose to substantially revise the Introduction to more clearly and comprehensively explain the rationale for examining the couplings with motor and frontal regions, rather than reiterating it in the Results section. We believe this revised framing provides a stronger foundation for the analyses that follow, while avoiding redundancy across sections. We hope this addresses the reviewer’s concern.

      Reviewer #2 (Recommendations for the authors):

      Congratulations on a well-written paper and an interesting set of results.

      Reviewer #3 (Recommendations for the authors):

      Abstract:

      Mentioning "sighted infants" does not seem adequate.

      Response #16: In our dataset, newborns (average age at scan = 2.79 weeks) have very limited and immature vision. We agree with the reviewer that long-term visual outcomes cannot be guaranteed without follow-up data. The term "sighted infants" was used operationally to distinguish this cohort from congenitally blind populations.

      In sentences after "Specifically...", it was not clear whether the authors referred to V1 connectivity.

      Response #17: We thank the reviewer for this comment. In the revised abstract, we have removed the original "Specifically..." phrasing and clarified the results.

      Introduction

      Talking about the "instructive effects" of vision might be confusing or misleading. Visual experiences like exposure to oral language are part of the normal/spontaneous environment that allows the infant behavioral acquisitions (contrarily with learnings that occur later during development with instruction like for reading).

      Response #18: We appreciate the reviewer’s concern and would like to clarify that the term “instructive effect” as used here is derived from neurodevelopmental studies (Crair, 1999; Sur et al., 1999). In this context, “instructive” refers to activity-dependent mechanisms where patterns of neural activity actively guide the organization of synaptic connectivity, emphasizing that spontaneous or sensory-driven activity (e.g., retinal waves, visual experience) can directly shape circuit refinement, as seen in ocular dominance column formation. In the context of our study, we emphasize that vision plays an instructive role in setting up the balance of connectivity between occipital cortex and non-visual networks.

      For references on the development of connectivity, I would advise citing MRI studies but also studies based on histological approaches (see for example the detailed review by Kostovic et al, NeuroImage 2019).

      Response #19: We thank the reviewer for this suggestion. We have incorporated a discussion on the long-range anatomical connections that emerge as early as infancy, referencing studies that employed diffusion MR imaging and histological methods, as detailed below.

      “Many long-range anatomical connections between brain regions are already established in infants, even before birth, although they are not yet mature (Huang et al., 2009; Kostović et al., 2019, 2021; Takahashi et al., 2012; Vasung, 2017).” (Page 12, line 303 in the manuscript)

      Results

      P7 l170: It might be helpful to be precise that this is "compared with inter-hemispheric connectivity".

      Response #20: We thank the reviewer for this suggestion. To align with our established terminology, we have revised the statement to explicitly contrast within-hemisphere connectivity with between-hemisphere connectivity. The modified text now reads (page 7, line 183 in the manuscript):

      “Compared to sighted adults, blind adults exhibited a stronger dominance of within-hemisphere connectivity over between-hemisphere connectivity. That is, in people born blind, left visual networks are more strongly connected to left PFC, whereas right visual networks are more strongly connected to right PFC.”

      L176-181: It was not clear to me what was the difference between "across" and "between hemisphere connectivity". Would it be informative to test the difference between blind and sighted adults?

      Response #21: We clarify that there is no distinction between the terms “across” and “between hemisphere connectivity”—they refer to the same concept. To ensure consistency, we have revised the text to exclusively use “between hemisphere connectivity” throughout the manuscript. Regarding the comparison between blind and sighted adults, we conducted statistical comparisons between these groups in our analysis, and the results have been incorporated into the revised version (Page 7, line 187 in the manuscript).

      Adding statistics on Figure 3, but also on Figures 1 and 2 might help the reading.

      Response #22: We have added the statistics to Figures 1-4.

      Adding the third comparison in Figure 4 would be possible in my view.

      Response #23: We explored integrating the response-conflict region into Figure 4, but this would require a 3x3 bar chart with pairwise statistical significance markers, which introduced excessive visual complexity that hindered readers’ ability to grasp our intended message. For clarity, we retained the original Figure 4; for completeness, we provide the full three-region analysis (including all statistical comparisons) in Supplementary Figure S8.

      Methods

      The authors might have to specify ages at birth, and ages at scan (median + range?).

      Response #24: We have added that information in the Methods section as follows:

      “The average age from birth at scan = 2.79 weeks (SD = 3.77, median = 1.57, range = 0 – 19.71); average gestational age at scan = 41.23 weeks (SD = 1.77, median = 41.29, range = 37 – 45.14); average gestational age at birth = 38.43 weeks (SD = 3.73, median = 39.71, range = 23 – 42.71).” (Page 14, line 379 in the manuscript)

      It might be relevant to comment on the range of available fMRI volumes, and the fact that connectivity measures might then be less robust in infants.

      Response #25: We report the range of fMRI volumes in the Methods section (Page 16, Line 449). Adult participants (blind and sighted) underwent 1–4 scanning sessions, each containing 240 volumes (mean scan duration: 710.4 seconds per participant). For infants, all subjects had 2300 fMRI volumes, and we retained a subset of 1600 continuous volumes per subject with the minimum number of motion outliers. While infant connectivity measures may inherently exhibit lower robustness due to developmental and motion-related factors, our infant cohort’s large sample size (n=475) and stringent motion censoring criteria enhance the reliability of group-level inferences. We have integrated this clarification into the Methods section (Page 16, Line 444) as follows:

      "While infant connectivity estimates may be less robust at the individual level compared to adults due to shorter scan durations and higher motion, our cohort’s large sample size (n=475) and rigorous motion censoring mitigate these limitations for group-level analyses."

      The mention of dHCP 2nd release should be removed from the paragraph on data availability.

      Response #26: We have removed it.

    1. “So Culain loosed the dog, and with one spring it bounded forth out of the court of the house and over the wall of the rath, making a circuit of the entire district; and when it came back panting, with its tongue hanging from its jaws, it took up its usual position in front of the house, and there crouched with its head upon its paws, watching the high road to Emain. Surely an extraordinarily cruel and fierce and savage dog was he.

      I speculate that the boy will fight the dog.

    1. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

    2. Author response:

      Reviewer #1 (Public review): 

      Summary: 

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.  

      We appreciate the reviewer’s statement highlighting the importance of our study. 

      Strengths: 

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism. 

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses: 

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure). 

      We thank the reviewer for the helpful comments. We understand that the analyses were difficult to follow, and we will work on the clarity of the Results section. However, we would like to emphasize that every d′ measure is accompanied by analyses of response rates (i.e., correct and incorrect choice rates). In addition, we applied standard psychometric analyses whenever possible. Specifically, psychometric functions were fitted to the data using logistic regression. We will rework the text to clarify these points.

      During training, only two stimulus amplitudes were presented, which precluded the construction of psychometric curves. For the categorization task, however, psychometric analyses were feasible and conducted (Figure 2). These analyses revealed no evidence of categorization bias (as measured by threshold) or accuracy (as measured by the slope) across stimulus strengths.
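
      As an illustration only (this is not the authors' analysis code), a two-parameter psychometric function can be fitted by logistic regression in a few lines. The sketch below assumes the data are summarized as, for each stimulus amplitude, the number of trials and the number of "high-category" choices. The function names, the Newton-Raphson fitting routine, and the reading of the threshold as the 50% point (-b0/b1) and of b1 as the slope are all assumptions made for this example.

```python
import math

def fit_logistic(x, n_choice, n_total, iters=25):
    """Fit P(choice | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    x        -- stimulus amplitudes (hypothetical units)
    n_choice -- number of 'high-category' choices at each amplitude
    n_total  -- number of trials at each amplitude
    Note: no safeguards against perfectly separated data or overflow.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, k, n in zip(x, n_choice, n_total):
            mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = n * mu * (1.0 - mu)   # binomial variance weight
            r = k - n * mu            # observed minus expected count
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def threshold(b0, b1):
    """Amplitude at which the fitted choice probability is 0.5."""
    return -b0 / b1
```

      Under this parameterization, a shift in the threshold would indicate a categorization bias and a change in b1 would indicate a change in accuracy across stimulus strengths, matching the two quantities discussed above.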

      The calculation of d’ is included in the Methods, but we will also report and explain its use in each part of the Results section where it has been included.
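
      For readers unfamiliar with the measure, the standard signal-detection definition is d′ = z(hit rate) − z(false-alarm rate), where z is the inverse of the standard normal cumulative distribution. The sketch below is a minimal illustration of that textbook formula, not necessarily the exact calculation given in the Methods. The mapping of trial outcomes onto "hits" and "false alarms" in a two-choice task and the log-linear correction for extreme rates are assumptions made for this example.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (an assumption of this sketch): add 0.5 to each
    # count so that rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

      Because d′ combines two response rates into a single number, reporting the underlying correct and incorrect choice rates alongside it, as described above, shows which component drives any difference.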

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task. 

      Strengths: 

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands. 

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses: 

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative. 

      We thank the reviewer for the careful reading of our manuscript and for the constructive feedback. The reviewer raises a valid point. We agree that our study is primarily descriptive and focused on behavioral data, and we appreciate the opportunity to clarify the scope and interpretation of our findings. Our primary goal was to characterize behavioral patterns during tactile discrimination and categorization, and the psychometric analyses were intended to provide a detailed description of these patterns. We do not claim to provide direct neural, causal, or computational evidence. 

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered. 

      Alternative explanations of our findings, such as differences in motivation, fatigue, satiety, stereotyped licking, and reward valuation have indeed been considered. We will revise the manuscript to present these points more clearly. 

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This was not done intentionally. We do not claim to have tested the Load Theory; rather, inspired by it, we assessed behavioral patterns in our tactile categorization task. We agree that referring to the Adaptive Resonance Theory, which is based on artificial neural network models, might be misleading since we focus on behavioral results, and we will revise the text accordingly. However, our task allowed us to examine the impact of categorization on discrimination, confirming that categorization can amplify perceptual differences between stimuli belonging to different categories and reduce perceived differences within a category in WT mice but not in the Fmr1<sup>-/y</sup> mice when low-salience stimuli were experienced. Finally, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced use of categories in low-salience tactile discrimination. 

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations. 

      We agree with the reviewer that our current experiments are behavioral in nature and do not provide direct mechanistic evidence for top-down pathway dysfunction. Our goal was to carefully characterize tactile responses and behavioral patterns in Fmr1<sup>-/y</sup> mice. The notion of “top-down” is used at the behavioral level, referring to the influence of higher-level cognitive processes (e.g., categorization, attention) on perception, rather than to underlying neural circuits. We will revise the manuscript to more clearly emphasize that our conclusions are based on behavioral observations, and we will frame mechanistic inferences as hypotheses rather than established findings. We will also explicitly note that future work using neural recordings or causal manipulations will be required to directly test these hypotheses.

      We also note that identifying the precise top-down circuits involved will require extensive additional experimentation. For example, one would first need to pinpoint the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself. After such a circuit is identified, further work would then be needed to rescue or manipulate this pathway in the Fmr1<sup>-/y</sup> model. These steps represent a substantial program of mechanistic research that, while important, goes well beyond the scope of the present study.

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited. 

      We recognize that “reduced top-down categorization influence” and “choice consistency bias” are based on behavioral observations. However, we respectfully disagree that this makes these constructs inherently speculative. Similar behavioral inferences have been applied in previous clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021). The translational impact of our work lies in the translational platform we have developed and in highlighting the complexity of tactile measures and the additional analyses that can be conducted in clinical studies.

      We agree with the reviewer that the neural-based experiments would indeed provide valuable mechanistic insight into our observed behavioral alterations, and we believe future studies should therefore focus on their underlying neurobiological substrate.

      We will revise the language throughout the manuscript to clarify that all conclusions are based on behavioral measures.  

      (3) Statistical analysis: 

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on nonsignificant findings undermines confidence in the conclusions.  

      Several trends are evident in complex measures, such as d’ analyses on task sensitivity or responses pooled across different amplitudes. Additional analyses revealed which component of these measures showed a statistically significant difference across genotypes, namely the low-salience incorrect choices accounting for low task sensitivity. We chose to present all analyses to be transparent and to highlight that commonly used complex measures (like d’ analyses) may mask important findings. In the text, we described p-values between 0.05 and 0.1 as observed trends without over-interpreting their significance. 

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations. 

      The number of mice used in each genotype group is consistent with standard practices in behavioral studies using mice and sensory tasks. We have computed effect sizes (e.g., Cohen’s d) alongside some of the statistical comparisons, and these show a medium effect size (>0.5). 

      As the reviewer correctly noted, no mice were excluded based on outlier analyses, since the observed variability reflects true biological differences rather than experimental or technical errors. We will reexamine our dataset for potential outliers. If any are identified, we will perform analyses both with and without the outlier and report any effects that are sensitive to single animals. These procedures and results will be explicitly described in the Methods and Results sections.
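
      For reference, Cohen's d for two independent groups is the difference in group means divided by the pooled standard deviation. The sketch below illustrates that standard formula. The function name and the leave-one-out helper, which shows one way to check whether an effect depends on a single animal, are hypothetical and are not taken from the manuscript.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def leave_one_out_d(a, b):
    """Effect size recomputed with each animal of group `a` removed in turn."""
    return [cohens_d(a[:i] + a[i + 1:], b) for i in range(len(a))]
```

      If the leave-one-out values vary widely, the effect is sensitive to individual animals, which is the situation the with-and-without-outlier analysis described above is meant to reveal.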

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.  

      We thank the reviewer for raising this important point and we will include a clear statement on multiple comparisons in the Methods section. 
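
      The response does not specify which correction will be used. Purely as an example of what such a statement could describe, the sketch below implements the Holm-Bonferroni step-down procedure, one common way to control the family-wise error rate. The choice of method and the function name are assumptions made for this illustration.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity so adjusted values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted
```

      An adjusted p-value below the chosen alpha (e.g., 0.05) is then treated as significant after correction.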

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test. 

      We thank the reviewer for raising this point. This was not done intentionally. A repeated-measures ANOVA on miss rates for low-salience stimuli during categorization confirmed that there are statistically significant differences both across stimulus amplitudes and between genotypes. Additional correction for multiple comparisons will be performed and explained in the Methods section.  

      (4) Emphasis on theoretical models: The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed. 

      As mentioned above, our goal was not to directly test these theories but rather to apply them within our translational framework. The Discussion section will be reframed to highlight that our findings are consistent with predictions from certain cognitive theories rather than implying that these frameworks were directly tested.

      Reviewer #3 (Public review): 

      Summary: 

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths: 

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD. 

      We appreciate the reviewer’s positive assessment of our study’s translational value and the importance of our behavioral findings.

      Weaknesses: 

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.  

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, could provide additional insights into learning dynamics. This analysis will be conducted and added into the revised manuscript.

      There was no assessment of reversal learning in Fmr1<sup>-/y</sup> mice in this study. While it is an interesting and important question based on previous findings in preclinical and clinical studies, it falls outside the scope of the current manuscript.    

      Feigin H, Shalom-Sperber S, Zachor DA, Zaidel A (2021) Increased influence of prior choices on perceptual decisions in autism. Elife 10.

      Soulières I, Mottron L, Saumier D, Larochelle S (2007) Atypical categorical perception in autism: Autonomy of discrimination? J Autism Dev Disord 37:481–490.

    1. Our students have helped us create lists of words that come to mind using this exercise. Within a few minutes, a class frequently generates 30 or 40 words that Americans associate with Africa. Native, hut, warrior, shield, tribe, terrorist, savage, cannibals, jungle, pygmy, barbarian, pagan, voodoo, and witch doctor are commonly associated with “traditional” Africa.

      This part of the text shows that many Americans hold a subconscious image of Africa. This comes mainly from what they have been told through the media, but also from schools not teaching the subject fully. I believe that if schools talked about the development of Africa, people wouldn't associate Africa with the words listed in the text. This is one example of the many popular misconceptions about Africa. These terms are still associated with Africa because people in America haven't taken the time to actually understand Africa, so how can it be taught? That's why the association persists even today.

    1. Romeo slew him, he slew Mercutio; Who now the price of his dear blood doth owe?

      This is the Prince's reply to Lady Capulet, who wants the Prince to take Romeo's head for the death of Tybalt. I believe the Prince is asking who is left to be owed Romeo's blood, given that Tybalt slew Mercutio and Romeo slew Tybalt. This shows the kind of mercy the Prince is willing to grant in this circumstance, even after warning the two families.

    1. Reviewer #1 (Public review):

      Summary

      Xu et al. use transcriptomic comparisons of mouse cochlear and vestibular hair cells to show that the vestibular hair cells alone are enriched in gene expression for proteins necessary for cilia motility and to further argue that such motility is a normal function of the kinocilia.

      Background:

      Cilia are prominent in sensory receptors, including vertebrate photoreceptors, olfactory neurons, and mechanosensitive hair cells of the inner ear and lateral line. Cilia can be motile or nonmotile depending on their axonemal structure: motile cilia require dynein and the inner 2 singlet microtubules of the 9+2 array. Primary cilia, present early in development, are considered to have sensory functions and to be nonmotile (Mill et al., Nature Rev Gen 2023).

      In hair cells, the kinocilium anchors and polarizes the mechanosensitive hair bundle of specialized microvilli. The kinocilium matures from the primary cilium of a newborn hair cell; behind it, the bundle of mechanosensory microvilli rises in a descending staircase of rows. During maturation of the mammalian cochlea, all hair cells lose the kinocilium, though not the associated basal body. The consensus for many years has been that most vertebrate kinocilia, and especially mammalian kinocilia, are nonmotile, based largely on the lack of spontaneous motility in excised mammalian vestibular organs, but also on the impression that the rare examples of spontaneous beating motility even in non-mammalian hair cells are associated with deterioration of the preparation (Rüsch & Thurm 1990).

      Strengths

      In comparing RNA expression across the 4 major types of mouse hair cells - 2 cochlear and 2 vestibular - Xu et al. noted that some ciliary genes related to motility are expressed by vestibular but not cochlear hair cells. They curated the ciliary genes into types known to be associated with different aspects of beating motility, and also investigated the expression of genes typical of primary cilia, which are considered to have sensory and cell signaling functions and to be nonmotile. They add immunostaining to back up some of the RNA data, and also evaluate relative expression by neonatal mouse cochlear and vestibular hair cells from a published dataset. The focus on kinociliary genes is an appropriate use of the comparative expression data for cochlear and vestibular hair cells, and the paper overall is readable and interesting. The transcriptome data are rounded off by comparing the authors' results in adult hair cells with published neonatal mouse cochlear and vestibular transcriptomes.

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rüsch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

    2. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Data:

      a) The main weakness in the data is the lack of functional and anatomical data from mouse hair bundles. While the authors compensate in part for this difficulty with bullfrog crista bundles, those data are also fragmentary - one TEM and 2 exemplar videos. Much of the novelty of the EM depends on the different appearance of stretches of a single kinocilium - can we be sure of the absence of the central microtubule singlets at the ends?

      Our single-cell RNA-seq findings show that genes related to motile cilia are specifically expressed in vestibular hair cells. This has not been demonstrated before. We have also provided supporting evidence using electrophysiology and imaging from bullfrogs and mice. Although our study did not include ultrastructural images of mouse vestibular kinocilia, a transmission electron micrograph of mouse vestibular kinocilia has been published (O’Donnell and Zheng, 2022). The mouse vestibular kinocilia have a “9+2” microtubule configuration, with nine doublet microtubules surrounding two central singlet microtubules. This finding contrasts with a previous study, which demonstrated that the vestibular kinocilia from guinea pigs lack central singlet microtubules and inner dynein arms, whereas outer dynein arms and radial spokes are present (Kikuchi et al., 1989). The central pair of microtubules is absent at the end of the bullfrog saccular kinocilium (Fig. 7A). We would like to point out that the dual identity of primary and motile cilia is not based on the TEM images alone. The kinocilium has long been considered a specialized cilium, and its role as a primary cilium during development has been demonstrated before (Moon et al., 2020; Shi et al., 2022).

      In most motile cilia, the central pair complex (CPC) does not originate directly from the basal body; instead, it begins a short distance above the transition zone, a feature that already illustrates variation in CPC assembly across systems (Lechtreck et al., 2013). The CPC can also show variation in its spatial extent: for example, in mammalian sperm axonemes, it can terminate before reaching the distal end of the axoneme (Fawcett and Ito, 1965). In addition, CPC orientation differs across organisms: in metazoans and Trypanosoma, the CPC is fixed relative to the outer doublets, whereas in Chlamydomonas and ciliates it twists within the axoneme (Lechtreck et al., 2013). Such variation has been described in multiple motile cilia and flagella and is therefore not unique to vestibular kinocilia. What appears more unusual in our data is the organization at the distal tip, where a distinct distal head is present, similar to cilia tip morphologies recently described in human islet cells (Polino et al., 2023). Although this feature is intriguing, we interpret it primarily as a structural signature rather than as evidence for a specialized motile adaptation, and we will moderate our interpretation accordingly in the revision.

      b) While it was a good idea to compare ciliary motility expression in published P2 datasets for mouse cochlear and vestibular hair cells for comparison with the authors' adult hair cell data, the presentation is too superficial to assess (Figure 6C-E; text from line 336) - it is hard to see the basis for concluding that motility genes are specifically lower in P2 cochlear hair cells than vestibular hair cells. Visually, it is striking that CHCs have much darker bands for about 10 motility-related genes.

      We aimed to show that kinocilia in neonatal cochlear and vestibular hair cells are largely similar, except that neonatal cochlear hair cells lack key genes and proteins required for the motile apparatus. While some genes (e.g., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) appear more highly expressed in P2 cochlear hair cells, they are not uniquely associated with the axoneme. For example, Dynll1/2 and Dynlrb1 are components of the cytoplasmic dynein-1 complex (Pfister et al., 2006), Cetn2 has multiple basic cellular functions beyond cilia (e.g., centrosome organization, DNA repair), and Mdh1 encodes a cytosolic malate dehydrogenase involved in central metabolic pathways such as the citric acid cycle and malate–aspartate shuttle. This contrasts with axonemal dyneins, which are uniquely required for cilia motility. To avoid ambiguity, we will mark such cytoplasmic or multifunctional genes with stars in both Figure 5G and Figure 6D, together with a legend, in the revised manuscript.

      Although those genes (i.e., Dynll1, Dynll2, Dynlrb1, Cetn2, and Mdh1) are highly expressed in neonatal cochlear hair cells, key genes for the motile machinery are not detected. For example, Dnah6, Dnah5, and Wdr66 are not expressed in the P2 cochlear hair cells. Dnah6 and Dnah5 encode axonemal dyneins that are part of the inner and outer dynein arms, whereas Wdr66 is a component of radial spokes. Importantly, we did not detect the expression of CCDC39 and CCDC40 in kinocilia of P2 cochlear hair cells. Axonemal CCDC39 and CCDC40 are the molecular rulers that organize the axonemal structure in the 96-nm repeating interactome and are required for the assembly of IDAs and N-DRC for ciliary motility (Becker-Heck et al., 2011; Merveille et al., 2011; Oda et al., 2014). We will modify Figure 6D to highlight the key differences between P2 cochlear and vestibular hair cells in the revised manuscript. We will also revise the text so that these differences are clearly described.

      (2) Interpretation:

      The authors take the view that kinociliary motility is likely to be normally present but is rare in their observations because the conditions are not right. But while others have described some (rare) kinociliary motility in fish organs (Rusch & Thurm 1990), they interpreted its occurrence as a sign of pathology. Indeed, in this paper, it is not clear, or even discussed, how kinociliary motility would help with mechanosensitivity in mature hair bundles. Rather, the presence of an autonomous rhythm would actively interfere with generating temporally faithful representations of the head motions that drive vestibular hair cells.

      Spontaneous flagella-like rhythmic beating of kinocilia in vestibular HCs in frogs and eels (Flock et al., 1977; Rüsch and Thurm, 1990) and in zebrafish early otic vesicle (Stooke-Vaughan et al., 2012; Wu et al., 2011) has been reported previously. Based on Rüsch and Thurm (1990), spontaneous kinocilia motility occurred under non-physiological conditions and was interpreted as a sign of cellular deterioration rather than a normal feature. We speculate that deterioration under non-physiological conditions may lead to the disruption of lateral links between the kinocilium and the stereociliary bundle, effectively unloading the kinocilium and allowing it to move more freely. Additionally, fluctuations in intracellular ATP levels may contribute, as ciliary motility is highly ATP-dependent; when ATP is depleted, beating ceases. Similar phenomena have been documented in respiratory epithelia, where ciliary activity can temporarily pause. Nevertheless, the fact that kinocilia can exhibit spontaneous motility under these conditions indicates that they possess the motile machinery necessary for such beating. Irrespective of the condition, cilia without the molecular machinery required for motility will not be able to move.

      We agree with the reviewer that, based on the present data, it is difficult to know the functional role of kinocilia and whether the presence of such an autonomous rhythm would interfere with temporal fidelity. Spontaneous bundle motion, driven by the active process associated with mechanotransduction, was observed in bullfrog saccular hair cells (Benser et al., 1996; Martin et al., 2003). We will revise the discussion to clarify this important point raised by the reviewer. Specifically, we will emphasize that the ciliary beating we observed under ex vivo conditions may not reflect its properties in the mature in vivo context, and may instead be a byproduct of the motile machinery clearly present in the kinocilia. We speculate that this machinery in mature hair cells could operate in a more subtle mode, modulating the rigor state of dynein arms or related axonemal structures to influence kinociliary mechanics and, in turn, bundle stiffness in response to stimuli or signaling cues. Such a mechanism could either enhance sensitivity or introduce filtering properties, thereby contributing to the fine control of mechanosensory function without compromising temporal fidelity. Future studies using loss-of-function approaches will be needed to reveal the unexplored role(s) of kinocilia in vertebrate vestibular hair cells.

      Could kinociliary beating play other roles, possibly during development - for example, by interacting with forming accessory structures (but see Whitfield 2020) or by activating mechanosensitivity cell-autonomously, before mature stimulation mechanisms are in place? Then a latent capacity to beat in mature vestibular hair cells might be activated by stressful conditions, as speculated regarding persistent Piezo channels that are normally silent in mature cochlear hair cells but may reappear when TMC channel gating is broken (Beurg and Fettiplace 2017). While these are highly speculative thoughts, there is a need in the paper for more nuanced consideration of whether the observed motility is normal and what good it would do.

      We thank the reviewer for these excellent suggestions. We agree that kinociliary motility could plausibly serve roles during development, for example by guiding hair bundle formation or by contributing to early mechanosensitivity and spontaneous activity before mature stimulation mechanisms are established. It is also possible that the motility machinery represents a latent capacity in mature vestibular hair cells that could be reactivated under stress or pathological conditions. We will revise the Discussion to address these possibilities and to provide a more nuanced consideration of whether the observed motility is normal and what potential functions it might serve.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors compared the transcriptomes of the various types of hair cells contained in the sensory epithelia of the cochlea and vestibular organs of the mouse inner ear. The analysis of their transcriptomic data led to novel insights into the potential function of the kinocilium.

      Strengths:

      The novel findings for the kinocilium gene expression, along with the demonstration that some kinocilia demonstrate rhythmic beating as would be seen for known motile cilia, are fascinating. It is possible that perhaps the kinocilium, known to play a very important role in the orientation of the stereocilia, may have a gene expression pattern that is more like a primary cilium early in development and later in mature hair cells, more like a motile cilium. Since the kinocilium is retained in vestibular hair cells, it makes sense that it is playing a different role in these mature cells than its role in the cochlea.

      Another major strength of this study, which cannot be overstated, is that for the transcriptome analysis, they are using mature mice. To date, there is a lot of data from many labs for embryonic and neonatal hair cells, but very little transcriptomic data on the mature hair cells. They do a nice job in presenting the differences in marker gene expression between the 4 hair cell types. This information is very useful to those labs studying regeneration or generation of hair cells from ES cell cultures. One of the biggest questions these labs confront is what type of hair cells develop in these systems. The more markers available, the better. These data will also allow researchers in the field to compare developing hair cells with mature hair cells to see what genes are only required during development and not in later functioning hair cells.

      We would like to thank Reviewer #2 for their comments and hope that the datasets provided in this manuscript will be a useful resource for researchers in the auditory and vestibular neuroscience community.

      Joint Recommendations:

      We will make changes in the revision based on the joint recommendations of the two reviewers.

      References

      Becker-Heck, A., Zohn, I.E., Okabe, N., Pollock, A., Lenhart, K.B., Sullivan-Brown, J., McSheene, J., Loges, N.T., Olbrich, H., Haeffner, K., Fliegauf, M., Horvath, J., Reinhardt, R., Nielsen, K.G., Marthin, J.K., Baktai, G., Anderson, K.V., Geisler, R., Niswander, L., Omran, H., Burdine, R.D., 2011. The coiled-coil domain containing protein CCDC40 is essential for motile cilia function and left-right axis formation. Nat Genet 43, 79–84. https://doi.org/10.1038/ng.727

      Benser, M.E., Marquis, R.E., Hudspeth, A.J., 1996. Rapid, Active Hair Bundle Movements in Hair Cells from the Bullfrog’s Sacculus. J. Neurosci. 16, 5629–5643. https://doi.org/10.1523/JNEUROSCI.16-18-05629.1996

      Fawcett, D.W., Ito, S., 1965. The fine structure of bat spermatozoa. American Journal of Anatomy 116, 567–609. https://doi.org/10.1002/aja.1001160306

      Flock, Å., Flock, B., Murray, E., 1977. Studies on the Sensory Hairs of Receptor Cells in the Inner Ear. Acta Oto-Laryngologica 83, 85–91. https://doi.org/10.3109/00016487709128817

      Kikuchi, T., Takasaka, T., Tonosaki, A., Watanabe, H., 1989. Fine structure of guinea pig vestibular kinocilium. Acta Otolaryngol 108, 26–30. https://doi.org/10.3109/00016488909107388

      Lechtreck, K.-F., Gould, T.J., Witman, G.B., 2013. Flagellar central pair assembly in Chlamydomonas reinhardtii. Cilia 2, 15. https://doi.org/10.1186/2046-2530-2-15

      Martin, P., Bozovic, D., Choe, Y., Hudspeth, A.J., 2003. Spontaneous Oscillation by Hair Bundles of the Bullfrog’s Sacculus. J. Neurosci. 23, 4533–4548. https://doi.org/10.1523/JNEUROSCI.23-11-04533.2003

      Merveille, A.-C., Davis, E.E., Becker-Heck, A., Legendre, M., Amirav, I., Bataille, G., Belmont, J., Beydon, N., Billen, F., Clément, A., Clercx, C., Coste, A., Crosbie, R., de Blic, J., Deleuze, S., Duquesnoy, P., Escalier, D., Escudier, E., Fliegauf, M., Horvath, J., Hill, K., Jorissen, M., Just, J., Kispert, A., Lathrop, M., Loges, N.T., Marthin, J.K., Momozawa, Y., Montantin, G., Nielsen, K.G., Olbrich, H., Papon, J.-F., Rayet, I., Roger, G., Schmidts, M., Tenreiro, H., Towbin, J.A., Zelenika, D., Zentgraf, H., Georges, M., Lequarré, A.-S., Katsanis, N., Omran, H., Amselem, S., 2011. CCDC39 is required for assembly of inner dynein arms and the dynein regulatory complex and for normal ciliary motility in humans and dogs. Nat Genet 43, 72–78. https://doi.org/10.1038/ng.726

      Moon, K.-H., Ma, J.-H., Min, H., Koo, H., Kim, H., Ko, H.W., Bok, J., 2020. Dysregulation of sonic hedgehog signaling causes hearing loss in ciliopathy mouse models. eLife 9, e56551. https://doi.org/10.7554/eLife.56551

      Oda, T., Yanagisawa, H., Kamiya, R., Kikkawa, M., 2014. A molecular ruler determines the repeat length in eukaryotic cilia and flagella. Science 346, 857–860. https://doi.org/10.1126/science.1260214

      O’Donnell, J., Zheng, J., 2022. Vestibular Hair Cells Require CAMSAP3, a Microtubule Minus-End Regulator, for Formation of Normal Kinocilia. Front Cell Neurosci 16, 876805. https://doi.org/10.3389/fncel.2022.876805

      Pfister, K.K., Shah, P.R., Hummerich, H., Russ, A., Cotton, J., Annuar, A.A., King, S.M., Fisher, E.M.C., 2006. Genetic Analysis of the Cytoplasmic Dynein Subunit Families. PLOS Genetics 2, e1. https://doi.org/10.1371/journal.pgen.0020001

      Polino, A.J., Sviben, S., Melena, I., Piston, D.W., Hughes, J.W., 2023. Scanning electron microscopy of human islet cilia. Proceedings of the National Academy of Sciences 120, e2302624120. https://doi.org/10.1073/pnas.2302624120

      Rüsch, A., Thurm, U., 1990. Spontaneous and electrically induced movements of ampullary kinocilia and stereovilli. Hearing Research 48, 247–263. https://doi.org/10.1016/0378-5955(90)90065-W

      Shi, H., Wang, H., Zhang, C., Lu, Y., Yao, J., Chen, Z., Xing, G., Wei, Q., Cao, X., 2022. Mutations in OSBPL2 cause hearing loss associated with primary cilia defects via sonic hedgehog signaling. JCI Insight. https://doi.org/10.1172/jci.insight.149626

      Stooke-Vaughan, G.A., Huang, P., Hammond, K.L., Schier, A.F., Whitfield, T.T., 2012. The role of hair cells, cilia and ciliary motility in otolith formation in the zebrafish otic vesicle. Development 139, 1777–1787. https://doi.org/10.1242/dev.079947

      Wu, D., Freund, J.B., Fraser, S.E., Vermot, J., 2011. Mechanistic Basis of Otolith Formation during Teleost Inner Ear Development. Developmental Cell 20, 271–278. https://doi.org/10.1016/j.devcel.2010.12.00

    1. Liberian president Ellen Johnson Sirleaf is the first elected female head of state in Africa, serving from 2006 to 2018.

      A milestone for women in African politics, showing progress in gender representation in leadership.

    1. We compared 48 cases of Alphapapillomavirus detection in WGS data against the current gold standard test of mRNA PCR high-risk/tumorigenic subtypes of HPV. The performance using WGS data was excellent, with only one sample not matching the gold standard (n = 48; sensitivity = 100%, specificity = 97.3%; Fig. 3A). This sample had high HPV burden as detected by WGS and was likely a false-negative result for the PCR-based test.

      Somehow alphapapillomavirus leads to head and neck cancer
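
      The reported figures can be checked with a short calculation. The sketch below is only illustrative: the source gives n = 48, one mismatch, and the two percentages, but not the underlying counts. The split used here (11 PCR-positive and 37 PCR-negative samples, with the single mismatch counted as a WGS-positive/PCR-negative call) is an assumption, chosen because it reproduces the quoted values. The function name is hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    The gold standard (here, the PCR-based test) defines what counts
    as a true positive or true negative.
    """
    sensitivity = tp / (tp + fn)  # fraction of gold-standard positives detected
    specificity = tn / (tn + fp)  # fraction of gold-standard negatives called negative
    return sensitivity, specificity


# Assumed counts (not given in the source): 11 PCR-positive samples, all
# detected by WGS; 37 PCR-negative samples, one of which WGS called positive.
tp, fn, tn, fp = 11, 0, 36, 1

sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print(f"n = {tp + fn + tn + fp}")
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

      Under these assumed counts, the single mismatch lowers specificity rather than sensitivity, which fits the passage's reading that the discordant sample was a likely PCR false negative.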

    Annotators

    1. The nature of the game is to run through possible moves in the mind to see how they play out. From this, regular players develop a memory for the patterns the pieces make, the defences and attacks. ‘You recreate it in your mind,’ said Gareyev. ‘A lot of players are capable of doing what I’m doing.’ The real mental challenge comes from playing multiple games at once in the head. Not only must the positions of each piece on every board be memorised, they must be recalled faithfully when needed, updated with each player’s moves, and then reliably stored again, so the brain can move on to the next board. First moves can be tough to remember because they are fairly uninteresting. But the ends of games are taxing too, as exhaustion sets in. When Gareyev is tired, his recall can get patchy.

      player movements

    2. While his challengers will play the games as normal, Gareyev himself will be blindfolded. Even by world record standards, it sets a high bar for human performance. The 28-year-old already stands out in the rarefied world of blindfold chess. He has a fondness for bright clothes and unusual hairstyles, and he gets his kicks from the adventure sport of BASE jumping. He has already proved himself a strong chess player, too. In a 10-hour chess marathon in 2013, Gareyev played 33 games in his head simultaneously

      Gareyev, 28, has a fondness for bright clothes and unusual hairstyles and gets his kicks from BASE jumping. He is also a strong chess player: in a 10-hour chess marathon in 2013, he played 33 games simultaneously in his head.