  1. Jan 2025
    1. Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase the specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via deletion of lro1, dga1, or pah1 impacts DAG and stress signaling. Overall this is an interesting study that adds new data on how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction) or by Lro1 or Dga1 deletion, would broaden the scope of the study.

    2. Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      (2) Does Dip2 colocalize with Plc1 or Pkc1? Does Dip2 reach the plasma membrane upon Plc activation?

    3. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study dissects distinct pools of diacylglycerol (DAG), continuing a line of research on the central concept that there is a major lipid metabolism DAG pool in cells, but also a smaller signaling DAG pool. It tests the hypothesis that the second pool is regulated by Dip2, which influences Pkc1 signaling. The group shows that stressed yeast increase the specific DAG species C36:0 and C36:1, and proposes that this promotes Pkc1 activation via Pkc1 binding C36:0. The study also examines how perturbing the lipid metabolism DAG pool via deletion of lro1, dga1, or pah1 impacts DAG and stress signaling. Overall this is an interesting study that adds new data on how different DAG pools influence cellular signaling.

      Strengths:

      The study nicely combined lipidomic profiling with stress signaling biochemistry and yeast growth assays.

      We thank the reviewer for finding this study of interest and appreciating our multi-pronged approach to prove our hypothesis that a distinct pool of DAGs regulated by Dip2 activates PKC signalling.

      Weaknesses:

      One suggestion to improve the study is to examine the spatial organization of Dip2 within cells, and how this impacts its ability to modulate DAG pools. Dip2 has previously been proposed to function at mitochondria-vacuole contacts (Mondal 2022). Examining how Dip2 localization is impacted when different DAG pools are manipulated, such as by deletion of Pah1 (also suggested to work at yeast contact sites such as the nucleus-vacuole junction) or by Lro1 or Dga1 deletion, would broaden the scope of the study.

      We thank the reviewer for the valuable suggestions regarding the spatial organization of Dip2 in cells under the influence of different DAG pools. As suggested, we will probe the localization of Dip2 in the absence of Pah1. We will also trace the localization of Dip2 in LRO1 and DGA1 deletion strains, where bulk DAGs accumulate, and present the data in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors use yeast genetics, lipidomic and biochemical approaches to demonstrate that the DAG isoforms (36:0 and 36:1) can specifically activate PKC. Further, these DAG isoforms originate from PI and PI(4,5)P2. The authors propose that Psi1-Plc1-Dip2 functions to maintain a normal level of specific DAG species to modulate PKC signalling.

      Strengths:

      Data from yeast genetics are clear and strong. The concept is potentially interesting and novel.

      We would like to thank the reviewer for the positive comments on our work. We are happy to know that the reviewer finds the study novel and interesting.

      Weaknesses:

      More evidence is needed to support the central hypothesis. The authors may consider the following:

      (1) Figure 2: the authors should show/examine C36:1 DAG. Also, some structural evidence would be highly useful here. What is the structural basis for the assertion that the PKC C1 domain can only be activated by C36:0/1 DAG but not other DAGs? This is a critical conclusion of this work and clear evidence is needed.

      We agree with the reviewer that activation of PKC by C36:0 and C36:1 DAGs is a critical conclusion of our work. While we understand that there is no obvious structural explanation as to how the DAG-binding C1 domain of PKC attains acyl chain specificity for DAGs, our conclusion that yeast Pkc1 is selective for C36:0 and C36:1 DAGs is supported by a combination of robust in vitro and in vivo data:

      1. In Vitro Evidence: The liposome binding assays demonstrate that the Pkc1 C1 domain binds only the selective DAG and does not interact with bulk DAGs.

      2. In Vivo Evidence: Lipidomic analyses of wild-type cells subjected to cell wall stress reveal increased levels of C36:0 and C36:1 DAGs, while levels of bulk DAGs remain unaffected. This clearly parallels the Dip2 knockout scenario in which the levels of the same set of DAGs go up and Pkc1 gets hyperactivated.

      These findings collectively indicate that Pkc1 neither binds nor is activated by bulk DAGs, reinforcing its specificity for C36:0 and C36:1 DAGs. This is further corroborated by the DGA1 and LRO1 knockouts, in which the increase in bulk DAGs does not result in a significant increase in Pkc1 signalling.

      Moreover, elucidating the structural basis of this selectivity would require a specific DAG-bound C1 domain structure of Pkc1, which is difficult to obtain owing to the flexibility of the longer acyl chains present in C36:0 and C36:1 DAGs. Furthermore, capturing the full-length Pkc1 structure, which might provide deeper insights, has been challenging for several other groups for a long time. Additionally, we believe that DAG selectivity by Pkc1 is more of a membrane-associated phenomenon, wherein these DAGs might create a specific microdomain or a particular curvature that is required for Pkc1 to bind DAG and subsequently be activated. Investigating this would require extensive structural and biophysical studies, which are beyond the scope of the current work but are planned for future research.

      (2) Does Dip2 colocalize with Plc1 or Pkc1? Does Dip2 reach the plasma membrane upon Plc activation?

      Thank you for your questions regarding the colocalization and potential translocation of Dip2 upon Plc1 or Pkc1 activation.

      In the wild-type scenario, Dip2 does not colocalize with Pkc1. Dip2 predominantly localizes to the mitochondria and mitochondria-vacuole contact sites, while Pkc1 is found in the cytosol, plasma membrane and bud site. Moreover, the localization of Plc1 has not yet been studied in yeast and therefore we currently lack data on the colocalisation of Dip2 and Plc1.

      However, to investigate whether Dip2 translocates to the plasma membrane under conditions requiring Plc1 or Pkc1 activation, we plan to probe the localization of Dip2 under cell wall stress conditions. This would provide a better understanding of the spatial crosstalk between Dip2 and Pkc1. We will include the results in the revised manuscript.

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript by Alonso-Caraballo et al. is a novel piece of work that examines the impact of oxycodone self-administration on neural plasticity within the paraventricular thalamic (PVT) to nucleus accumbens shell (NAcSh) pathway - two regions each shown to play a key role in cue-induced drug seeking - and whether this plasticity varies based on abstinence period and biological sex.

      Strengths:

      The authors show that a clinically relevant long-access model of opioid self-administration promotes dependence and acute withdrawal in both male and female rats. During subsequent cue-induced relapse tests at 1 or 14 days following the conclusion of self-administration, data show that while both males and females demonstrate drug-seeking behavior at both time points, females show a further elevation in responding on day 14 versus day 1 which is not observed in the males. When accounting for past work showing elevations in drug-seeking in males after 30 days, these data indicate that craving-induced relapse for opioids may develop faster and may be more pronounced in females compared to males.

      These behavioral findings were paralleled by the use of ex vivo acute slice electrophysiology and circuit-specific ex vivo optogenetics to examine the impact of oxycodone self-administration on synaptic strength within the paraventricular thalamus (PVT) to nucleus accumbens shell (NAcSh) pathway(s). Data support a time-dependent but sex-independent strengthening of glutamatergic signaling at PVT-to-NAcSh medium spiny neurons (MSNs) that is only present following a relapse test at 14 days post abstinence, in both males and females, providing the first evidence that opioid self-administration and/or cue-induced drug-seeking augments this pathway. Using an extensive set of physiological measures, the authors show that this increased synaptic strength reflects an upregulation of presynaptic release probability. Further, this upregulation of excitatory signaling aligned temporally with an increase in MSN excitability, as assessed by increases in action potential firing frequency. Finally, the authors provide the first evidence that, similar to other inputs to the NAcSh, PVT projections innervate both MSNs and local interneurons, promoting a GABA-A-specific feedforward inhibitory circuit. Interestingly, unlike direct excitatory inputs to MSNs, no changes were observed ostensibly within this feedforward circuit, highlighting a selective enhancement of excitatory drive and output of MSNs with protracted abstinence.

      Overall, these data highlight a potential role for heightened synaptic strength within the PVT-NAcSh pathway in cue-induced relapse behavior during protracted abstinence and identify a potential therapeutic target during abstinence to reduce relapse risk in abstaining individuals.

      Weaknesses:

      Overall, the experimental approach and data provided appear rigorous, support the authors' overall conclusions, and achieve their goal of understanding how opioid self-administration impacts synaptic strength within the PVT-NAcSh pathway. Although not undermining these data, there are a few potential weaknesses that reduce the impact of the work. For example, the inability to directly assess whether cue-induced drug-seeking is in fact augmented compared to daily intake during self-administration in the maintenance phase only permits the authors to denote that re-exposure to cues and the context is sufficient to promote active lever pressing, without demonstrating whether seeking behavior is in fact elevated further during a cue test. This is notably understandable, as drug-available sessions were 6 hours versus a 1-hour relapse test. Importantly, it is clearly demonstrated that drug seeking is higher on average in female rats after 14 days versus 1 day.

      With regard to the interpretation of electrophysiology findings, the lack of an abstinence-only group does not permit one to parse out whether observed increases in synaptic strength (or the lack thereof) reflect abstinence or an interaction between abstinence period and re-exposure to the operant chamber, as slices were taken 30-45 min post relapse test. While much literature has shown that drug-induced adaptations in the NAc require a post-drug period for plasticity to measurably emerge, studies have also shown that re-exposure to heroin-associated cues following abstinence seemingly "reverses" increases in cell excitability in prelimbic-NAc pyramidal neurons (Kokane et al., 2023) and that morphine-induced increases in synaptic strength in the NAc shell can be depotentiated by drug re-exposure - an effect also observed with cocaine re-exposure (Madayag et al., 2019). Notably, the observation of an effect at 14 days but not 1 day supports the likelihood that the relapse test does not in fact influence the plasticity within the PVT-NAcSh circuit.

      While the lack of effect on AMPAR:NMDAR ratio and rectification indices does support the notion that enhanced EPSC amplitudes in input-output curves reflect neither a change in AMPAR subunit expression (i.e., increased GluA2-lacking receptors that exhibit inward rectification at depolarized potentials) nor a change in postsynaptic sensitivity to glutamate, without direct assessment of AMPAR-specific and NMDAR-specific input-output curves, it doesn't definitively exclude the possibility that both AMPA and NMDA receptor currents are being upregulated, thus negating an observable change in postsynaptic strength.

      Overall, these findings provide novel insight into how the PVT-NAcSh pathway is altered by opioid self-administration and whether this differs based on abstinence period and sex. Importantly, these were the primary objectives stated by the authors. Data highlight a potential role for the observed adaptations in relapse behavior and identify a potential therapeutic target during abstinence to reduce relapse risk in abstaining individuals. However, it should be noted that no causal link is demonstrated without experiments to reduce/prevent relapse.

    2. Reviewer #2 (Public review):

      This is an interesting paper from Alonso-Caraballo and colleagues that examines the influence of opioid use, abstinence, and sex on paraventricular thalamus (PVT) to nucleus accumbens shell (NAcSh) medium spiny neurons circuit physiology. The authors first find that prolonged abstinence from extended access to oxycodone self-administration leads to profoundly increased cue-induced reinstatement in females. Next, they found that prolonged abstinence increased PVT-NAcSh MSN synaptic strength, an effect that was likely due to presynaptic adaptation (paired-pulse ratio was decreased in both sexes).

      While this paper is certainly interesting, and well-written, and the experiments seem to be well performed, the behavioral and physiological effects observed are somewhat divorced. Specifically, what accounts for the heightened relapse in females? Since no opioid-related sex differences were observed in PVT-NAcSh neurophysiology, it is unclear how the behavioral and neurophysiological data fit together. Furthermore, the lack of functional manipulation of PVT-NAcSh circuitry leaves one to wonder if this circuit is even important for the behavior that the authors are measuring. I would be more positive about this study if the authors were able to resolve either of the two issues noted above.

      I also noted more moderate weaknesses that the authors should consider:

      (1) There are insufficient animals in some cases. For example, in Figure 4, the Male Saline 14-day abstinence group (n = 3 rats) has less than half the excitability of the Male Saline 1-day abstinence group (n = 7 rats). This is likely due to variance between animals and, possibly, oversampling. Thus, more rats need to be added to the 14-day abstinence group. Additionally, the range of n neurons/rat should be reported for each experiment to assure readers that oversampling from single animals is not occurring.

      (2) The IPSC data, for example in Figure 4, is one of the more novel experiments in the manuscript. However, it is quite challenging to see the difference between males and females, saline and oxycodone, at low stimulation intensities within the graph. Authors should expand this so that reviewers/readers can see those data, especially considering other work suggesting that PVT synaptic input onto select NAc interneurons is disrupted following opioid self-administration. Additional comment: It's also interesting that the IPSC amplitude seems to be maximal at ~2mW of light, whereas ~11 mW is required to evoke maximal EPSC amplitude. It would be interesting to know the authors' thoughts on why this may be.

      (3) There is an inadequate description of what has been done to date on the PVT-NAc projection regarding opioid withdrawal, seeking, disinhibition, and the effects on synaptic physiology therein. For example, a critical paper, Keyes et al., 2020 Neuron, is not cited. Additionally, Paniccia et al., 2024 Neuron is inaccurately cited and insufficiently described. Both manuscripts should be described in some detail within the introduction, and the findings should be accurately contextualized within the broader circuit within the discussion.

      (4) Related to the above, the authors should provide a more comprehensive description of how PVT synapses onto cell-type specific neurons in the NAc which expands beyond MSNs, especially considering that PVT has been shown to influence drug/opioid seeking through the innervation of NAc neurons that are not MSNs. For example, see PMIDs 33947849, 36369508, 28973852, 38141605.

    3. Reviewer #3 (Public review):

      Summary:

      In this paper, Alonso-Caraballo et al. investigate sex-specific differences in oxycodone self-administration, withdrawal, and relapse behaviors in rats, as well as associated synaptic plasticity in the paraventricular thalamus to nucleus accumbens shell (PVT-NAcSh) circuit. The authors employ a combination of behavioral paradigms and ex vivo electrophysiology to examine how acute (1-day) and prolonged (14-day) abstinence from oxycodone self-administration affect cue-induced drug-seeking and synaptic transmission in male and female rats. Their findings reveal that while both sexes show similar oxycodone self-administration and acute withdrawal symptoms, females exhibit enhanced cue-induced relapse after prolonged abstinence. Furthermore, they show that prolonged abstinence is associated with increased synaptic strength in the PVT-NAcSh circuit (reduced paired-pulse ratio) and enhanced intrinsic excitability of NAcSh medium spiny neurons in both sexes. This study provides important insights into the sex-specific neural adaptations that may underlie vulnerability to opioid relapse and highlights the PVT-NAcSh circuit as a potential target for therapeutic interventions. However, although this study is well designed, no sex differences were observed in the synaptic activity within this pathway that could explain increased oxycodone seeking in females versus male rats. Additional experiments could strengthen the results and help clarify synaptic mechanisms underpinning behavioral sex differences.

      Strengths:

      The study exhibits several strengths. It provides a comprehensive behavioral analysis of oxycodone self-administration, withdrawal, and cue-induced relapse in both male and female rats at different time points (acute vs. protracted withdrawal) offering valuable insights into sex-specific differences (i.e., increased oxycodone seeking in females over time but not males). The authors examine synaptic plasticity in the PVT-NAcSh circuit at different abstinence time points, integrating behavioral and electrophysiological data to link circuit adaptations with relapse behaviors, although no sex differences in the electrophysiological parameters examined were evident. The investigation of intrinsic excitability changes in NAcSh medium spiny neurons further enhances the study's depth. Overall, the well-designed experiments provide important insights into the neural adaptations that may underlie vulnerability to opioid relapse, highlighting the PVT-NAcSh circuit as a potential target for therapeutic interventions in opioid use disorder.

      Weaknesses:

      Despite its strengths, the study has several notable limitations. A key weakness is the lack of observed sex differences in synaptic activity within the PVT-NAcSh pathway that could explain the behavioral results. The authors' failure to differentiate between D1 and D2 medium spiny neurons (MSNs) in the nucleus accumbens represents a missed opportunity to identify potential sex-specific differences at the cellular level, although they do discuss reasons for this omission. The only significant synaptic change observed - reduced paired-pulse ratio indicating increased synaptic strength - occurs in both males and females, failing to explain the sex-specific behavioral differences. Furthermore, the investigation of intrinsic excitability in NAc MSNs adds complexity to data interpretation, as the authors neither differentiate between D1 and D2 MSNs nor confirm that recorded neurons receive direct inputs from the PVT. This assumption potentially confounds the results. Overall, while the study provides valuable insights, additional experiments targeting specific cell populations and more detailed synaptic analyses are needed to elucidate the mechanisms underlying the observed behavioral sex differences in opioid relapse vulnerability.

    1. eLife Assessment

      This manuscript describes valuable findings regarding the expression pattern of orexin receptors in the midbrain and how manipulating this system influences several behaviors, such as context-induced locomotor activity and exploration. The overall strength of evidence - which includes anatomical, viral manipulation studies, and brain imaging - is solid and broadly substantiates claims in the paper. However, there are several areas in which the conclusions are only partially supported by the combination of methods used. These results have implications for understanding the neural underpinnings of reward and will be of interest to neuroscientists and cognitive scientists with an interest in the neurobiology of reward.

    2. Reviewer #1 (Public review):

      In this manuscript, the role of orexin receptors in dopamine transmission is studied. It extends previous findings suggesting an interplay between these two systems in regulating behaviour by first characterizing the expression of orexin receptors in the midbrain and then disrupting orexin transmission in dopaminergic neurons by deleting its predominant receptor, OX1R (Ox1R fl/fl, Dat-Cre tg/wt mice). Electrophysiological and calcium imaging data suggest that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons but does not seem to induce c-Fos expression. Behavioral effects of depleting OX1R from dopaminergic neurons include enhanced novelty-induced locomotion and exploration, relative to littermate controls (Ox1R fl/fl, Dat-Cre wt/wt). However, no difference between groups is observed in tests that measure reward processing, anxiety, and energy homeostasis. To test whether the depletion of OX1R alters overall orexin-triggered activation across the brain, PET imaging is used in OX1R∆DAT knockout and control mice. This analysis reveals that several regions show higher neuronal activation after orexin injection in OX1R∆DAT mice, but the authors focus their follow-up study on the dorsal bed nucleus of the stria terminalis (BNST) and lateral paragigantocellular nucleus (LPGi). Dopaminergic inputs and expression of dopamine receptors type-1 and -2 (DRD1 & DRD2) are assessed and compared to control demonstrating a moderate decrease in DRD1 and DRD2 expression in the BNST of OX1R∆DAT mice and unaltered expression of DRD2, with absence of DRD1 expression in LPGi of both groups. Overall, this study is valuable for the information it provides on orexin receptor expression and function in behaviour, as well as for the new tools it generated for the specific study of this receptor in dopaminergic circuits.

      Strengths:

      The use of a transgenic line that lacks OX1R in dopamine-transporter expressing neurons is a strong approach to dissect the direct role of orexin in modulating dopamine signaling in the brain. The battery of behavioral assays used to study this line provides valuable information for researchers interested in the interplay between dopamine and orexin systems and their role in animal physiology.

      Weaknesses:

      This study falls short in providing evidence for an anatomical substrate and mechanism underlying the altered behavior observed in mice lacking orexin receptor subtype 1 in dopaminergic neurons. How orexin transmission in dopaminergic neurons regulates the expression of postsynaptic dopamine receptors (as observed in the BNST of OX1R∆DAT mice) is an intriguing question not addressed in this study. An important aspect not investigated in this study is whether the disruption of orexin activity affects dopamine release in target areas.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript examines expression of orexin receptors in midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that substantia nigra dopamine neurons predominantly express the orexin receptor 1 subtype, and the authors then delete this receptor in dopamine transporter-expressing neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that, in the absence of this receptor, orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin.

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus in on changes in dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. They also show changes in receptor expression in the transgenic mice. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths:

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) Distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) The genetic model that knocks out a specific orexin receptor subtype from dopamine-transporter-expressing neurons is useful and helps to narrow down the behavioral significance of this interaction.

      (3) The PET studies showing how central administration of orexin evokes dopamine release across the brain are intriguing, especially since two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood.

      Weaknesses:

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies does not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) is responsible for, or at least correlated with, this behavioral readout. In addition, the authors' working model for how they think orexin-dopamine interactions contribute to behavior under normal physiological conditions is not well-described.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      In this manuscript, the role of orexin receptors in dopamine transmission is studied. It extends previous findings suggesting an interplay of these two systems in regulating behaviour by first characterising the expression of orexin receptors in the midbrain and then disrupting orexin transmission in dopaminergic neurons by deleting its predominant receptor, OX1R (Ox1R fl/fl, Dat-Cre tg/wt mice). Electrophysiological and calcium imaging data suggest that orexin A acutely and directly stimulates SN and VTA dopaminergic neurons, but does not seem to induce c-Fos expression. Behavioural effects of depleting OX1R from dopaminergic neurons include enhanced novelty-induced locomotion and exploration, relative to littermate controls (Ox1R fl/fl, Dat-Cre wt/wt). However, no difference between groups is observed in tests that measure reward processing, anxiety, and energy homeostasis. To test whether depletion of OX1R alters overall orexin-triggered activation across the brain, PET imaging is used in OX1R∆DAT knockout and control mice. This analysis reveals that several regions show a higher neuronal activation after orexin injection in OX1R∆DAT mice, but the authors focus their follow-up study on the dorsal bed nucleus of the stria terminalis (BNST) and lateral paragigantocellular nucleus (LPGi). Dopaminergic inputs and expression of dopamine receptors type-1 and -2 (DRD1 & DRD2) are assessed and compared to control, demonstrating a moderate decrease of DRD1 and DRD2 expression in BNST of OX1R∆DAT mice and unaltered expression of DRD2, with absence of DRD1 expression in LPGi of both groups. Overall, this study is valuable for the information it provides on orexin receptor expression and function on behaviour and for the new tools it generated for the specific study of this receptor in dopaminergic circuits.

      Strengths: 

      The use of a transgenic line that lacks OX1R in dopamine-transporter expressing neurons is a strong approach to dissect the direct role of orexin in modulating dopamine signalling in the brain. The battery of behavioural assays to study this line provides a valuable source of information for researchers interested in the role of orexin in animal physiology. 

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses: 

      This study falls short in providing evidence for an anatomical substrate of the altered behaviour observed in mice lacking orexin receptor subtype 1 in dopaminergic neurons. How orexin transmission in dopaminergic neurons regulates the expression of postsynaptic dopamine receptors (as observed in BNST of OX1R∆DAT mice) is an intriguing question poorly discussed. Whether disruption of orexin activity alters dopamine release in target areas is an important point not addressed.

      We identified dopaminergic fibers and dopamine receptors in the dBNST and LPGi, suggesting an anatomical basis for dopamine neurons to regulate neural activity and receptor expression levels in these areas. PET imaging and c-Fos staining revealed that Ox1R signaling in dopaminergic cells regulates neuronal activity in dBNST and LPGi. The expression levels of Th were unchanged in both regions. Dopamine receptor 2 (DRD2), but not DRD1, is expressed in LPGi. The deletion of Ox1R in DAT-expressing cells did not affect DRD2 expression in LPGi. The expression levels of DRD1 and DRD2 were decreased or showed a tendency to decrease in dBNST.

      We included the comments in the discussion in this revised manuscript (lines 308-312): ‘The expression levels of Th were not altered in dBNST or LPGi by Ox1R deletion in dopaminergic neurons. It remains unclear whether dopamine release is affected in these regions. It is possible that either the dopaminergic regulation of neuronal activity or the changes in dopamine release could lead to the decreased expression of dopamine receptors in dBNST.’

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript examines expression of orexin receptors in midbrain - with a focus on dopamine neurons - and uses several fairly sophisticated manipulation techniques to explore the role of this peptide neurotransmitter in reward-related behaviors. Specifically, in situ hybridization is used to show that dopamine neurons predominantly express the orexin receptor 1 subtype, and the authors then delete this receptor in dopamine transporter-expressing neurons using a transgenic strategy. Ex vivo calcium imaging of midbrain neurons is used to show that, in the absence of this receptor, orexin is no longer able to excite dopamine neurons of the substantia nigra.

      The authors proceed to use this same model to study the effect of orexin receptor 1 deletion on a series of behavioral tests, namely, novelty-induced locomotion and exploration, anxiety-related behavior, preference for sweet solutions, cocaine-induced conditioned place preference, and energy metabolism. Of these, the most consistent effects are seen in the tests of novelty-induced locomotion and exploration in which the mice with orexin 1 receptor deletion are observed to show greater levels of exploration, relative to wild-type, when placed in a novel environment, an effect that is augmented after icv administration of orexin. 

      In the final part of the paper, the authors use PET imaging to compare brain-wide activity patterns in the mutant mice compared to wildtype. They find differences in several areas both under control conditions (i.e., after injection of saline) as well as after injection of orexin. They focus in on changes in dorsal bed nucleus of stria terminalis (dBNST) and the lateral paragigantocellular nucleus (LPGi) and perform analysis of the dopaminergic projections to these areas. They provide anatomical evidence that these regions are innervated by dopamine fibers from midbrain, are activated by orexin in control, but not mutant mice, and that dopamine receptors are present. Thus, they argue these anatomical data support the hypothesis that behavioral effects of orexin receptor 1 deletion in dopamine neurons are due to changes in dopamine signaling in these areas.

      Strengths: 

      Understanding how orexin interacts with the dopamine system is an important question and this paper contains several novel findings along these lines. Specifically:

      (1) Distribution of orexin receptor subtypes in VTA and SN is explored thoroughly.

      (2) Use of the genetic model that knocks out a specific orexin receptor subtype from dopamine-transporter-expressing neurons is a useful model and helps to narrow down the behavioral significance of this interaction.

      (3) PET studies showing how central administration of orexin evokes dopamine release across the brain is intriguing, especially that two key areas are pursued - BNST and LPGi - where the dopamine projection is not as well described/understood. 

      We thank the reviewer for summarizing the importance and significance of our study. 

      Weaknesses: 

      The role of the orexin-dopamine interaction is not explored in enough detail. The manuscript presents several related findings, but the combination of anatomy and manipulation studies do not quite tell a cogent story. Ideally, one would like to see the authors focus on a specific behavioral parameter and show that one of their final target areas (dBNST or LPGi) was responsible or at least correlated with this behavioral readout. 

      We agree that exploring the orexin-dopamine interactions in more detail and focusing on the behavioral impact of their final target areas (e.g., dBNST or LPGi), would provide valuable data. While we are very interested in pursuing these studies, the aim of the present manuscript is to provide an overview of the behavioral roles of orexin-dopamine interaction and to propose some promising downstream pathways in a relatively broad and systematic manner. 

      In many places in the Results, insufficient explanation and statistical reporting are provided. Throughout the Results - especially in the section on behavior although not restricted to this part - statements are made without statistical tests presented to back up the claims, e.g., "Compared to controls, Ox1RΔDAT mice did not show significant changes in spontaneous locomotor activity in home cages" (L143) and "In a hole-board test, female Ox1RΔDAT mice showed increased nose pokes into the holes in early (1st and 2nd) sessions compared to control mice" (L151). In other places, ANOVAs are mentioned but full results including main effects and interactions are not described in detail, e.g., in F3-S3, only a single p-value is presented and it is difficult to know if this is the interaction term or a post hoc test (L205). These and all other statements need statistics included in the text as support. Addition of these statistical details was also requested by the editor.

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as main effects and interactions, are presented alongside the source data in the respective spreadsheets. We thank the reviewer for pointing out our lack of clarity in the manuscript. In this revised manuscript, we included the statistical details of ANOVAs mentioned above in the figure legends. In the figure legends, we also explained that the full statistics were provided alongside the source data in the supplementary materials.

      In the presentation of reward processing this is particularly important as no statistical tests are shown to demonstrate that controls show a cocaine-induced preference or a sucrose preference. Here, one option would be to perform one-sample t-tests showing that the data were different to zero (no preference). As it is, the claim that "Both of the control and Ox1RΔDAT groups showed a preference for cocaine injection" is not yet statistically supported. 

      We thank the reviewer for the suggestions. We have added the one-sample t-test results in this revised manuscript (Figure 2–figure supplement 4, lines 171 - 183). 
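
      The one-sample t-test suggested by the reviewer can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function name `one_sample_t` and the preference scores are hypothetical, and the null value of zero follows the reviewer's "no preference" framing. The sketch returns only the test statistic and degrees of freedom; the p-value would be obtained from a t distribution in a statistics package.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical preference scores for one group; 0 would mean no preference.
preference_scores = [120.0, 85.0, 40.0, 150.0, 95.0, 60.0]
t_stat, df = one_sample_t(preference_scores)
print(f"t({df}) = {t_stat:.2f}")
```

      Running the test separately for each group (control and knockout) would address whether each group shows a preference at all, which is a different question from whether the groups differ from each other.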

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors): 

      Can the authors comment on overlap between DAT and Ox1R in brain areas outside VTA/SN? Is there any? 

      We only focused on the expression patterns of orexin receptors in VTA/SN, and we did not examine other brain regions. Additionally, little is known from the literature about the expression of Ox1R in DAT-expressing cells in brain areas outside VTA/SN. Further analysis is necessary to answer this question. We have added the comment in our discussion (lines 243 - 344).

      For the Ca2+ imaging experiment, it is unclear to me why the authors do not show all the neurons (almost 160 in total) and just select 5 neurons to show for each condition. 

      Heat maps of all recorded neurons are now shown in Figure 1—figure supplement 4.

      There are other claims that still require a statistical justification to be included in addition to the passages on behavior mentioned above, e.g., "Increasing the orexin A concentration to 300 nM further increased [Ca2+]i" (L118). 

      Authors should ensure that all such claims are either presented with a statistical test or are phrased differently, e.g. "Visual inspection of data suggested that there was a further increase...". In addition, when an ANOVA is conducted, full results including main effects and interactions should be described. 

      We now emphasize that 100 nM orexin A already significantly increased [Ca2+]i levels (lines 117 - 118).

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as main effects and interactions, are presented alongside the source data in the respective spreadsheets. For clarity, we chose to include only the key statistical information in the main text and figures. We thank the reviewer for pointing this out. In this revised manuscript, we have emphasized in each figure legend: ‘Source data and full statistics are provided in the supplementary materials’.

      Typos in figure captions  

      F2-S1 - spontanous 

      F3-S2 - intrest 

      We apologize for the typos. We have corrected them in this revised manuscript.

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05. 

      We submitted all our source data as Excel spreadsheets to eLife during our first-round revision, and the full statistics, such as test statistics, df and 95% confidence intervals, are presented alongside the source data in the respective spreadsheets. We thank the editor for this note. In this revised manuscript, we have included more statistical information in the main text and figure legends (see our response to reviewer #2). In the figure legends, we also explained that the full statistics were provided alongside the source data in the supplementary materials. In addition, we uploaded the source data and full statistics to bioRxiv before uploading this revised manuscript to eLife.

    1. eLife Assessment

      This valuable study suggests that the dosage compensation complex and m6A act in a feedback loop in Drosophila melanogaster. The study provides integrated analyses of RNA sequencing and mapping data of the m6A RNA modification in the context of unbalanced genomes, which suggests that m6A modification status may influence H4K16Ac deposition through regulation of the acetyltransferase MOF. However, it is not clear whether this regulation is directly or indirectly related to m6A regulation. The evidence is considered incomplete due to technical concerns, as quantitative assessments were made using non-quantitative methods.

    2. Reviewer #1 (Public review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of the Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used particularly for analysis of m6A at both the bulk and transcript-specific level are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also m6A studies more broadly). For instance based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. In my initial review I pointed out that MeRIP experiments are not quantitative and can be difficult to interpret when small changes are present. The data as presented still show only RPKM in IP samples, and the text alludes to changes in IP enrichment that are significant but the data do not appear to have been included in the figure. Concerns about the bulk-level m6A measurements also remain, as the new data showing m6A levels in mRNA show changes that are even smaller than those initially demonstrated in total RNA. Yet the data are still presented as significant, biologically relevant changes. The conclusions about mRNA m6A levels are not strengthened by measurements.

    3. Reviewer #2 (Public review):

      Summary:

      The authors have tested effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop. Western blots confirm that Msl2 and MOF alleles alter levels of Mettl3 complex components, but the underlying mechanism remains undefined.

      Strengths:

      • Thorough bioinformatic analysis of their data

      • Incorporation of other published datasets that enhances scope and rigor

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males

      • Provides preliminary evidence that this counting mechanism may be due to DCC effects on expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A levels on specific sites remains unclear in this revision.

      • The paper relies on m6A comparisons across tissues and developmental stages, which introduces some uncertainty about where and when the DCC-m6A loop acts.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study sought to reveal the potential roles of m6A RNA methylation in gene dosage regulatory mechanisms, particularly in the context of aneuploid genomes in Drosophila. Specifically, this work looked at the relationships between the expression of m6A regulatory factors, RNA methylation status, classical and inverse dosage effects, and dosage compensation. Using RNA sequencing and m6A mapping experiments, an in-depth analysis was performed to reveal changes in m6A status and expression changes across multiple aneuploid Drosophila models. The authors propose that m6A methylation regulates MOF and, in turn, deposition of H4K16Ac, critical regulators of gene dosage in the context of genomic imbalance.

      Strengths:

      This study seeks to address an interesting question with respect to gene dosage regulation and the possible roles of m6A in that process. Previous work has linked m6A to X-inactivation in humans through the Xist lncRNA, and to the regulation of the Sxl in flies. This study seeks to broaden that understanding beyond these specific contexts to more broadly understand how m6A impacts imbalanced genomes in other contexts.

      Weaknesses:

      The methods being used particularly for analysis of m6A at both the bulk and transcript-specific level are not sufficiently specific or quantitative to be able to confidently draw the conclusions the authors seek to make. MeRIP m6A mapping experiments can be very valuable, but differential methylation is difficult to assess when changes are small (as they often are, in this study but also m6A studies more broadly). For instance, based on the data presented and the methods described, it is not clear that the statement that "expression levels at m6A sites in aneuploidies are significantly higher than that in wildtype" is supported. MeRIP experiments are not quantitative, and since there are far fewer peaks in aneuploidies, it stands to reason that more antibody binding sites may be available to enrich those fewer peaks to a larger extent. But based on the data as presented (figure 2D) this conclusion was drawn from RPKM in IP samples, which may not fully account for changing transcript abundances in absolute (expression level changes) and relative (proportion of transcripts in input RNA sample) terms.

      Methylated RNA immunoprecipitation followed by sequencing (MeRIP-seq) is a commonly used strategy for genome-wide mapping of m6A modification. This method uses an anti-m6A antibody to immunoprecipitate RNA fragments, which results in selective enrichment of methylated RNA. The RNA fragments are then subjected to deep sequencing, and the regions enriched in the immunoprecipitate relative to input samples are identified as m6A peaks using a peak calling algorithm. We identified m6A peaks in different samples with the exomePeak2 program and determined common m6A peaks for each genotype based on the intersection of biological replicates. Figure 2D shows the RPM values of m6A peaks in MeRIP samples for each genotype, indicating that the levels of reads in the m6A peak regions were significantly higher in the aneuploid IP samples than in wildtypes. When the enrichment of IP samples relative to Input samples (RPM.IP/RPM.Input) was taken into account, the statistics for all three aneuploidies were still significantly higher than those of the wildtypes (Mann-Whitney U test p-values < 0.001). This analysis is not about changes in the abundance of transcripts; rather, from the MeRIP perspective, it shows that relatively more m6A-modified reads map to the m6A peaks in aneuploidies than in wildtypes. We hope to provide a possible explanation for the phenomenon that the quantitative changes of m6A peaks are not consistent with the overall m6A abundance trend. We have added the results of IP/Input in the main text, and revised the description in the manuscript to make it more precise and reduce possible misunderstandings.
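
      The IP/Input comparison described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' pipeline: the function names `enrichment` and `mann_whitney_u` and all RPM values are hypothetical. It computes the per-peak RPM.IP/RPM.Input ratio and the Mann-Whitney U statistic by direct pair counting; the p-value would be obtained from a statistics package.

```python
def enrichment(rpm_ip, rpm_input):
    """Per-peak enrichment ratio RPM.IP / RPM.Input."""
    if len(rpm_ip) != len(rpm_input):
        raise ValueError("IP and Input must list the same peaks")
    if any(value <= 0 for value in rpm_input):
        raise ValueError("Input RPM must be positive for every peak")
    return [ip / inp for ip, inp in zip(rpm_ip, rpm_input)]


def mann_whitney_u(x, y):
    """U statistic for sample x: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical RPM values at m6A peaks (not taken from the study).
aneuploid_ratio = enrichment([8.0, 12.0, 9.0], [2.0, 3.0, 3.0])
wildtype_ratio = enrichment([4.0, 6.0, 5.0], [2.0, 3.0, 2.0])
u_stat = mann_whitney_u(aneuploid_ratio, wildtype_ratio)
print(aneuploid_ratio, wildtype_ratio, u_stat)
```

      Normalizing to Input in this way addresses differences in transcript abundance between samples, but it does not by itself make MeRIP enrichment an absolute measure of methylation stoichiometry.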

      The bulk-level m6A measurements as performed here also cannot effectively support these conclusions, as they are measured in total RNA. The focus of the work is mRNA m6A regulators, but m6A levels measured from total RNA samples will not reflect mRNA m6A levels as there are other abundant RNAs that contain m6A (including rRNA). As a result, conclusions about mRNA m6A levels from these measurements are not supported.

      According to published articles, m6A levels of mRNA or total RNA can be detected by different methods (such as mass spectrometry, 2D thin-layer chromatography, etc.) in Drosophila cells or tissues [1-3]. We used the EpiQuik m6A RNA Methylation Quantification Kit, which is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses. This kit has previously been used by researchers to detect the m6A/A ratio in total RNA [4, 5] or purified mRNA [6] from different species. Our pre-experiments showed that the enrichment of mRNA from total RNA did not appear to significantly affect the results of the detection of m6A levels.

      We extracted and purified mRNA from the heads of the control and MSL2 transgenic Drosophila to verify our conclusion. mRNA was isolated from total RNA using the Dynabeads mRNA purification kit (Invitrogen, Carlsbad, CA, USA, 61006). The abundance of m6A modification was higher on mRNA than on total RNA (Figure 7E,F; Figure 7—figure supplement 1G,H). Compared with control Drosophila, the abundance changes of m6A in mRNA and total RNA in MSL2 transgenic Drosophila are essentially the same. These results support the conclusions in our manuscript. In the MSL2 knockdown Drosophila, the m6A modification levels on mRNA mirrored those observed on total RNA, exhibiting a significant downregulation (Figure 7E; Figure 7—figure supplement 1G). The only difference is that no substantial difference in the m6A abundance on mRNA was detected between MSL2-overexpressing female and control Drosophila (Figure 7F; Figure 7—figure supplement 1H). This suggests that m6A modification in RNA types other than mRNA (e.g., lncRNA, rRNA) is not necessarily meaningless, which is a direction for future research. We will also add discussion of this issue in the manuscript.

      (1) Lence T, et al. (2016) m6A modulates neuronal functions and sex determination in Drosophila. Nature 540(7632):242-247.

      (2) Haussmann IU, et al. (2016) m(6)A potentiates Sxl alternative pre-mRNA splicing for robust Drosophila sex determination. Nature 540(7632):301-304.

      (3) Kan L, et al. (2017) The m(6)A pathway facilitates sex determination in Drosophila. Nat Commun 8:15737.

      (4) Zhu C, et al. (2023) RNA Methylome Reveals the m(6)A-mediated Regulation of Flavor Metabolites in Tea Leaves under Solar-withering. Genomics Proteomics Bioinformatics 21(4):769-787.

      (5) Song H, et al. (2021) METTL3-mediated m(6)A RNA methylation promotes the anti-tumour immunity of natural killer cells. Nat Commun 12(1):5522.

      (6) Yin H, et al. (2021) RNA m6A methylation orchestrates cancer growth and metastasis via macrophage reprogramming. Nat Commun 12(1):1394.

      Reviewer #2 (Public Review):

      Summary:

      The authors have tested the effects of partial- or whole-chromosome aneuploidy on the m6A RNA modification in Drosophila. The data reveal that overall m6A levels trend up but that the number of sites found by meRIP-seq trend down, which seems to suggest that aneuploidy causes a subset of sites to become hyper-methylated. Subsequent bioinformatic analysis of other published datasets establish correlations between the activity of the H4K16 acetyltransferase dosage compensation complex (DCC) and the expression of m6A components and m6A abundance, suggesting that DCC and m6A can act in a feedback loop on each other. Overall, this paper uses bioinformatic trends to generate a candidate model of feedback between DCC and m6A. It would be improved by functional studies that validate the effect in vivo.

      Strengths:

      • Thorough bioinformatic analysis of their data.

      • Incorporation of other published datasets that enhance scope and rigor.

      • Finds trends that suggest that a chromosome counting mechanism can control m6A, as fits with pub data that the Sxl mRNA is m6A modified in XX females and not XY males.

      • Suggests this counting mechanism may be due to the effect of chromatin-dependent effects on the expression of m6A components.

      Weaknesses:

      • The linkage between H4K16 machinery and m6A is indirect and based on bioinformatic trends with little follow-up to test the mechanistic bases of these trends.

      Western blots were performed to detect H4K16Ac in Ythdc1 knockdown Drosophila and control Drosophila. Quantitative analysis demonstrated that H4K16Ac levels changed significantly in Ythdc1 knockdown Drosophila. Combined with the results of polytene chromosome immunostaining in third instar larvae, we found that Ythdc1 affects H4K16Ac levels in a tissue- and developmental stage-specific manner. This specificity may be associated with the nonuniformity and heterogeneity of RNA m6A modification characteristics, encompassing the tissue specificity, the developmental specificity, the different numbers of m6A sites in one transcript, the different proportions of methylated transcripts, et cetera [1-3].

      In addition, we found a set of ChIP-seq data (GSE109901) of H4K16ac in female and male Drosophila larvae in a public database, and analyzed whether H4K16ac is directly associated with m6A regulator genes. ChIP-seq is a standard method to study transcription factor binding and histone modification by using efficient and specific antibodies for immunoprecipitation. The results showed that there were H4K16ac peaks in the 5' region of the gene encoding the m6A reader Ythdc1 in both males and females. In addition, most of the genomic loci where the other m6A regulator genes are located are acetylated at H4K16 in both sexes, except that Ime4 shows sexual dimorphism and contains an H4K16ac peak only in females. These results indicate that the m6A regulator genes themselves are acetylated at H4K16, so there is a direct relationship between H4K16ac and m6A regulators. We have added these contents to the text.

      Our analysis of experimental outcomes and public sequencing data has shed light on the interaction of the m6A reader protein Ythdc1 with H4K16Ac. We appreciate your interest in the complex interplay between H4K16Ac and m6A modifications. We acknowledge the intricacy of this interaction and concur that it merits further investigation, potentially supported by additional experiments.

      The currently submitted manuscript focuses mainly on the role of RNA m6A modification in genomes experiencing imbalance, and we intend to explore this complex interplay in subsequent work.

      (1) Meyer KD, et al. (2012) Comprehensive analysis of mRNA methylation reveals enrichment in 3' UTRs and near stop codons. Cell 149(7):1635-1646.

      (2) Meyer KD & Jaffrey SR (2014) The dynamic epitranscriptome: N6-methyladenosine and gene expression control. Nat Rev Mol Cell Biol 15(5):313-326.

      (3) Zaccara S, Ries RJ, & Jaffrey SR (2019) Reading, writing and erasing mRNA methylation. Nat Rev Mol Cell Biol 20(10):608-624.

      • The paper lacks sufficient in vivo validation of the effects of DCC alleles on m6A and vice versa. For example, is the Ythdc1 genomic locus a direct target of the DCC component Msl-2? (see Figure 7).

      To study whether the Ythdc1 genomic locus is a direct target of a DCC component, we first analyzed a published MSL2 ChIP-seq dataset from Drosophila (GSE58768). Since MSL2 is only expressed in males under normal conditions, this dataset is from male Drosophila. According to the results, the majority (99.1%) of MSL2 peaks are located on the X chromosome, while the MSL2 peaks on other chromosomes are few. This is consistent with the fact that MSL2 is enriched on the X chromosome in male Drosophila [1, 2]. The Ythdc1 gene is located on chromosome 3L, and there is no MSL2 peak near it. Similarly, the other m6A regulator genes are not X-linked, and there is no MSL2 peak near them. We then analyzed the MOF ChIP-seq data (GSE58768) of male Drosophila. We found that 61.6% of MOF peaks were located on the X chromosome, which was also expected [3, 4]. Although there are more MOF peaks than MSL2 peaks on autosomes, MOF peaks are absent from the autosomal m6A regulator genes. Therefore, at present, there is no evidence that the gene loci of m6A regulators are direct targets of the DCC components MSL2 and MOF, which may be because most MSL2 and MOF are tethered to the X chromosome by the MSL complex under physiological conditions. Whether there are other direct or indirect interactions between Ythdc1 and MSL2 is an issue worthy of further study.
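
      The two checks described above (the share of ChIP-seq peaks on the X chromosome, and whether any peak overlaps a gene locus) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names, the peak list, and the locus coordinates are hypothetical placeholders, not real MSL2/MOF peaks or the actual Ythdc1 position.

```python
from collections import Counter


def fraction_on_chromosome(peaks, chrom):
    """Share of peaks, given as (chrom, start, end), located on `chrom`."""
    if not peaks:
        raise ValueError("peak list is empty")
    counts = Counter(peak_chrom for peak_chrom, _, _ in peaks)
    return counts[chrom] / len(peaks)


def peaks_overlapping(peaks, locus, flank=0):
    """Peaks overlapping a locus (chrom, start, end), extended by `flank` bp on each side."""
    chrom, start, end = locus
    lo, hi = max(0, start - flank), end + flank
    # Half-open interval overlap: a peak overlaps if it starts before hi and ends after lo.
    return [p for p in peaks if p[0] == chrom and p[1] < hi and p[2] > lo]


# Hypothetical peaks and a hypothetical autosomal gene locus.
peaks = [("X", 100, 200), ("X", 5000, 5300), ("X", 9000, 9100), ("3L", 40000, 40200)]
gene_locus = ("3L", 10000, 12000)

print(fraction_on_chromosome(peaks, "X"))
print(peaks_overlapping(peaks, gene_locus))
print(peaks_overlapping(peaks, gene_locus, flank=30000))
```

      The `flank` parameter makes explicit how "near" is defined; an absence of peaks at one flank size does not rule out binding at more distal regulatory elements.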

      (1) Bashaw GJ & Baker BS (1995) The msl-2 dosage compensation gene of Drosophila encodes a putative DNA-binding protein whose expression is sex specifically regulated by Sex-lethal. Development 121(10):3245-3258.

      (2) Kelley RL, et al. (1995) Expression of msl-2 causes assembly of dosage compensation regulators on the X chromosomes and female lethality in Drosophila. Cell 81(6):867-877.

      (3) Kind J, et al. (2008) Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133(5):813-828.

      (4) Conrad T, et al. (2012) The MOF chromobarrel domain controls genome-wide H4K16 acetylation and spreading of the MSL complex. Dev Cell 22(3):610-624.

      Quite a bit of technical detail is omitted from the main text, making it difficult for the reader to interpret outcomes.

      (1) Please add the tissues to the labels in Figure 1D.

      Figure 1D shows the subcellular localization of FISH probe signals in Drosophila embryos. Arrowheads indicate the foci of probe signals. The corresponding tissue types are (1) blastoderm nuclei; (2) yolk plasm and pole cells; (3) brain and midgut; (4) salivary gland and midgut; (5) blastoderm nuclei and yolk cortex; (6) blastoderm nuclei and pole cells; (7) blastoderm nuclei and yolk cortex; (8) germ band. We have added these to the manuscript.

      (2) In the main text, please provide detail on the source tissues used for meRIP; was it whole larvae? adult heads? Most published datasets are from S2 cells or adult heads and comparing m6A across tissues and developmental stages could introduce quite a bit of variability, even in wt samples. This issue seems to be what the authors discuss in lines 197-199.

      In this article, the material used to perform MeRIP-seq was whole third instar larvae. Because trisomy 2L and metafemale Drosophila died before developing into adults, it was not possible to use adult heads for MeRIP-seq analysis of the aneuploids. For other experiments described here, the m6A abundance was measured using whole larvae or adult heads; material used for RT-qPCR analysis was whole larvae, larval brains, or adult heads; Drosophila embryos at different developmental stages were used for fluorescence in situ hybridization (FISH) experiments. We provide a detailed description of the experimental material for each assay in the manuscript.

      (3) In the main text, please identify the technique used to measure "total m6A/A" in Fig 2A. I assume it is mass spec.

      We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the m6A/A ratio in RNA samples. This kit is commercially available for quantification of m6A RNA methylation. It uses a colorimetric assay with easy-to-follow steps for convenience and speed, and is suitable for detecting m6A methylation status directly using total RNA isolated from any species such as mammals, plants, fungi, bacteria, and viruses.

      (4) Line 190-191: the text describes annotating m6A sites by "nearest gene" which is confusing. The sites are mapped in RNAs, so the authors must unambiguously know the identity of the gene/transcript, right?

      When m6A peaks are annotated using the R package ChIPseeker, the output includes two items: "genomic annotation" and "nearest gene annotation". "Genomic annotation" indicates which genomic feature the peak is annotated to, such as 5’UTR, 3’UTR, or exon. "Nearest gene annotation" indicates which specific gene/transcript the peak is matched to. We have modified the description in the main text to make it easier to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While I believe this study aims to address a very interesting question and demonstrates intriguing evidence suggesting a role for m6A in unbalanced genomes, technical limitations in the methods being used limited my confidence in the overall conclusions. In addition, some of the analyses seemed to distract a bit from the main question of the work, which made thoroughly reading and reviewing the work challenging at times due to the length and lack of cohesion. Some specific points and suggestions are detailed below.

      (1) Some specific points/recommendations for the bulk m6A measurements: for Figure 2A, the authors refer to m6A/A ratio in the text, but based on the methods section and axis labels in Figure 2A (as well as other figures), it may represent m6A% in total RNA. The authors should just clarify which one it is and make the text and figures consistent. The methods description also seems to specify that m6A is quantified in total RNA, and yet the factors being discussed (Ime4, Ythdc1, etc) are associated with m6A in mRNA. Since m6A is present in non-mRNAs (including highly abundant rRNAs), m6A analysis of total RNA may be masking some of the effects due to the relatively low abundance of mRNA relative to rRNA. It is possible that the above point contributes to the discrepancy between the overall m6A abundance in aneuploidies and the changing methylase expression levels (which does seem to correlate better with m6A sequencing data). On a related note, though the authors suggest in Figures 7E and F that m6A level changes are different in males and females, the levels and trends of m6A% in these panels seem quite similar, and the absence or presence of statistical significance seems driven by higher variation (larger error bars) in the measurements in 7F (and again effects may be masked if total RNA is being quantified). This may be a very addressable issue, as m6A analysis of mRNA-enriched samples should be feasible, and in fact, may show clearer changes to better support the authors' conclusions.

      Thank you for your helpful comments.

      As suggested, the abundance of m6A on mRNA was measured (Figure 7E, F). Total RNA was extracted from the heads of control and MSL2 transgenic Drosophila, and mRNA was isolated using the Dynabeads mRNA purification kit (Invitrogen, Carlsbad, CA, USA, 61006). 300-600 ng of mRNA can be purified from 40 μg of total RNA (200-300 heads per sample). We used the EpiQuik m6A RNA Methylation Quantification Kit (Colorimetric) (Epigentek, NY, USA, Cat # P-9005) to measure the abundance of m6A in mRNA samples (200 ng). The results obtained by this method represent the m6A/A ratio (%), which is written as m6A% in the kit's user guide. We have made corresponding revisions in the main text and figures to make them consistent.
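      For orientation, the stated recovery (300-600 ng mRNA from 40 μg total RNA) corresponds to roughly 0.75-1.5% of the input mass. The short sketch below only reproduces that arithmetic; the function name `percent_recovery` is hypothetical and is not part of the authors' protocol or of any kit software.

```python
def percent_recovery(mrna_ng: float, total_rna_ug: float) -> float:
    """Return purified mRNA as a percentage of the total RNA input mass.

    mrna_ng      -- purified mRNA in nanograms
    total_rna_ug -- total RNA input in micrograms
    """
    total_rna_ng = total_rna_ug * 1000.0  # convert micrograms to nanograms
    return mrna_ng / total_rna_ng * 100.0


# Range quoted in the response: 300-600 ng mRNA from 40 ug total RNA.
low = percent_recovery(300, 40)
high = percent_recovery(600, 40)
```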

      The results show a higher abundance of m6A modification on mRNA than on total RNA, which also contains other RNA types such as lncRNA and rRNA (Figure 7E,F; Figure 7—figure supplement 1G,H). Consistent with the total RNA results, m6A modification levels on mRNA in the MSL2 knockdown Drosophila were significantly downregulated (Figure 7E; Figure 7—figure supplement 1G). In contrast, no substantial difference in m6A abundance on mRNA was detected between MSL2-overexpressing Drosophila and control Drosophila (Figure 7F; Figure 7—figure supplement 1H). The differences in m6A abundance between males and females were not statistically significant (Figure 7E,F), and we have revised the manuscript accordingly.

      (2) The analyses in Figures 5 and 6 describe a lot of different comparisons derived from these datasets, and while there seem to be many interesting new hypotheses to be tested, the authors do not make any definitive conclusions from these analyses. These figures also seem to diverge a bit from the main conclusion of the work, and from this reviewer's perspective made it more difficult to read and review the work. Overall streamlining the narrative may help readers appreciate the main conclusions of the work (though this is of course up to the author's discretion).

      As indicated in Figure 5, the results demonstrated a sexually dimorphic role of m6A modification in the regulation of gene expression in aneuploid Drosophila, suggesting its potential involvement in the gene regulatory network through interactions with dosage-sensitive regulators. Furthermore, Figure 6 illustrated the intricate interplay between RNA m6A modification, gene expression, and alternative splicing under genomic imbalance, with RNA splicing being more intimately associated with m6A methylation than gene transcription itself.

      This manuscript also discussed the correlation between methylation status and classical dosage effects, dosage compensation effects, and inverse dosage effects. We have initially demonstrated that RNA m6A methylation could influence dosage-dependent gene regulation via multiple avenues, such as interactions with dosage-sensitive modifiers, alternative splicing mechanisms, the MSL complex, and other related processes. Indeed, our study primarily utilizes m6A methylated RNA immunoprecipitation sequencing (MeRIP-Seq) to comprehensively investigate the role of RNA m6A modification in genomes experiencing imbalance. We agree that more specific and in-depth research on these factors will be instrumental in elucidating the precise mechanisms by which m6A modification regulates expression in unbalanced genomes, which we acknowledge as a significant avenue for our future research.

      We are grateful for your suggestions and, should it be necessary, we could shorten the manuscript by removing or condensing some of the data analyses and descriptions to enhance the prominence of the central theme.

      Reviewer #2 (Recommendations For The Authors):

      Overall, please provide enough technical detail in the main text so that the reader understands what was done, and does not have to repeatedly dig into figure legends and materials and methods to understand each data statement.

      Thank you for your suggestions. We have added some technical details to the manuscript and made some modifications as suggested.

    1. eLife Assessment

      This work presents important findings of a modulatory effect of yohimbine, an alpha2-adrenergic antagonist that raises noradrenaline levels, on the reconsolidation of emotionally neutral word-picture pairs, depending on the hippocampal and cortical reactivation during retrieval. The evidence supporting the main conclusions is convincing, with an elegant design combining fMRI and psychopharmacology. The work will be of broad interest to researchers working on memory.

    2. Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task to presentation of the word (answering is the word old or new, and was it paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Fig. 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript:

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      This weakness was satisfactorily addressed in one revision round. As RT data are often not normally distributed, were they transformed prior to entry into linear models?

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Despite being a complex literature, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.

      Minor comments

      - Related to Major issue 2. In the introduction, it would be helpful to be specific about the type of memory being probed in the different studies referenced (episodic vs conditioning). For the former, please make it clear whether stimuli to be remembered were emotional or neutral, and for which stimulus class drug effects were observed. This is particularly important given that in the first paragraph you describe memory reactivation in the context of traumatic memories via mention of PTSD. It would also be helpful to know to which species you refer. For example, in line 115, "timing of drug administration..." a rodent and a human study are cited.

      This weakness was addressed in one revision round, resulting in an excellent introduction, highlighting the importance of studying post-retrieval effects for memory researchers and healthcare workers.

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths for this well-written manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      How reconsolidation works - particularly in humans - remains largely unknown. With an elegant, 3-day design, combining fMRI and psychopharmacology, the authors provide evidence for a certain role for noradrenaline in the reconsolidation of memory for neutral stimuli. All memory tasks were performed in the context of fMRI scanning, with additional resting-state acquisitions performed before and after recall testing on Day 2. On Day 1, 3 groups of healthy participants encoded word-picture associates (with pictures being either scenes or objects) and then performed an immediate cued recall task to presentation of the word (answering is the word old or new, and whether it was paired with a scene or an object). On Day 2, the cued recall task was repeated using half of the stimulus set words encoded on Day 1 (only old words were presented, with subjects required to indicate prior scene vs object pairing). This test was immediately preceded by the oral administration of placebo, cortisol, or yohimbine (to raise noradrenaline levels) depending on group assignment. On Day 3, all words presented on Day 1 were presented. As expected, on Day 3, memory was significantly enhanced for associations that were cued and successfully retrieved on Day 2 compared to uncued associations. However, for associative d', there was no Cued × Group interaction nor a main effect of Group, i.e., on the standard measure of memory performance, post-retrieval drug presence on Day 2 did not affect memory reconsolidation. As further evidence for a null result, fMRI univariate analyses showed no Cued × Group interactions in whole-brain or ROI activity.

      Strengths:

      There are some aspects of this study that I find impressive. The study is well-designed and the fMRI analysis methodology is innovative and sound. The authors have made meticulous and thorough physiological measurements, and assays of mood, throughout the experiment. By doing so, they have overcome, to a considerable extent, the difficulties inherent in the timing of human oral drug delivery in reconsolidation tasks, where it is difficult to have the drug present in the immediate recall period without affecting recall itself. This is beautifully shown in Figure 3. I also think that having some neurobiological assay of memory reactivation when studying reconsolidation in humans is critical, and the authors provide this. While multi-voxel patterns of hemodynamic responses are, in my view, very difficult to equate with an "engram", these patterns do have something to do with memory.

      We thank the reviewer for considering aspects of our work impressive, the study to be well-designed, and the methodology to be innovative and sound.

      Weaknesses:

      I have major issues regarding the behavioral results and the framing of the manuscript.

      (1) To arrive at group differences in memory performance, the authors performed median splitting of Day 3 trials by short and long reaction times during memory cueing on Day 2, as they took this as a putative measure of high/low levels of memory reactivation. Associative category hits on Day 3 showed a Group by Day 2 Reaction time (short, long) interaction, with post-hocs showing (according to the text) worse memory for short Day 2 RTs in the Yohimbine group. These post-hocs should be corrected for multiple comparisons, as the result is not what would be predicted (see point 2). My primary issue here is that we are not given RT data for each group, nor is the median splitting procedure described in the methods. Was this across all groups, or within groups? Are short RTs in the yohimbine group any different from short RTs in the other two groups? Unfortunately, we are not given Day 2 picture category memory levels or reaction times for each group. This is relevant because (as given in Supplemental Table S1) memory performance (d´) for the Yohimbine group on Day 1 immediate testing is (roughly speaking) 20% lower than the other 2 groups (independently of whether the pairs will be presented again the following day). I appreciate that this is not significant in a group x performance ANOVA but how does this relate to later memory performance? What were the group-specific RTs on Day 1? So, before the reader goes into the fMRI results, there are questions regarding the supposed drug-induced changes in behavior. Indeed, in the discussion, there is repeated mention of subsequent memory impairment produced by yohimbine but the nature of the impairment is not clear.

      Thank you for the opportunity to clarify these important issues.

      Reaction times are well established proxies (correlates) of memory strength and memory confidence in previous research, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976; amongst others). Importantly, distinguishing between high and low confidence choices in a memory task serves the purpose of differentiating between particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) and weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al. 2019) and that stress-effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to Reviewer 1’s comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58-60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high from low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      With respect to behavioral data reporting, we agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as fast vs. slow reaction time trials during Day 2 memory cueing. We conducted this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times. This approach also results in an equal number of trials in the low and high confidence conditions.”
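      A within-participant median split of this kind can be sketched as follows. This is an illustrative sketch, not the authors' analysis code: the function name `median_split_by_participant`, the `(participant_id, trial_id, rt)` tuple layout, and the tie rule (trials at or above the median are labelled "slow") are all assumptions made for the example.

```python
import statistics
from collections import defaultdict


def median_split_by_participant(trials):
    """Label each trial 'fast' or 'slow' relative to its own participant's median RT.

    trials -- iterable of (participant_id, trial_id, rt) tuples (assumed layout).
    Returns a dict mapping (participant_id, trial_id) to 'fast' or 'slow'.
    """
    # Collect reaction times separately for every participant.
    rts = defaultdict(list)
    for pid, _, rt in trials:
        rts[pid].append(rt)

    # One median per participant, so slow responders are not compared
    # against fast responders from the same group.
    medians = {pid: statistics.median(values) for pid, values in rts.items()}

    # Assumed tie rule: RTs equal to the median count as 'slow'. The two bins
    # are exactly equal in size only for an even trial count without ties.
    return {
        (pid, tid): "fast" if rt < medians[pid] else "slow"
        for pid, tid, rt in trials
    }
```

      Because the median is computed per participant, a trial labelled "fast" for a generally slow responder can have a longer absolute reaction time than a trial labelled "slow" for a generally fast responder.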

      We completely agree that the relevant post-hoc tests should be corrected for multiple comparisons. Please note that all reported post-hoc tests had already been Bonferroni-corrected. We now clarify this by explicitly referring to corrected p-values (P_corr) and indicate in the methods that P_corr refers to Bonferroni-corrected p-values (please see page 25, lines 1036 to 1038).
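      For readers unfamiliar with the correction, a Bonferroni-corrected p-value is the uncorrected p-value multiplied by the number of comparisons in the family, capped at 1. The sketch below shows only this general rule; the function name `bonferroni` and the example values are hypothetical, and the number of comparisons in the authors' post-hoc families is not stated in this response.

```python
def bonferroni(p_values):
    """Return Bonferroni-corrected p-values for one family of comparisons.

    Each uncorrected p-value is multiplied by the number of comparisons (m)
    and capped at 1.0, since a probability cannot exceed 1.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values from three post-hoc comparisons.
corrected = bonferroni([0.01, 0.04, 0.5])
```

      A corrected value is then compared against the usual threshold (for example .05), which is equivalent to comparing the uncorrected value against the threshold divided by the number of comparisons.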

      We further agree that for a comprehensive overview of the behaviour in terms of memory performance and RTs, these data need to be provided for each group and experimental day. Therefore, we now extended Supplementary Table S1 to include descriptive indices of memory performance (hits, dprime) and RTs for each group for each day. Moreover, we now report ANOVAs for reaction times for each of the experimental days in the main text.

      The ANOVA for Day 1 is now reported on page 6, lines 200 to 204: “To test for potential group differences in reaction times for correctly remembered associations on Day 1, we fit a linear model including the factors Group and Cueing. Critically, we did not observe a significant Group x Cueing interaction, suggesting no RT difference between groups for later cued and not cued items (F(2,58) = 1.41, P = .258, η² = 0.01; Supplemental Table S1).”

      The ANOVA for Day 2 is now reported on page 7, lines 243 to 248: “To test for potential group differences in reaction times for correctly remembered associations on Day 2, we fit a linear model including the factors Group and Reaction time (slow/fast) following the subject specific median split. The model did not reveal any main effect or interaction including the factor Group (all Ps > .535; Supplemental Table S1), indicating that there was no RT difference between groups, nor between low and high RT trials in the groups.”

      The ANOVA for Day 3 is reported on page 13, lines 487 to 494: “To test for potential group differences in reaction times for correctly remembered associations on Day 3 we fit a linear model including the factors Group and Cueing. This model did not reveal any main effect or interaction including the factor Group (all Ps > .267), indicating that there was no average RT difference between groups. As expected we observed a main effect of the factor Cueing, indicating a significant difference of reaction times across groups between trials that were successfully cued and those not cued on Day 2 (F(2,58) = 153.07, P < .001, η² = 0.22; Supplemental Table S1).”

      (2) The authors should be clearer as to what their original hypotheses were, and why they did the experiment. Despite being a complex literature, I would have thought the hypotheses would be reconsolidation impairment by cortisol and enhancement by yohimbine. Here it is relevant to point out that - only when the reader gets to the Methods section - there is mention of a paper published by this group in 2024. In this publication, the authors used the same study design but administered a stress manipulation after Day 2 cued recall, instead of a pharmacological one. They did not find a difference in associative hit rate between stress and control groups, but - similar to the current manuscript - reported that post-retrieval stress disrupts subsequent remembering (Day 3 performance) depending on neural memory reinstatement during reactivation (specifically driven by the hippocampus and its correlation with neocortical areas).

      Instead of using these results, and other human studies, to motivate the current work, reference is made to a recent animal study: Line 169 "Building on recent findings in rodents (Khalaf et al. 2018), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval". It is difficult to follow that a rodent study using contextual fear conditioning and examining single neuron activity to remote fear recall and extinction would be relevant enough to motivate a hypothesis for a human psychopharmacological study on emotionally neutral paired associates.

      We agree that our recent publication utilizing a very similar experimental design including three days is highly relevant in the context of the current study and we now refer to this recent study earlier in our manuscript. Please see page 3, lines 89 to 94:  

      “Recently, we showed a detrimental impact of post-retrieval stress on subsequent memory that was contingent upon reinstatement dynamics in the Hippocampus, VTC and PCC during memory reactivation(26). While this study provided initial insights into the potential brain mechanisms involved in the effects of post-retrieval stress on subsequent memory, the underlying neuroendocrine mechanisms remained elusive.”

      Moreover, we explicitly state our hypothesis regarding the neural mechanism, with reference to our recent work, on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      Concerning the potential direction of the effects of post-retrieval cortisol and noradrenaline, the literature is indeed mixed with partially contradicting results, which made it, in our view, difficult to derive a clear hypothesis of potentially opposite effects of cortisol and yohimbine. We summarize the relevant evidence in the introduction on pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #1 (Recommendations for the authors):

      (1) Related to major issue 2 in the Public Review. In the introduction, it would be helpful to be specific about the type of memory being probed in the different studies referenced (episodic vs conditioning). For the former, please make it clear whether stimuli to be remembered were emotional or neutral, and for which stimulus class drug effects were observed. This is particularly important given that in the first paragraph, you describe memory reactivation in the context of traumatic memories via mention of PTSD. It would also be helpful to know to which species you refer. For example, in line 115, "timing of drug administration..." a rodent and a human study are cited.

      We completely agree that these aspects are important. We have therefore rewritten the corresponding paragraph in the introduction to clarify the type of memory probed, the emotionality of the stimuli and the species tested. Please see pages 3 to 4, lines 100 to 113:

      “Some studies, using emotional recognition memory or fear conditioning in healthy humans, suggest enhancing effects of post-retrieval glucocorticoids on subsequent memory(30,31). However, rodent studies on neutral recognition memory(21), fear conditioning(32), as well as evidence from humans on episodic recognition memory(33) report impairing effects of glucocorticoid receptor activation on post-retrieval memory dynamics. For noradrenaline, post-retrieval blockade of noradrenergic activity impairs putative reconsolidation or future memory accessibility in human fear conditioning(34), as well as drug (alcohol) memory(35) and spatial memory in rodents(36). However, this effect is not consistently observed in human studies on fear conditioning(40), speaking anxiety(37), inhibitory avoidance(39), traumatic mental imagination (PTSD patients)(38), and might depend on the arousal state of the individual(21) or the exact timing of drug administration as suggested by studies in humans(41) and rodents(42). Thus, while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.”

      (2) The Bos 2014 reference appears incorrect. I think you mean the Frontiers paper of the same year.

      Thank you for noticing this mistake, which has been corrected.

      (3) Line 734 "The study employed a fully crossed, placebo-controlled, double-blind, between-subjects design". What is a fully crossed design?

      A fully-crossed design refers to studies in which all possible combinations of multiple between-subjects factors are implemented. However, because the factor reactivation/cueing was manipulated within-subject in the present study and there is only one between-subjects factor (group/drug), “fully-crossed” may be misleading here. We removed it from the manuscript.

      (4) Supplemental Table S3. Are these ordered in terms of significance? A t- or Z-value for each cluster (either of the peak or a summed value) would be helpful.

      We agree that the ordering of the clusters was not clearly described. In the revised Supplemental Table S3, we have now added a column with the cluster-peak specific T-values and added an explanation in the table caption: “Depicted clusters are ordered by cluster-peak T-values.”

      (5) Please provide the requested memory performance and reaction time data, and relevant group comparisons.

      In response to general comment #1 above, we now provide all relevant accuracy and reaction time data for all groups and experimental days in the revised Supplemental Table S1. Moreover, we now report the relevant group comparisons in the main text on page 6, lines 200 to 204, on page 7, lines 243 to 248, and on page 13, lines 487 to 494.

      (6) Please rewrite the introduction with specific hypotheses, mention your recent results published in Science Advances, and attend to suggestions made in the first comment above.

      We have rewritten parts of the introduction to make the link to our recent publication clearer and to clarify the types of memories and species tested, as suggested by the reviewer (please see pages 3 to 4, lines 100 to 113). Moreover, we explicitly state our hypothesis regarding the neural mechanism on page 5, lines 166 to 169:

      “Building on our recent findings in humans(26) as well as current insights from rodents(47), we hypothesized that the effects of post-retrieval noradrenergic and glucocorticoid activation would critically depend on the reinstatement of the neural event representation during retrieval.”

      In terms of the direction of the potential cortisol and yohimbine effects, we have elaborated on the relevant literature, which in our view does not allow a clear prediction regarding the nature of the drug effects. We have made this explicit by stating that “… while there is evidence that glucocorticoid and noradrenergic activation after retrieval can affect subsequent memory, the direction of these effects remains elusive.” (please see page 4, lines 111 to 113). It would be, in our view, inappropriate to retrospectively add another, more specific “hypothesis”.

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how noradrenergic and glucocorticoid activity after retrieval influence subsequent memory recall with a 24-hour interval, by using a controlled three-day fMRI study involving pharmacological manipulation. They found that noradrenergic activity after retrieval selectively impairs subsequent memory recall, depending on hippocampal and cortical reactivation during retrieval.

      Overall, there are several significant strengths of this well-written manuscript.

      Strengths:

      (1) The study is methodologically rigorous, employing a well-structured three-day experimental design that includes fMRI imaging, pharmacological interventions, and controlled memory tests.

      (2) The use of pharmacological agents (i.e., hydrocortisone and yohimbine) to manipulate glucocorticoid and noradrenergic activity is a significant strength.

      (3) The clear distinction between online and offline neural reactivation using MVPA and RSA approaches provides valuable insights into how memory dynamics are influenced by noradrenergic and glucocorticoid activity distinctly.

      We thank the reviewer for these very positive and encouraging remarks.

      Weaknesses:

      (1) One potential limitation is the reliance on distinct pharmacodynamics of hydrocortisone and yohimbine, which may complicate the interpretation of the results.

      We agree that the pharmacodynamics of hydrocortisone and yohimbine are different. However, we took these pharmacodynamics into account when designing the experiment and have made an effort to accurately track the indicators for noradrenergic arousal and glucocorticoids across the experiment. As shown in Figure 2, these indicators confirm that both drugs are active within the time window of approximately 40-90 minutes after reactivation. This time window corresponds to the proposed reconsolidation window, which is assumed to open around 10 minutes post-reactivation and to remain open for a few hours (approximately 90 minutes; Monfils & Holmes, 2018; Lee et al., 2017; Monfils et al., 2009).

      We have now acknowledged the distinct pharmacodynamics of hydrocortisone and yohimbine on page 21, lines 845 to 847: “We note that yohimbine and hydrocortisone follow distinct pharmacodynamics(104,105), yet selected the administration timing to ensure that both substances are active within the relevant post-retrieval time window.”

      In the results section, on page 11, lines 437 to 439, we further emphasize this differential dynamic: “Our data demonstrate that, despite the distinct pharmacodynamics of CORT and YOH, both substances are active within the time window that is critical for potential reconsolidation effects(3,4,43).”

      (2) Related to the point above, individual differences in pharmacological responses and in physiological and cortisol measures may contribute to memory recall on Day 3.

      The administered drugs elicit a pronounced adrenergic and glucocorticoid response, respectively. Specifically, the cortisol levels reached by 20 mg of hydrocortisone correspond to those observed after a significant stressor exposure. Moreover, individual variation in stress system activation following drug intake tends to be less pronounced than in response to a natural stressor. Nevertheless, we fully agree that individual factors, such as metabolism or body weight, can influence the drug's action.

      We therefore re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to these analyses on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant group x baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). 
      These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      Although we acknowledge that our study may not have been sufficiently powered for an analysis of individual differences, these data suggest that our memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses. It is to be noted, however, that all participants of the respective groups showed a pronounced increase in cortisol concentrations (on average > 1000% in the CORT group) and autonomic arousal (on average > 10% in the YOH group), respectively. These increases appeared to be sufficient to drive the observed memory effects, irrespective of some individual variation in the magnitude of the response.
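The baseline-to-peak measure referred to above can be illustrated with a short sketch. This is not the authors' analysis code: the function name `baseline_to_peak`, the convention that the first sample is the baseline and the largest sample is the peak, and the example values are all assumptions made for illustration.

```python
def baseline_to_peak(samples):
    """Return (absolute, percent) change from baseline to peak.

    Assumption: the first sample is the pre-drug baseline and the
    peak is the largest value in the whole series.
    """
    baseline = samples[0]
    peak = max(samples)
    absolute = peak - baseline
    percent = 100.0 * absolute / baseline
    return absolute, percent


# Made-up salivary cortisol series (arbitrary units), for illustration only.
example_cortisol = [5.0, 20.0, 60.0, 40.0]
change = baseline_to_peak(example_cortisol)
```

A per-participant value of this kind could then enter a model as a covariate alongside Group, which is the role the individual baseline-to-peak factor plays in the analyses described above.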

      (3) Median-splitting approach for reaction times and hippocampal activity should better be justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976). Importantly, distinguishing between high and low confidence choices in a memory task serves to differentiate particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) from weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high and low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      We agree that the critical median-split procedure was not sufficiently clear in the original manuscript. We elaborate on this important aspect of the analysis now on page 26, lines 1053 to 1057:

      “We conducted a median-split within each participant to categorize trials as slow vs. fast reaction time trials during Day 2 memory cueing. We chose to conduct this split on the participant- and not group-level because there is substantial inter-individual variability in overall reaction times and to retain an equal number of trials in the low and high confidence conditions.”
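The participant-level median split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (a list of `(participant_id, reaction_time)` tuples), the function name `median_split_by_participant`, the example values, and the choice to label trials exactly at the median as "slow" are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_split_by_participant(trials):
    """Label each trial 'fast' or 'slow' relative to its own participant's median RT.

    `trials` is a list of (participant_id, reaction_time) tuples (assumed layout).
    Trials exactly at the median are labeled 'slow' (an arbitrary choice).
    """
    rts_by_participant = defaultdict(list)
    for participant, rt in trials:
        rts_by_participant[participant].append(rt)

    # One median per participant, not one for the whole group.
    medians = {p: median(rts) for p, rts in rts_by_participant.items()}

    return [
        (participant, rt, "fast" if rt < medians[participant] else "slow")
        for participant, rt in trials
    ]


# Made-up reaction times in seconds: participant "B" is slower overall than "A".
example_trials = [
    ("A", 0.8), ("A", 1.0), ("A", 1.4), ("A", 1.6),
    ("B", 1.9), ("B", 2.1), ("B", 2.5), ("B", 2.7),
]
labeled = median_split_by_participant(example_trials)
labels = [label for _, _, label in labeled]
```

In this toy example each participant contributes two fast and two slow trials. A single group-level median (1.75 s here) would instead label every trial of participant "A" as fast and every trial of participant "B" as slow, which is the imbalance the participant-level split avoids.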

      In addition to these reviewer comments and in response to the eLife assessment, we would like to emphasize that the present findings are in our view not only relevant for a subfield but may be of considerable interest for researchers from various fields, beyond experimental memory research, including Neurobiology, Psychiatry, Clinical Psychology, Educational Psychology, or Law Psychology. We highlight the relevance of the topic and our findings now more explicitly in the introduction and discussion. Please see page 3:

      “The dynamics of memory after retrieval, whether through reconsolidation of the original trace or interference with retrieval-related traces, have fundamental implications for educational settings, eyewitness testimony, or mental disorders(5,11,12). In clinical contexts, post-retrieval changes of memory might offer a unique opportunity to retrospectively modify or render less accessible unwanted memories, such as those associated with posttraumatic stress disorder (PTSD) or anxiety disorders(13–15). Given these potential far reaching implications, understanding the mechanisms underlying post-retrieval dynamics of memory is essential.”

      On page 17:

      “Upon their retrieval, memories can become sensitive to modification(1,2). Such post-retrieval changes in memory may be fundamental for adaptation to volatile environments and have critical implications for eyewitness testimony, clinical or educational contexts(5,11–15). Yet, the brain mechanisms involved in the dynamics of memory after retrieval are largely unknown, especially in humans.”

      And on page 19:

      “Beyond their theoretical relevance, these findings may have relevant implications for attempts to employ post-retrieval manipulations to modify unwanted memories in anxiety disorders or PTSD(97,98). Specifically, the present findings suggest that such interventions may be particularly promising if combined with cognitive or brain stimulation techniques ensuring a sufficient memory reactivation.”

      Reviewer #2 (Recommendations for the authors):

      My comments and/or questions for the authors to improve this well-written manuscript.

      (1) This study identifies the modulatory role of the hippocampus and VTC in the effects of norepinephrine on subsequent memory. Are there functional interactions between these ROIs and other brain regions that could be wise to consider for a more comprehensive understanding of the underlying neural mechanisms?

      We agree that functional interactions between the hippocampus, VTC, and other regions that were active during Day 2 memory cueing are relevant for our understanding of the underlying mechanisms. We have therefore performed connectivity analyses using generalized psychophysiological interaction analysis (gPPI; as implemented in SPM). We report the results of this analysis on page 16, lines 635 to 644, and have added Supplemental Table S4 with the gPPI statistics.

      “We conducted generalized psychophysiological interaction (gPPI) analyses on the Day 2 memory cueing task (remembered – forgotten), which revealed that successful cueing was accompanied by significant functional connectivity between the left hippocampus, VTC, PCC and MPFC (see Supplemental Table S4). However, using these connectivity estimates to predict Day 3 subsequent memory performance (dprime) via regression did not reveal any significant Group × Connectivity interactions, indicating that the pharmacological manipulation (i.e. noradrenergic stimulation) did not modulate subsequent memory based on functional connectivity during memory cueing (all P<sub>Corr</sub> > .228). The same pattern of results was observed when including single trial beta estimates from multiple ROIs during memory cueing to predict Day 3 memory (all interaction effects P<sub>Corr</sub> > .288).”

      (2) In theory, noradrenergic activity would have a profound impact on activity in widespread brain regions that are closely related to memory function. It would be interesting to know other possible effects beyond the hippocampus and VTC.

      We agree and included in our analysis additional ROIs beyond the HC and VTC; we now report these explorative results on page 16, lines 616 to 633:

      “Beyond hippocampal and VTC activity during memory cueing (Day 2), we exploratively reanalysed the GLMMs predicting Day 3 memory performance including the PCC, which was relevant during memory cueing in the current study and in our previous work(26). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the PCC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1). We also included the Medial Prefrontal Cortex (MPFC) to predict Day 3 memory performance, as the MPFC has been shown to be sensitive to noradrenergic modulation in previous work(75). Predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing in the MPFC did not reveal a significant interaction (P<sub>Corr</sub> = 1); adding the factor Reaction time to the model also did not result in a significant interaction (P<sub>Corr</sub> = 1), which indicates that the MPFC was not modulated by either pharmacological intervention. Finally, we investigated memory cueing from all remaining ROIs that were significantly activated during the Day 2 memory cueing task (Day 2 whole-brain analysis; correct-incorrect; Supplemental Table S3). We again fit GLMMs predicting Day 3 memory performance by the factors Group and Single trial beta activity during memory cueing. Again, we did not observe any significant interaction effect in any of the ROIs (all interaction P<sub>Corr</sub> > .060) and these results did not change when adding the factor Reaction time to the respective models (all P<sub>Corr</sub> > .075).”

      (3) There are substantial individual differences in pharmacological responses, physiological and cortisol measures, as shown in Figure 3A&B. If such individual differences are taken into account, are there any potential effects on subsequent recall on Day 3 pertaining to the hydrocortisone group?

      In response to this comment (and the General comment #1 of this reviewer), we re-analysed the reported Day 3 models, now including individual measures of baseline-to-peak changes in cortisol and systolic blood pressure, respectively. We report these additional analyses in the supplement and refer the interested reader to them on page 15, lines 580 to 586:

      “As individual factors, such as metabolism or body weight, can influence the drug's action, we ran an additional analysis in which we included individual (baseline-to-peak) differences in salivary cortisol and (systolic) blood pressure, respectively. This analysis did not show any group by baseline-to-peak difference interaction suggesting that the observed memory effects were mainly driven by the pharmacological intervention group per se and less by individual variation in responses to the drug (see Supplemental Results).”

      And in the Supplemental Results:

      “To account for individual differences in cortisol responses after pill intake, we fit additional GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak cortisol and Group. Doing so allowed us to account for variation in Day 3 performance, which might have resulted from within-group variation in cortisol responses, in particular in the CORT group. Importantly, none of the models predicting Day 3 memory performance by Day 2 cortisol-increase and Group, median-split RTs (high/low), hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement revealed a significant group x baseline-to-peak cortisol interaction (all Ps > .122). These results suggest that inter-individual differences in cortisol responses did not have a significant impact on subsequent memory, beyond the influence of group per se. The same analyses were repeated for systolic blood pressure employing GLMMs predicting Day 3 subsequent memory of cued and correct trials including the factors Individual baseline-to-peak systolic blood pressure and Group to account for variation in Day 3 performance, which might have resulted from within-group variation in blood pressure response, in particular in the YOH group. While the model predicting Day 3 memory performance revealed a significant Individual baseline-to-peak systolic blood pressure × Group × median-split RTs (high/low) interaction (β = -0.05 ± 0.02, z = -2.04, P = .041, R<sup>2</sup><sub>conditional</sub> = 0.01), post-hoc slope tests, however, did not show any significant difference between groups (all P<sub>Corr</sub> > .329). The remaining models including hippocampal activity and RTs, or hippocampal activity and VTC category reinstatement did not reveal a significant Group × Individual baseline-to-peak systolic blood pressure interaction (all Ps > .101). 
      These results suggest that inter-individual differences in systolic blood pressure responses did not have a significant impact on subsequent memory, beyond the influence of group per se.”

      (4) Median-splitting approach for reaction times and hippocampal activity should better be justified.

      Reaction times are well-established proxies (correlates) of memory strength and memory confidence, as they reflect cognitive processes involved in retrieving information. Faster reaction times indicate stronger mnemonic evidence and higher confidence in the accuracy of a memory decision, while slower responses suggest weaker evidence and decision uncertainty or doubt. This relationship is supported by an extensive literature (e.g., Starns 2021; Robinson et al., 1997; Ratcliff & Murdock, 1976). Importantly, distinguishing between high and low confidence choices in a memory task serves to differentiate particularly strong memory evidence (e.g., in associative cued recall, when remembering is particularly vivid) from weaker memory evidence. Separating low from high confidence responses based on participants’ reaction times was especially important in the current analyses, because previous research demonstrates that reaction times during cued recall tasks inversely correlate with hippocampal involvement (Heinbockel et al., 2024; Gagnon et al., 2019) and that stress effects on human memory may be particularly pronounced for high-confidence memories (Gagnon et al., 2019).

      In response to the Reviewer comments, we have elaborated on our rationale for the distinction between short and long reaction times in the introduction, results, and methods. Please see page 4, lines 144 to 148:

      “We distinguished between responses with short and long reaction times indicative of high and low confidence responses because previous research showed that reaction times are inversely correlated with hippocampal memory involvement(58–60) and memory strength(61,62), and that high confidence memories associated with short reaction times may be particularly sensitive to stress effects(63).”

      On page 13, lines 520 to 523:

      “Reaction times in the Day 2 Memory cueing task revealed a trial-specific gradient in reactivation strength. Thus, we turned to single-trial analyses, differentiating Day 3 trials by short and long reaction times during memory cueing on Day 2 (median split), indicative of high vs. low memory confidence(58–60) and hippocampal reactivation(26,63).”

      And on page 26, lines 1046 to 1053:

      “Reaction times serve as a proxy for memory confidence and memory strength, with faster responses reflecting higher confidence/strength and slower responses suggesting greater uncertainty/weaker memory. The association between reaction times and memory confidence has been established by previous research(58–60), suggesting that the distinction between high and low confidence responses differentiates vividly recalled associations from decisions based on weaker memory evidence. Reaction times are further linked to hippocampal activity during recall tasks(26,53), and stress effects on memory are particularly pronounced for high-confidence memories(53).”

      Minor comments:

      (5) Please include the full names of key abbreviations in the figure legends, such as "ass.cat.hit" and among others.

      We now include the full names of key abbreviations in all figure legends (e.g., ass.cat.hit = associative category hit).

      (6) Please introduce various metrics used in the study to aid readers in better understanding the measurements they utilized.

      We agree that various measures that were included in our analyses had not been described clearly enough before, especially concerning the multivariate analyses. We therefore added short explanations across the results section.

      Page 8, lines 279 to 280: “Classifier accuracy is derived from the sum of correct predictions the trained classifier made in the test-set, relative to the total amount of predictions.”

      Page 8, lines 290 to 292:  “Neural reinstatement reflects the extent to which a neural activity pattern (i.e., for objects) that was present during encoding is reactivated during retrieval (e.g., memory cueing).”

      Page 8, lines 299 to 301:  “The logits here reflect the log-transformed trial-wise probability of a pattern either representing a scene or an object.”

      Page 10, lines 378 to 380:  “Beyond category-level reinstatement, we assessed event-level memory trace reinstatement from initial encoding (Day 1) to memory cueing (Day 2), via RSA, correlating neural patterns in each region (hippocampus, VTC, and PCC) across days.”

      (7) Please explain what the different colors represent in Figures 5B and 5C to avoid confusion. It would be good to indicate significant differences in the figures if applicable.

      We have now added line legends to the figure and revised the caption to clarify what exactly is depicted. We have also added asterisks to mark significant differences.

      References:

      Monfils, M. H., Cowansage, K. K., Klann, E., & LeDoux, J. E. (2009). Extinction-reconsolidation boundaries: key to persistent attenuation of fear memories. Science, 324(5929), 951-955.

      Monfils, M. H., & Holmes, E. A. (2018). Memory boundaries: opening a window inspired by reconsolidation to treat anxiety, trauma-related, and addiction disorders. The Lancet Psychiatry, 5(12), 1032-1042.

      Lee, J. L. C., Nader, K., & Schiller, D. (2017). An update on memory reconsolidation updating. Trends in Cognitive Sciences, 21, 531-545.

      Radley, J. J., Williams, B., & Sawchenko, P. E. (2008). Noradrenergic innervation of the dorsal medial prefrontal cortex modulates hypothalamo-pituitary-adrenal responses to acute emotional stress. Journal of Neuroscience, 28(22), 5806-5816.

      Heinbockel, H., Wagner, A. D., & Schwabe, L. (2024). Post-retrieval stress impairs subsequent memory depending on hippocampal memory trace reinstatement during reactivation. Science Advances, 10(18), eadm7504.

    1. eLife Assessment

      This study provides compelling evidence for functional subpopulations of β-cells responsible for Ca2+ signal initiation and maintenance using novel three-dimensional light sheet microscopy imaging and analysis of pancreatic islets. The findings are important as they help decode mechanistic underpinnings of islet calcium oscillations and the resulting pulsatile insulin secretion. The work will be of general interest to cell biologists and particular interest to islet biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Jin, Briggs and colleagues use light sheet imaging to reconstruct the islet three-dimensional Ca2+ network. The authors find that early/late responding (leader) cells are dynamic over time, and located at the islet periphery. By contrast, highly connected or hub cells are stable, and located toward the islet center. Suggesting that the two subpopulations are differentially regulated by fuel input, glucokinase activation only influences leader cell phenotype, whereas hubs remain stable.

      Strengths:

      The studies are novel in providing the first three-dimensional snapshot of the beta cell functional network, as well as determining the localization of some of the different subpopulations identified to date. The studies also provide some consensus as to the origin, stability and role of such subpopulations in islet function.

      Weaknesses:

      Experiments with metabolic enzyme activators do not take into account the influence of cell viability on the observed Ca2+ network data. Limitations of the imaging approach used need to be recognised and evaluated/discussed.

      Comments on revisions:

      The authors have addressed the majority of the points raised.

    3. Reviewer #2 (Public review):

      The manuscript by Erli Jin and Jennifer Briggs et al. utilizes light sheet microscopy to image islet beta cell calcium oscillations in 3D and determine where beta cell populations are located that begin and coordinate glucose-stimulated calcium oscillations. The light sheet technique allowed clear 3D mapping of beta cell calcium responses to glucose, glucokinase activation, and pyruvate kinase activation. The manuscript finds that synchronized beta-cells are found at the islet center, that leader beta cells showing the first calcium responses are located on the islet periphery, that glucokinase activation helped maintain beta cells that lead calcium responses, and that pyruvate kinase activation primarily increases islet calcium oscillation frequency. The study is well-designed, contains a significant amount of high quality data, and the conclusions are largely supported by the results.

      Comments on revisions:

      The manuscript by Erli Jin et al. has been improved with the revisions, which have addressed my previous concerns. The manuscript significantly improves the mechanistic underpinnings of islet calcium oscillations and resulting pulsatile insulin secretion.

    4. Reviewer #3 (Public review):

      Summary:

      Jin, Briggs et al. made use of light-sheet 3D imaging and data analysis to assess the collective network activity in isolated mouse islets. The major advantage of using whole islet imaging, despite compromising on a speed of acquisition, is that it provides a complete description of the network, while 2D networks are only an approximation of the islet network. In static-incubation conditions, excluding the effects of perfusion, they assessed two subpopulations of beta cells and their spatial consistency and metabolic dependence.

      Strengths:

      The authors confirmed that coordinated Ca2+ oscillations are important for glycemic control. In addition, they definitively disproved the role of individual privileged cells, which were suggested to lead or coordinate Ca²⁺ oscillations. They provided evidence for differential regional stability, confirming the previously described stochastic nature of the beta cells that act as strongly connected hubs as well as beta cells in initiating regions (doi.org/10.1103/PhysRevLett.127.168101). This has not been a surprise to the reviewer.

      The fact that islet cores contain beta cells that are more active and more coordinated has also been readily observed in high-frequency 2D recordings (e.g. DOI: 10.2337/db22-0952), suggesting that the high-speed capture of fast activity can partially compensate for incomplete topological information.

      They also found an increased metabolic sensitivity of mantle regions of an islet with subpopulation of beta cells with a high probability of leading the islet activity and which can be entrained by fuel input. They discuss a potential role of alpha/delta cell interaction, however relative lack of beta cells in the islet border region could also be a factor contributing to less connectivity and higher excitability.

      The Methods section contains a useful series of direct instructions on how to approach fast 3D imaging with currently available hardware and software.

      The Discussion is clear and includes most of the issues regarding the interpretation of the presented results.

      Taken together it is a strong technical paper to demonstrate the stochasticity regarding the functions subpopulations of beta cells in the islets may have and how less well-resolved approaches (both missing spatial resolution as well as missing temporal resolution) led us to jump to unjustified conclusions regarding the fixed roles of individual beta cells within an islet.

      Weaknesses:

      There are a few relevant issues that need to be addressed.

      (1) The study is not internally consistent regarding the Results section. In the text the authors discuss changes in membrane potential (which was not measured in this study), while in the figures they exclusively describe Ca2+ oscillations (which were measured). Examples are on lines 149, 150, 153, 154, 263... It is recommended that the silent and active phases in the Results section describe processes actually measured in this study, as shown in Figure 6A.

      (2) There are in fact no radially oriented networks in the core of an islet (l. 130, Fig. 4) apart from the fact that every hub has somewhat radially oriented edges. For radiality to have some general meaning, the normalized distance from the geometric center would need to be lower than 0.4. The networks are centrally located, which does not change the major conclusions of the study.

      (3) The study would profit from acknowledging that Ca2+ influx is not a sole mechanism to drive insulin secretion and that KATP channels are not the sole target sensitive to changes in the cytosolic (global or local) ADP and ATP concentration or that there is an absolute concentration-dependence of these ligands on KATP channels. The relatively small conductance changes that have been found associated to active and silent phases (closing and opening of the KATP channels as interpreted by the authors, respectively, doi: 10.1152/ajpendo.00046.2013) and should be due to metabolic factors, could be also associated to desensitization of KATP channels to ATP due to the increase in cytosolic Ca2+ changes after intracellular Ca2+ flux (DOI: 10.1210/endo.143.2.8625) as they have been found to operate also at time scales, significantly faster (DOI: 10.2337/db22-0952) than reported before (refs. 21,22). Metabolic changes influence intracellular Ca2+ flux as well.

      (4) There is no explanation for why KL divergence is so different between the pre-test regional consistency of the islets used to test the vehicle compared to those where GKa and PKa have been tested.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Jin, Briggs, and colleagues use light sheet imaging to reconstruct the islet three-dimensional Ca2+ network. The authors find that early/late responding (leader) cells are dynamic over time, and located at the islet periphery. By contrast, highly connected or hub cells are stable and located toward the islet center. Suggesting that the two subpopulations are differentially regulated by fuel input, glucokinase activation only influences leader cell phenotype, whereas hubs remain stable.

      Strengths:

      The studies are novel in providing the first three-dimensional snapshot of the beta cell functional network, as well as determining the localization of some of the different subpopulations identified to date. The studies also provide some consensus as to the origin, stability, and role of such subpopulations in islet function.

      We thank the reviewers for their positive assessment.

      Weaknesses:

      Experiments with metabolic enzyme activators do not take into account the influence of cell viability on the observed Ca2+ network data. Limitations of the imaging approach used need to be recognized and evaluated/discussed.

      We worked very hard to make sure the islets remained stable and healthy over the duration of the imaging time course. We imaged the islet in 3D and observed that all beta-cells displayed glucose-dependent oscillations, which can only arise from functioning cells. From the raw calcium traces (displayed in the figures) we observed no detectable loss of signal over 60 min of continuous imaging regardless of drug treatment; this is because the laser excitation is below the bleach threshold for GCaMP6s, and it is bleaching that generates phototoxicity. To demonstrate this clearly, we performed a bleach test using 6x laser power; in this case calcium amplitude dropped 30% over 60 min of imaging, but islet calcium oscillatory behavior was preserved. Light-sheet is well documented to be 1000x more gentle than other optical sectioning techniques, which is why it was chosen for this application.

      Regarding the limitations of the imaging approach, we recognize that studying islets ex vivo is necessarily performed in the absence of native surrounding tissue, as highlighted in the discussion.

      Reviewer #2 (Public Review):

      The manuscript by Erli Jin, Jennifer Briggs et al. utilizes light sheet microscopy to image islet beta cell calcium oscillations in 3D and determine where beta cell populations are located that begin and coordinate glucose-stimulated calcium oscillations. The light sheet technique allowed clear 3D mapping of beta cell calcium responses to glucose, glucokinase activation, and pyruvate kinase activation. The manuscript finds that synchronized beta-cells are found at the islet center, that leader beta cells showing the first calcium responses are located on the islet periphery, that glucokinase activation helped maintain beta cells that lead calcium responses, and that pyruvate kinase activation primarily increases islet calcium oscillation frequency. The study is well-designed, contains a significant amount of high-quality data, and the conclusions are largely supported by the results.

      It has recently been shown that beta cells within islets containing intact vasculature (such as those in a pancreatic slice) show different calcium responses compared to isolated islets (such as that shown in PMID: 35559734). It would be important to include some discussion about the potential in vitro artifacts in calcium that arise following islet isolation (this could be included in the discussion about the limitations of the study).

      Although isolated islets reproduce the slow oscillatory calcium behavior observed in vivo, we agree that missing elements such as blood flow, cholinergic innervation, and surrounding tissues may each impact islet calcium responses. Pancreatic regional blood flow also links the endocrine and exocrine signaling which can directly influence the behavior of beta cells. We have highlighted some of these issues in the discussion “In addition to α-cells, vasculature may also impact islet Ca2+ responses, and may induce additional heterogeneity in vivo.” (see line 375, Ref. 46).

      Reviewer #3 (Public Review):

      Summary:

      Jin, Briggs et al. made use of light-sheet 3D imaging and data analysis to assess the collective network activity in isolated mouse islets. The major advantage of using whole islet imaging, despite compromising on the speed of acquisition, is that it provides a complete description of the network, while 2D networks are only an approximation of the islet network. In static-incubation conditions, excluding the effects of perfusion, they assessed two subpopulations of beta cells and their spatial consistency and metabolic dependence.

      Strengths:

      The authors confirmed that coordinated Ca2+ oscillations are important for glycemic control. In addition, they definitively disproved the role of individual privileged cells, which were suggested to lead or coordinate Ca²⁺ oscillations. They provided evidence for differential regional stability, confirming the previously described stochastic nature of the beta cells that act as strongly connected hubs as well as beta cells in initiating regions (doi.org/10.1103/PhysRevLett.127.168101).

      The fact that islet cores contain beta cells that are more active and more coordinated has also been readily observed in high-frequency 2D recordings (e.g. DOI: 10.2337/db22-0952), suggesting that the high-speed capture of fast activity can partially compensate for incomplete topological information.

      They also found an increased metabolic sensitivity of mantle regions of an islet with a subpopulation of beta cells with a high probability of leading the islet activity which can be entrained by fuel input. They discuss a potential role of alpha/delta cell interaction, however relative lack of beta cells in the islet border region could also be a factor contributing to less connectivity and higher excitability.

      The Methods section contains a useful series of direct instructions on how to approach fast 3D imaging with currently available hardware and software.

      The Discussion is clear and includes most of the issues regarding the interpretation of the presented results.

      Some issues concerning inconsistencies between data presented and statements made as well as statistical analysis need to be addressed.

      Taken together it is a strong technical paper to demonstrate the stochasticity regarding the functions subpopulations of beta cells in the islets may have and how less well-resolved approaches (both missing spatial resolution as well as missing temporal resolution) led us to jump to unjustified conclusions regarding the fixed roles of individual beta cells within an islet.

      We thank the reviewers for the comments on the many strengths of the manuscript and address the specific critiques below.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Essential revisions:

      (1) How useful is GK activation as a subpopulation-level perturbation, given that all beta cells would be affected? Previous studies by the authors have shown that GK gradients likely dictate subpopulation behaviour, so the concern here is that GK activation across all cells might mask the influence of such gradients i.e. a U-shaped effect. Also, does the GK activator differentially penetrate the islet such that first responders/leaders are more vulnerable than hubs?

      As we previously published, non-saturating concentrations of GK activator (as used here) have the same effect on calcium oscillations as raising glucose (PMID: 33147484). In other words, the activator boosts the activity of the endogenous GK. To the second point, recent ex vivo islet studies (PMID: 28380380) document the islet penetration of a fluorescent glucose analogue within seconds even under static conditions, and in our study the islet calcium oscillations reached steady state, so we are not concerned about drug penetration. The real limitation with any drug study in the islet is that non-beta cells are also activated; this limitation is included in the discussion along with the recommendation that genetic tools are needed to assess the effect of GK activation in the various endocrine subpopulations.

      An additional concern with the GK activation experiment is that GK activation might push beta cells into a more stressed state such that they are more susceptible to phototoxicity. Although the authors state that photobleaching is low, they provide no data to support such a statement. Given the long duration of imaging and acquisition rate, phototoxicity might be more of an issue, especially with GK activation. Some further analysis (e.g. apoptosis) would be useful here to exclude an effect of beta cell viability versus GK activation on the observed phenotype of the different subpopulations.

      Acute GK activation (for 30 min) does not stress the islet; the drug has the same effect as raising glucose (PMID: 33147484). To determine whether photobleaching was impacted by GK activation, we examined the peak of consecutive oscillations in response to vehicle and GK activator. The average photobleaching was less than 2% of the calcium fluorescence over 30 min of continuous imaging. Furthermore, GK activation did not significantly increase photobleaching (see Author response image 1).

      Author response image 1.
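
      As an illustration only (this is not the authors' analysis code), the photobleaching estimate described above can be sketched as the percent decline between the first and last oscillation peaks. The function name and the example peak values are hypothetical, and the response does not specify exactly how the peaks were compared, so the first-to-last definition is an assumption.

```python
def percent_photobleaching(peak_amplitudes):
    """Percent drop from the first to the last oscillation peak.

    Assumption: bleaching is summarized as the relative decline between
    the first and last peaks; the manuscript may use a different measure.
    """
    if len(peak_amplitudes) < 2:
        raise ValueError("need at least two peaks")
    first, last = peak_amplitudes[0], peak_amplitudes[-1]
    if first == 0:
        raise ValueError("first peak amplitude is zero")
    return 100.0 * (first - last) / first


# Hypothetical peak amplitudes from consecutive oscillations.
peaks = [1.00, 0.995, 0.99, 0.985]
bleach = percent_photobleaching(peaks)
```

      With these made-up values the decline is about 1.5%, which is the kind of figure that would be compared between vehicle and activator conditions.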

      To the reviewer’s second point, apoptosis cannot occur on the timescale of the drug treatment (30 min), and raw calcium traces are included showing that all beta cells display oscillatory behavior throughout the course of the experiment.

      (2) The authors show that glucokinase activation increases the duration of islet calcium oscillations and in some islets (3 of 15 islets) causes "a Ca2+ plateau." The authors indicate that "Glucokinase, as the 'glucose sensor' for the β-cell, controls the input of glucose carbons into glycolysis, and opens KATP channels." It would be nice to have some experimental evidence that the change in oscillation rate caused by the glucokinase activator is due to KATP activation. This could be accomplished by treating islets with subthreshold KATP activators (e.g., diazoxide) or subthreshold KATP inhibitors (e.g., tolbutamide).

      The statement that glucokinase activation opens KATP channels was a typo; glucose metabolism closes KATP channels by raising the ATP/ADP ratio. We now include additional citations that document the relationship between GK and KATP and the oscillatory behavior. See Ref 22 (PMID: 33147484) and Ref 34 (PMID: 33147484).

      The manuscript finds that "Early phase cells were maintained to a greater degree upon GKa application." Yet GKa is proposed to activate KATP. Some discussion about how the early phase is maintained in cell populations by GKa activation in the context of KATP activity would be useful.

      As discussed above, we meant to say that GKa will close KATP and apologize for the confusion. As we mentioned in the discussion, early phase cells are most likely maintained to a greater degree following GK activation as a result of an enhanced GK gradient and a reduced effect of stochastic alpha cell input.

      (3) Membrane potential depolarization precedes calcium channel activation and subsequent calcium entry. In many cases, electrical coupling across beta cells happens on millisecond timescale. It would be good to confirm that the calcium is showing the same time scale in terms of elevation following beta cell membrane potential depolarization. One concern is that the islet beta cells could be depolarizing at the same speed and lagging in terms of calcium channel activation and calcium entry.

      We thank the reviewer for making this point, which is almost certainly true, particularly since plasma membrane calcium influx is not the sole source of intracellular calcium. Previously published “simultaneous” recordings of Vm and calcium show their same phase relationship but do not have sufficient time resolution to capture depolarization of each cell. A quantification of phase lag would require the field to generate mice with voltage sensors expressed in beta cells; these tools are not yet available.  

      A related issue: in the text, the authors discuss changes in membrane potential (which was not measured in this study), while in the figures they exclusively describe Ca2+ oscillations (which were measured). Examples are on lines 149, 150, 153, 154, 263. It is recommended that the silent and active phases in the Results section describe processes actually measured in this study, as shown in Figure 6A.

      To clarify, we did not use the term ‘membrane potential’ anywhere in the manuscript. We do sometimes refer to calcium influx as a proxy for membrane depolarization; we think this is valid given the abundant evidence that these processes are interdependent in beta cells.

      (4) It would be good to include the timing of the phases of calcium entry. When was the beta cell calcium entry monitored for the response time? Were the response times between the late and early phases consistent for each oscillation? It looks as if the start of the calcium upstroke was similar for many beta cells (such as for the Figure 2I traces). It would be nice to include a shorter time duration graph of calcium oscillation traces right when the upstroke starts. This would allow the community to observe the differences in the start time of calcium entry. 

      We agree this is an important point. We now include an inset showing the expanded time scale of the calcium upstroke in Fig.2I. The response time spread between early and late phase cells is now shown in Fig.7F (and in Author response image 2). We also quantified the coefficient of variation in the response time spread (0 = no variation and 1 = maximal variation) and found no significant differences between metabolic activators (Author response image 2). 

      Author response image 2.
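
      A minimal sketch of the variability measure mentioned above, assuming the standard definition of the coefficient of variation (standard deviation divided by mean) applied to the early-to-late response time spread of each oscillation. The function names and example numbers are hypothetical, and the 0-to-1 scaling quoted in the response may reflect a normalization not reproduced here.

```python
from statistics import mean, pstdev


def response_time_spread(response_times):
    """Spread between the latest and earliest responding cells (same units as input)."""
    return max(response_times) - min(response_times)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(values) / m


# Hypothetical per-cell response times (s) for three consecutive oscillations.
oscillations = [
    [0.0, 1.0, 2.0, 4.0],
    [0.0, 0.5, 1.5, 2.0],
    [0.0, 1.0, 3.0, 6.0],
]
spreads = [response_time_spread(o) for o in oscillations]
cv = coefficient_of_variation(spreads)
```

      A value of 0 means the spread was identical on every oscillation; larger values indicate that the spread changes from one oscillation to the next.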

      Also, for most of the GCaMP6s traces shown, the authors indicate that they are plotted as F/F0. However, this normalization (F/F0) is not done for the actual traces shown. For example, Figure 2D shows the traces starting from what looks to be 0 to 0.3 F/F0, but the traces for an F/F0 group should all start at 1. Please change this for all representative oscillations so the start of calcium entry for example traces all line up.

      This has been corrected in Fig. 2D, 2I, and Fig. 3B. Also, Fig. 6 should be F, not F/F0.
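
      For readers unfamiliar with the normalization the reviewer is asking for, a minimal sketch follows. It assumes F0 is the mean of an initial baseline window, which is one common choice and not necessarily the definition used in the manuscript; the function name, window length, and trace values are hypothetical.

```python
from statistics import mean


def normalize_f_over_f0(trace, baseline_frames=10):
    """Divide a raw fluorescence trace by its baseline F0.

    Assumption: F0 is the mean of the first `baseline_frames` samples.
    """
    if baseline_frames < 1 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [f / f0 for f in trace]


# Hypothetical raw trace: flat baseline followed by a calcium upstroke.
raw = [200.0, 200.0, 200.0, 200.0, 260.0, 320.0, 300.0, 220.0]
normalized = normalize_f_over_f0(raw, baseline_frames=4)
```

      Because every trace is divided by its own baseline, all normalized traces start at 1, which is what lets the onset of calcium entry line up across cells.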

      Reviewer #1 (Recommendations for the authors):

      (1) Line 53: "Silencing the electrical activity of these hub cells with optogenetics was found to abolish the coordination within that plane of the islet". The authors should acknowledge that studies also showed that beta cell transcription factor (Pdx1/Mafa) dosage was important for hub cell phenotype and islet function.

      Thank you, this reference to Nasteska et al. (PMID: 33514698, Ref. 16) has been added to the discussion.

      (2) Light sheet imaging is used to image the 3D islet volume. Whilst speed is undoubtedly an advantage of this technique, axial resolution is ~1.1 µm over 4 µm z-step size. How confident are the authors that single nuclei can be reliably identified given their ~6 µm size in a beta cell (e.g. do some elongated nuclei appear, which could be "doublets")?

      The axial resolution of 1.1 µm exceeds the resolution needed for the Nyquist criterion (i.e. sampling every 2-3 µm). As a practical matter, it is not possible to double-count nuclei because the software will exclude nuclei that occupy the same volume. Only a very elongated nucleus (>10 µm) would be double counted and this does not occur.

      (3) The authors discuss the advantages of the light sheet imaging approach used, including speed and phototoxicity. Some more balance is needed here since other approaches such as two-photon excitation achieve similar speeds with much better axial resolution (see dozens of neural circuit studies).

      We are careful to point out that two-photon excitation has better axial resolution, better tissue penetration, and often higher speeds (kHz using linescans) – however these neuronal studies are limited to the cells in a few planes and the laser power is orders of magnitude higher than lightsheet. For this reason, two photon imaging has not been used to image islet calcium in three dimensions. The bottom line is lightsheet trades axial resolution for gentle volumetric imaging. 

      (4) Line 340: "Laser ablation or optogenetic inactivation of these early phase cells would be predicted to have little impact on islet function, as suggested previously by electrophysiological studies in which surface β-cells have been voltage-clamped with no impact on β-cell oscillations". This statement is slightly ambiguous since the authors showed in their previous studies that laser ablation of first responder cells/leaders was able to influence the Ca2+ network. Do the authors mean that laser ablation would only temporarily influence islet function before another cell picked up the role of a first responder/leader? As written, the sentence seems to imply that first responders/leaders are unimportant for the islet function.

      We intended to imply that the oscillatory system is sufficiently robust that a new cell takes over when leader cells are ablated. We also cite Korosak et al. (PMID: 34723613, Ref. 40) and Dwulet et al. (PMID: 33939712, Ref. 15) to make this point, although to clarify we are not examining first responders in this study.

      (5) Line 369: "In contrast with leader cells, we found that the highly synchronized cells are both spatially and temporally stable." The sentence needs qualifying- what would spatiotemporal stability be expected to confer on such a subpopulation?

      We believe that the spatiotemporal stability of highly synchronized cells is a consequence of beta cells in the center of the islet lacking the stochastic input of nearby alpha cells; we raise this point in the discussion: “The preponderance of α-cells on the periphery of mouse islets, which influence β-cell oscillation frequency, would be expected to disrupt β-cell synchronization on the periphery and stabilize it in the islet center – which is precisely the pattern of network activity we observed.” (see line 372). 

      (6) Line 370: "However, in conflict with the description of hub cells as intermingled with other cells throughout the islet, the location of such cells in 3D space is close to the center." The study by Johnston et al did not have the axial resolution to exclude that some cells might have been grouped together.

      We agree and have included the reviewer’s comment in the text (See line 384); that’s an important reason for conducting this 3D study.  

      (7) Line 380: "One explanation may be that paracrine communication within the islet determines which region of cells will show high or low degree. For example, more peripheral cells that are in contact with nearby δ-cells may show some suppression in their Ca2+ dynamics, and thus reduced synchronization." A potentially exciting future study. Should however probably cite DOI s41467-022-31373-6 here.

      We thank the reviewer for their input. This reference to Ren et al. (PMID:35764654) was previously included as Ref. 42 (now Ref. 45)

      Reviewer #3 (Recommendations for the authors):

      (1) There are in fact no radially oriented networks in the core of an islet (l. 130, Figure 4) apart from the fact that every hub has somewhat radially oriented edges. For radiality to have some general meaning, the normalized distance from the geometric center would need to be lower than 0.4. The networks are centrally located, which does not change the major conclusions of the study.

      Thank you for pointing out this imprecise language. We did not intend to imply that the functional network is oriented radially. We corrected the text (see lines 131 and 145) to indicate that the cells with high and low synchronization are distributed in a radial pattern.

      (2) The study would benefit from acknowledging that Ca2+ influx is not a sole mechanism to drive insulin secretion and that KATP channels are not the sole target sensitive to changes in the cytosolic (global or local) ADP and ATP concentration or that there is an absolute concentration-dependence of these ligands on KATP channels. The relatively small conductance changes that have been found to be associated with active and silent phases (closing and opening of the KATP channels as interpreted by the authors, respectively, doi: 10.1152/ajpendo.00046.2013) and should be due to metabolic factors, could be also associated to desensitization of KATP channels to ATP due to the increase in cytosolic Ca2+ changes after intracellular Ca2+ flux (DOI: 10.1210/endo.143.2.8625) as they have been found to operate also at time scales, significantly faster (DOI: 10.2337/db22-0952) than reported before (refs. 21,22). Metabolic changes influence intracellular Ca2+ flux as well.

      The reviewer is absolutely correct that there are amplifying factors and other sources of calcium beyond plasma membrane influx and there are other mechanisms that regulate insulin secretion beyond calcium levels. These alternative mechanisms are introduced in Refs. 1-2, however they are not the focus of this study. 

      (3) There is no explanation for why KL divergence is so different between the pre-test regional consistency of the islets used to test the vehicle compared to those where GKa and PKa have been tested.

      We thank the reviewer for their careful observation. This arises because there are larger differences between preparations than within a preparation. This has been described previously (PMID: 16306370 and 20037650) and could be expected to account for the differences in KL divergence between animals. 
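
      For reference, the KL divergence discussed here compares two probability distributions. The sketch below shows the standard discrete form, D_KL(P || Q) = sum_i p_i ln(p_i / q_i). How the manuscript bins cells into regions to build those distributions is not described in this response, so the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical regional distributions from two recordings of the same islet.
first_recording = [0.5, 0.5]
second_recording = [0.25, 0.75]
divergence = kl_divergence(first_recording, second_recording)
```

      A divergence of 0 means the two distributions are identical; larger values mean the regional pattern changed more between recordings.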

      (4) Statistical analysis would profit from testing the normality of the data distribution before choosing the statistical test and then learning the difference between parametric and nonparametric tests. For example, in Figures 3CD and 5EF, the data density is lower at the calculated mean than below and above this value and there are other examples in other figures too.

      We thank the reviewer for this very important comment, and we apologize for the oversight on our part. To address this comment, we conducted two normality tests: Anderson-Darling and Kolmogorov-Smirnov on all statistical analyses in the manuscript. If the data were not normally distributed, we changed the analysis to Wilcoxon matched-pairs signed rank test (non-parametric version of t-tests) or the Friedman test (non-parametric version of ANOVA). Three results were changed based on this statistical correction: Figure 4D, also 5F 3D (from P=0.01 to P=0.0526), Figure 5F ¼ z-depth (P = 0.005 to P = 0.012). We have updated the manuscript methods, results, and figures accordingly. Importantly, these results did not change the main points of the paper.
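
      To make the non-parametric alternative concrete, the sketch below computes the rank sums at the core of the Wilcoxon matched-pairs signed rank test. It is illustrative only: it returns the statistics without a p-value, the function name and data are hypothetical, and in practice an established statistics package would be used.

```python
def wilcoxon_signed_rank_statistic(before, after):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped and tied absolute differences
    receive the average of the ranks they span.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired measurements (e.g. the same islets before and after a drug).
before = [10, 12, 9, 11, 14]
after = [12, 11, 13, 11, 17]
w_plus, w_minus = wilcoxon_signed_rank_statistic(before, after)
```

      The test uses only the ranks of the paired differences, not their magnitudes, which is why it does not require normally distributed data.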

    1. eLife Assessment

      This important work presents the development of a novel inhibitor for SARS-CoV-2 Mac1 that has potential utility both as an antiviral therapeutic and as a tool for probing the molecular mechanisms by which infection-induced ADP-ribosylation triggers robust host antiviral responses. The evidence supporting the claims is generally convincing but could be improved if the authors expanded the phenotypic characterization of the compound and its potential effects on both viral and host targets.

    2. Reviewer #1 (Public review):

      SARS-CoV-2 encodes a macrodomain (Mac1) within the nsp3 protein that removes ADP-ribose groups from proteins. However, its role during infection is not well understood. Evidence suggests that Mac1 antagonizes the host interferon response by counteracting the wave of ADP ribosylation that occurs during infection. Indeed, several PARPs are interferon-stimulated genes. While multiple targets have been proposed, the mechanistic links between ADP ribosylation and a robust antiviral response remain unclear.

      Genetic inactivation of Mac1 abrogates viral replication in vivo, suggesting that small-molecule inhibitors of Mac1 could be developed into antivirals to treat COVID-19 and other emerging coronaviruses. The authors report a potent and selective small molecule inhibitor targeting Mac1 (AVI-4206) that demonstrates efficacy in human airway organoids and animal models of SARS-CoV-2 infection. While these results are compelling and provide proof of concept for the therapeutic targeting of Mac1, I am particularly intrigued by the potential of this compound as a probe to elucidate the mechanistic connections between infection-induced ADP ribosylation and the host antiviral response.

      The precise function of Mac1 remains unclear. Given its presence in multiple viruses, it likely acts on a fundamental host immune pathway(s). AVI-4206, while promising as a lead compound for the development of antivirals targeting coronaviruses, could also be a valuable tool for uncovering the function of the Mac1 domain. This may lead to fundamental insights into the host immune response to viral infection.

    3. Reviewer #2 (Public review):

      Summary:

      The authors describe the development of a novel inhibitor (AVI-4206) for the first macrodomain of the nsp3 protein of SARS-CoV-2 (Mac1). This involves medicinal chemistry, structural work, and biochemical characterisation. Subsequently, the authors present their findings on the efficacy of the inhibitor in both cell culture and animal models of SARS-CoV-2 infection. They find that, despite the high affinity for Mac1 and the known replicatory defects of catalytically inactive Mac1, only moderate beneficial effects can be observed in their chosen models.

      Strengths:

      The authors employ a variety of different assays to study the affinity, selectivity and potency of the novel inhibitor and thus the in vitro data are very compelling.
      Similarly, the authors use several cell culture and in vivo models to strengthen their findings.

      Weaknesses:

      (a) The selection of Targ1 and MacroD2 as off-target human macrodomains is poor, as several studies have shown that the first macrodomains of PARP9 and PARP14 are much more closely related to coronaviral macrodomains and both macrodomains are implicated in antiviral defence and immunity.

      (b) The authors utilize only replication efficiency and general infection markers as readouts for their Mac1 inhibitor. It would be good if they could show an impact on the ADP-ribosylation of a known Mac1 target such as PARP14.

    4. Reviewer #3 (Public review):

      Summary:

      The authors were trying to validate SARS-CoV-2 Mac1 as a drug discovery target and by extension other viral macrodomains.

      Strengths:

      The medicinal chemistry and structure-based optimization is exemplary. Macrodomains and ADP-ribosyl hydrolases have a reputation for being undruggable, yet the authors managed to optimize hits from a fragment screen using structure-based approaches and fragment linking to make a 20 nM inhibitor as a tool compound to validate the target.
      In addition, the in vivo work is also a strength. The ability to reduce the viral count at a rate comparable to nirmatrelvir is impressive. Tracking the cytokine expression levels also supports much of the genetic data and mechanism of action for macrodomains.

      Weaknesses:

      The main compound AVI-4206, while being very potent and selective, is not appreciably orally bioavailable. The fact that they have to use high doses of the compound IP to see in vivo effects may lead to questions regarding off-target effects.

      The cellular models are not as predictive of antiviral activity as one would expect. However, the authors had enough chutzpah to test the compound in vivo knowing that cellular models might not be an accurate representation of a living system with a fully functional immune system all of which is most likely needed in an antiviral response to test the importance of Mac1 as a target.

    5. Author response:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful assessment. We especially agree that AVI-4206 will be a valuable tool to help understand the host immune response to viral infection.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their comments and will address PARP9/14 selectivity with in vitro experiments and alignments/modeling. For ADP-ribosylation of PARP14, we will attempt experiments patterned after Kar et al, EMBO Journal, 2024, but note that detection of ADPr by IF and western has been relatively inconsistent and detection-reagent dependent in our hands. Regardless of the outcome, we will expand the discussion of the prior literature on this point.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their comments, especially noting that we had the “chutzpah” to go for the in vivo experiment. We share the concern about potential off-target effects, which is why we prioritized so many selectivity experiments prior to testing. Ongoing chemistry efforts are focused on developing next-generation inhibitors that are orally bioavailable, but this work is in its early stages.

    1. eLife Assessment

      This important work substantially advances our understanding of episodic memory by proposing a biologically plausible mechanism through which hippocampal barcode activity enables efficient memory binding and flexible recall. The evidence supporting the conclusions is convincing, with rigorously validated computational models and alignment with experimental findings. The work will be of broad interest to neuroscientists and computational modelers studying memory and hippocampal function.

    2. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors develop a biologically plausible recurrent neural network model to explain how the hippocampus generates and uses barcode-like activity to support episodic memory. They address key questions raised by recent experimental findings: how barcodes are generated, how they interact with memory content (such as place and seed-related activity), and how the hippocampus balances memory specificity with flexible recall. The authors demonstrate that chaotic dynamics in a recurrent neural network can produce barcodes that reduce memory interference, complement place tuning, and enable context-dependent memory retrieval, while aligning their model with observed hippocampal activity during caching and retrieval in chickadees.

      Strengths:

      (1) The manuscript is well-written and structured.
      (2) The paper provides a detailed and biologically plausible mechanism for generating and utilizing barcode activity through chaotic dynamics in a recurrent neural network. This mechanism effectively explains how barcodes reduce memory interference, complement place tuning, and enable flexible, context-dependent recall.
      (3) The authors successfully reproduce key experimental findings on hippocampal barcode activity from chickadee studies, including the distinct correlations observed during caching, retrieval, and visits.
      (4) Overall, the study addresses a somewhat puzzling question about how memory indices and content signals coexist and interact in the same hippocampal population. By proposing a unified model, it provides significant conceptual clarity.

      Weaknesses:

      The recurrent neural network model incorporates assumptions and mechanisms, such as the modulation of recurrent input strength, whose biological underpinnings remain unclear. The authors acknowledge some of these limitations thoughtfully, offering plausible mechanisms and discussing their implications in depth.

      One thread of questions that authors may want to further explore is related to the chaotic nature of activity that generates barcodes when recurrence is strong. Chaos inherently implies sensitivity to initial conditions and noise, which raises questions about its reliability as a mechanism for producing robust and repeatable barcode signals. How sensitive are the results to noise in both the dynamics and the input signals? Does this sensitivity affect the stability of the generated barcodes and place fields, potentially disrupting their functional roles? Moreover, does the implemented plasticity mitigate some of this chaos, or might it amplify it under certain conditions? Clarifying these aspects could strengthen the argument for the robustness of the proposed mechanism.

      It may also be worth exploring the robustness of the results to certain modeling assumptions. For instance, the choice to run the network for a fixed amount of time and then use the activity at the end for plasticity could be relaxed.

    3. Reviewer #2 (Public review):

      Summary:

      Striking experimental results by Chettih et al 2024 have identified high-dimensional, sparse patterns of activity in the chickadee hippocampus when birds store or retrieve food at a given site. These barcode-like patterns were interpreted as "indexes" allowing the birds to retrieve from memory the locations of stored food.
      The present manuscript proposes a recurrent network model that generates such barcode activity and uses it to form attractor-like memories that bind information about location and food. The manuscript then examines the computational role of barcode activity in the model by simulating two behavioral tasks, and by comparing the model with an alternate model in which barcode activity is ablated.

      Strengths of the study:

      - Proposes a potential neural implementation for the indexing theory of episodic memory
      - Provides a mechanistic model of striking experimental findings: barcode-like, sparse patterns of activity when birds store a grain at a specific location
      - A particularly interesting aspect of the model is that it proposes a mechanism for binding discrete events to a continuous spatial map, and demonstrates the computational advantages of this mechanism

      Weaknesses:

      - The relation between the model and experimentally recorded activity needs some clarification
      - The relation with indexing theory could be made more clear
      - The importance of different modeling ingredients and dynamical mechanisms could be made more clear
      - The paper would be strengthened by focusing on the most essential aspects

    4. Author response:

      We thank the reviewers for the thoughtful comments, and we hope to address these issues in a future revision. We will clarify that chaos only serves to generate barcodes, and show that once they are formed and assigned, the memory mechanism is stable to initial conditions. We will also clarify the model's assumptions and its connections to indexing theory and to experimental results.

    1. eLife Assessment

      This valuable study uses dynamic metabolic models to compare perturbation responses in a bacterial system, analyzing whether they return to their steady state or amplify beyond the initial perturbation. The evidence supporting the emergent properties of perturbed metabolic systems to network topology and sensitivity to specific metabolites is solid, although the authors do not explain the origin of some significant inconsistencies between models.

    2. Reviewer #1 (Public review):

      (1a) Summary:

      The author studied metabolic networks for central metabolism, focusing on how system trajectories return to their steady state. To quantify the response, systematic perturbation was performed in simulation and the maximal destabilization away from the steady state (compared with the initial perturbation distance) was characterized. The author analyzed the perturbation response and found that sparse networks and networks with more cofactors are more "stable", in the sense that the perturbed trajectories show smaller deviations along the path back to the steady state.

      (1b) Strengths and major contributions:

      The author compared three metabolic models and performed systematic perturbation analysis in simulation. This is the first work to characterize how perturbed trajectories deviate from equilibrium in large biochemical systems, and it illustrates interesting findings about the difference between sparse biological systems and randomly simulated reaction networks.

      (1c) Discussion and impact for the field:

      Metabolic perturbation is an important topic in cell biology and has important clinical implications in pharmacodynamics. The computational analysis in this study provides a starting point for future quantitative analysis of metabolism and homeostasis.

      Comments on revised version:

      The revised version of this manuscript made some clarifications, but I think the analysis of response coefficients is still numerical and model-specific, and it remains unclear from a dynamical-systems point of view.

    3. Reviewer #2 (Public review):

      The authors have conducted a valuable comparative analysis of perturbation responses in three nonlinear kinetic models of E. coli central carbon metabolism found in the literature. They aimed to uncover commonalities and emergent properties in the perturbation responses of bacterial metabolism. They discovered that perturbations in the initial concentrations of specific metabolites, such as adenylate cofactors and pyruvate, significantly affect the maximal deviation of the responses from steady-state values. Furthermore, they explored whether the network connectivity (sparse versus dense connections) influences these perturbation responses. The manuscript is reasonably well written.

      Comments on revised version:

      The authors have addressed my concerns to a large extent. However, a few minor issues remain, as listed below:

      (1) The authors identified key metabolites affecting responses to perturbations in two ways: (i) by fixing a metabolite's value and (ii) by performing a sensitivity analysis. It would be helpful for the modeling community to understand better the differences and similarities in the obtained results. Do both methods identify substrate-level regulators? Is freezing a metabolite's dynamics dramatically changing the metabolic response (and if yes, which ones are so different in the two cases)? Does the scope of the network affect these differences and similarities?

      (2) Regarding the issues the authors encountered when performing the sensitivity analysis, they can be approached in two ways. First, the authors can check the methods for computing conserved moieties nicely explained by Sauro's group (doi:10.1093/bioinformatics/bti800) and compute them for large-scale networks (but beware of metabolites that belong to several conserved pools). Otherwise, the conserved pools of metabolites can be considered as variables in the sensitivity analysis; grouping multiple parameters is a common approach in sensitivity analysis.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):  

      First, the metabolic network in this study is incomplete. For example, amino acid synthesis and lipid synthesis are important for biomass and growth, but they are not included in the three models used in this study. NADH and NADPH are as important as ATP/ADP/AMP, but they are not included in the models. In the future, a more comprehensive metabolic and biosynthesis model is required.

      Thank you for the critical comment on the weakness of the present study. We did try to study a larger model, that of Turnborg et al (2021), which is a model of JCVI-syn3A, but we gave up on including it in our list of models to study in depth. This is because we noticed that the concentration of ATP in the model can become negative (we confirmed this with one of the authors of the paper). Another "big" kinetic model of metabolism that we could list is Khodayari et al (2017). However, we could not find other models with which to compare the dynamics of this big model. Therefore, we decided to use the model only for the central carbon metabolism for now. We would like to leave a more extended study for the near future.

      We would like to mention that NADH and NADPH are included in the Khodayari model and the Boecker model, although NADH and NADPH are lumped together as NADH in the latter model.

      Second, this work does not provide a mathematical explanation of the perturbation response χ. Since the perturbation analysis is performed close to the steady state (or at least belongs to the attractor of single-steady-state), local linear analysis would provide useful information. By complementing with other analysis in dynamical systems (described below) we can gain more logical insights about perturbation response.  

      We tried a linear stability analysis. However, with the perturbation strength we used here, the linearization of the model is no longer valid, in the sense that the linearized model leads to negative concentrations of the metabolites (x_st + Δx < 0 for some metabolites). We have added a scatter plot of the response coefficient of trajectories sharing the initial condition, where the dynamics are computed by the original model and the linearized model, respectively (Fig. S1).

      Since the response coefficient is based on the logarithm of the concentrations, as the metabolite concentrations approach zero, the response coefficient becomes larger. The high response coefficient in the Boecker and Chassagnole model would be explained by this artifact.  The linearized Khodayari model shows either χ~1 or χ = 0 (one or more metabolite concentrations become negative). This could be due to the number of variables in the model. For the response coefficient to have a larger value, the perturbation should be along the eigenvector that leads to oscillatory dynamics with long relaxation time (i.e., the corresponding eigenvalue has a small real part in terms of absolute value and a non-zero imaginary part). However, since the Khodayari model has about 800 variables, if perturbations are along such directions, there is a high probability that one or more metabolite concentrations will become negative.

      We fully agree that if the perturbations on the metabolite concentrations are in the linear regime, the response to the perturbations can be estimated by checking the eigenvalues and eigenvectors. However, we would say that the relationship between the linearized model (and thus the spectrum of eigenvalues) and the original model is unclear in this regime. We remarked on this in Lines 158-160.

      Recommendations for the authors:

      My major suggestion is about understanding the key quantity in this study: the response coefficient χ. When the perturbed state is close to the fixed point, one could adopt local stability analysis and consider the linearized system. For a linear system with one stable fixed point P, we consider the Jacobian matrix M at P. If all eigenvalues of M are real and negative, the perturbed trajectory will return to P without spiraling. If some eigenvalues have a negative real part and a nonzero imaginary part, then the perturbed trajectory will spiral inward to the fixed point. Depending on the spiral trajectory and the initially perturbed state, some components would transiently deviate further from the fixed point along the spiral trajectory. This explains why the response coefficient χ can be greater than 1.
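
The reviewer's argument about spiral trajectories can be illustrated with a two-variable linear system. The sketch below is an assumption-laden toy, not one of the metabolic models: the matrix M = [[-decay, -omega], [omega, -decay]], the function name `trajectory`, and the parameter values are all chosen for illustration, and the per-component deviation shown here is not the paper's definition of χ. The eigenvalues of M are -decay ± i·omega, so the state spirals inward, and the exact solution is a rotation scaled by an exponential decay.

```python
import math


def trajectory(x0, y0, decay=0.1, omega=1.0, t_end=60.0, dt=0.01):
    """Exact solution of dx/dt = M x, M = [[-decay, -omega], [omega, -decay]].

    The fixed point is the origin; (x0, y0) is the perturbed initial state.
    """
    points = []
    steps = int(round(t_end / dt))
    for k in range(steps + 1):
        t = k * dt
        c, s = math.cos(omega * t), math.sin(omega * t)
        e = math.exp(-decay * t)
        # Rotation by omega*t combined with exponential contraction.
        points.append((e * (c * x0 - s * y0), e * (s * x0 + c * y0)))
    return points


if __name__ == "__main__":
    pts = trajectory(1.0, 0.2)
    max_y = max(abs(y) for _, y in pts)
    print("initial |y| = 0.2, maximal |y| along the trajectory =", max_y)
```

In this toy system the second component starts 0.2 away from the fixed point and transiently moves much further away before relaxing, while the Euclidean distance to the fixed point never increases. Whether a given response measure exceeds 1 therefore depends on how the deviation is measured as well as on the spectrum.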

      Mathematically, a locally linearized system behaves similarly to the linear system, and the examples in this study can be analyzed in a similar way. Specifically, if a system has many complex eigenvalues, then the perturbed trajectory is more likely to deviate further. The metabolic network models investigated in this work are not extremely large, and hence the author could analyze the spectrum of the Jacobian matrix at the steady state. Since the steady state is stable, I expect the spectrum to be located in the left half of the complex plane. If the spectrum spreads out away from the real axis, we expect to see more spiral trajectories under perturbation. I think the spectrum analysis will provide a complementary view with respect to the analysis of χ. The authors' major findings, about the network sparsity and cofactors, can also be investigated under the framework of the spectrum analysis.

      Of course, when the nonlinear system is perturbed far away from the fixed point, there are other geometrical properties of the vector field that can cause the response coefficient χ to be greater than 1. This could also be investigated in the future by testing the behavior of small and large perturbations and observing if the systems have signatures of nonlinearity.  

      Since all perturbed states return to the steady state, the eigenvalues of the Jacobian matrix of the system linearized around the steady state lie in the left half of the complex plane (negative real parts). Also, some eigenvalues have non-zero imaginary parts.

      The reason we emphasize the "nonlinear regime" is that the linearization is no longer valid in this regime, i.e. the metabolite concentrations can be negative when we calculate the linearized system. Certainly, there are complex eigenvalues in the Jacobian matrix of every model. However, we would say that there is no clear relationship between the eigenvalues and the response coefficient.

      Minor suggestions:  

      Line 127: Regarding the source of perturbation, cell division also generates unequal concentration of proteins and metabolites for two daughter cells, and it is an interesting mechanism to create metabolic perturbation. 

      Thank you for the insightful suggestion. We now mention cell division as another source of perturbation (Lines 130-131).

      Line 175: I do not quite understand the statement "fixing each metabolite concentration...", since the metabolite concentration in the ODE simulation would change immediately after this fixing.  

      We meant in the sentence that we fixed the concentration of the selected metabolite as the steady state concentration and set the dx/dt of that metabolite to zero. We have rewritten the sentences to avoid confusion (Lines 180-181).
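
The fixing procedure described in this reply can be sketched on a toy model. The two-metabolite chain `toy_pathway`, its rate constants, and the forward-Euler integrator `simulate` are hypothetical stand-ins chosen for illustration and are not taken from any of the three kinetic models. The selected metabolite starts at its steady-state concentration and its time derivative is overwritten with zero at every step.

```python
def toy_pathway(x):
    """Hypothetical chain: source -> A -> B -> sink (steady state A=0.5, B=1.0)."""
    a, b = x
    return [1.0 - 2.0 * a, 2.0 * a - 1.0 * b]


def simulate(x0, rhs, clamped=(), dt=0.001, steps=10000):
    """Forward-Euler integration; indices listed in `clamped` get dx/dt = 0."""
    x = list(x0)
    for _ in range(steps):
        dxdt = rhs(x)
        for i in clamped:
            dxdt[i] = 0.0  # the fixed metabolite does not evolve
        x = [xi + dt * di for xi, di in zip(x, dxdt)]
    return x


if __name__ == "__main__":
    # A is held at its steady-state value 0.5; B is perturbed from 1.0 to 2.0.
    final = simulate([0.5, 2.0], toy_pathway, clamped=(0,))
    print(final)
```

With A clamped, B relaxes back to its steady-state value while A stays exactly where it was set, which is the sense in which the fixed metabolite no longer responds to the perturbation.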

      Figure 2: There are a lot of inconsistencies between the three models. Could we learn which model is more reasonable, or the conclusion here is that the cellular response under perturbation is model-specific? The latter explanation may not be quite satisfactory since we expect the overall cellular property should not be sensitive to the model details. 

      Ideally, the overall cellular property should be insensitive to model details. However, the reality is that the behavior of the models (e.g., steady-state properties, relaxation dynamics, etc.) depends on the specific parameter choices, including what regulation is implemented. I think this situation is part of the motivation for the ensemble modeling approach developed by J. Liao and colleagues.

      Detailed responsiveness would be model specific. For example, FBP has a fairly strong effect in the Boecker model, but less so in the Khodayari model, and the opposite effect in the Chassagnole model (Fig. 2). Our question was whether there are common tendencies among kinetic models that tend to show model-specific behavior.  

      Reviewer 2 (Public Review):

      (1) In the study on determining key metabolites affecting responses to perturbations (starting from line 171), the authors fix the values of individual concentrations to their steady-state values and observe the responses. Such a procedure adds artificial constraints to the network because, in the natural responses of cells (and models) to perturbations, it is highly unlikely that metabolites will not evolve in time. By fixing the values of specific metabolites, the authors prohibit the metabolic network from evolving in the most optimal way to compensate for the perturbation. Instead of this procedure, have the authors considered for this task applying techniques from variance-based sensitivity analysis (Sobol, global sensitivity analysis), where they can calculate the first-order sensitivity index and total effect index? Using this technique, the authors would be able to determine the key metabolites while allowing for metabolic responses to perturbations without unnatural constraints. 

      Thank you for the useful suggestion for studying the role of each metabolite in responsiveness. We have computed the total sensitivity index (Homma and Saltelli, 1996) for each metabolite of each model (Fig. S5). The total sensitivity indices of ATP are high-ranked in the Khodayari and Chassagnole models, while it is middle-ranked in the Boecker model. We believe that the importance of the adenyl cofactors is also highlighted by the Sobol’ sensitivity analysis (the figure is referred to in Lines 193-195).

      We have encountered a minor difficulty for computing the sensitivity index. For the computation of the sensitivity index, we need to carry out the following Monte Carlo integral, 

      where the superscript (m) is the sample number index. The subscript i represents the ith element of the vector x, and ~i represents the vector x except for the ith element. The tilde stands for resampling.  
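
The formula itself is not reproduced in this text, so the following is only a sketch of how a total sensitivity index can be estimated by Monte Carlo resampling of one input at a time. It uses Jansen's form of the estimator, which may differ from the expression the authors used; the function name `total_sensitivity`, the uniform sampling on [0, 1], and the linear test function are assumptions made for illustration.

```python
import random


def total_sensitivity(f, dim, n_samples=20000, seed=0):
    """Estimate the total sensitivity index of each input of f (Jansen's form).

    Inputs are assumed independent and uniform on [0, 1] for this sketch.
    """
    rng = random.Random(seed)
    base = [[rng.random() for _ in range(dim)] for _ in range(n_samples)]
    f_base = [f(x) for x in base]
    mean = sum(f_base) / n_samples
    variance = sum((v - mean) ** 2 for v in f_base) / (n_samples - 1)
    indices = []
    for i in range(dim):
        acc = 0.0
        for x, fx in zip(base, f_base):
            resampled = list(x)
            resampled[i] = rng.random()  # redraw element i only, keep x_{~i}
            acc += (fx - f(resampled)) ** 2
        indices.append(acc / (2 * n_samples * variance))
    return indices


if __name__ == "__main__":
    # For f = x0 + 2*x1 the exact total indices are 0.2 and 0.8.
    print(total_sensitivity(lambda x: x[0] + 2.0 * x[1], dim=2))
```

Redrawing a single element while holding the rest fixed is what requires the inputs to be independently resamplable, which is why conserved quantities need special handling.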

      There are several conserved quantities in each model. For independent resampling, we need to deal with the conserved quantities. For the Boecker and Chassagnole models, we picked a single metabolite from each conservation law and solved its concentration algebraically to make the metabolite concentration the dependent variable. Then, we can resample the metabolite concentration of one metabolite without changing the concentrations of other metabolites, which are independent variables.  
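
The handling of a conservation law described above can be sketched with a single conserved pool. The pool ATP + ADP + AMP = total, the choice of AMP as the dependent variable, and the helper name `with_dependent` are illustrative assumptions and do not come from the models themselves.

```python
def with_dependent(independent, total, dependent_name="AMP"):
    """Add the dependent concentration implied by the conservation law.

    `independent` maps metabolite names to concentrations; the dependent
    metabolite takes whatever remains of the conserved total.
    """
    dependent = total - sum(independent.values())
    if dependent < 0:
        raise ValueError("independent concentrations exceed the conserved total")
    state = dict(independent)
    state[dependent_name] = dependent
    return state


if __name__ == "__main__":
    total = 5.0
    original = with_dependent({"ATP": 3.0, "ADP": 1.0}, total)
    # Resample ATP only: ADP is untouched and AMP absorbs the difference.
    resampled = with_dependent({"ATP": 2.5, "ADP": 1.0}, total)
    print(original, resampled)
```

Only the independent variables are resampled; the dependent one is recomputed each time, so the conserved total is respected by construction.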

      However, in the Khodayari model, it was difficult to solve the dependent variables because the model has about 800 variables. Therefore, we gave up the computations of the sensitivity indices of the metabolites whose concentration is part of any conserved quantities, namely NAD, NADH, NADP, NADPH, Q8, and Q8H2.

      (2) To follow up on the previous remark, the authors state that the metabolites that augment the response coefficient when their concentration is fixed tend to be allosteric regulators. The authors should report which allosteric regulations are implemented in each of the models so that one can compare against Figure 2. Again, the effect of allosteric regulation by a specific metabolite that is quantified the way the authors did is biased by fixing the concentration value - it is true that negative feedback is broken when the metabolite concentration is fixed, however, in the rate law, there is still the fixed inhibition term with its value corresponding to the inhibition at the steady state. To see the effect of allosteric regulation by a metabolite, one can change the inhibition constants instead of constraining the responses with fixed concentrations.  

      We have listed the substrate-level regulations (Tables S1-3). Also, instead of fixing the concentrations, we re-ran the simulation with the effect of the substrate-level regulations reduced for the reactions that are suspected to influence the change in the response coefficient (Fig. S6).

      The impact of substrate-level regulations is discussed in Lines 203-212.   

      We replaced "allosteric regulation" with "substrate-level regulation" because we noticed that some regulations are not necessarily allosteric.

      (3) Given the role of ATP in metabolic processes, the authors' finding of the sensitivity of the three networks' responses to perturbations in the AXP concentrations seems reasonable. However, drawing such firm conclusions from only three models, with each of them built around one steady state and having one kinetic parameter set despite that they were built for different physiologies, raises some questions. It is well-known in studies related to basins of attraction of the steady states that the nonlinear responses also depend on the actual steady states, the values of kinetic parameters, and implemented kinetic rate law, i.e., not only on the topology of the underlying systems. In the population of only three models, we cannot exclude the possibility of overlaps and strong similarities in the values of kinetic parameters, steady states, and enzyme saturations that all affect and might bias the observed responses. Ideally, to eliminate the possibility of such biases, one should simulate responses of a large population of models for multiple physiologies (and the corresponding steady states) and multiple parameter sets per physiology. This can be a difficult task, but having more kinetic models in this work would go a long way toward more convincing results. Recently, E. coli nonlinear kinetic models from several groups appeared that might help in this task, e.g., Haiman et al., PLoS Comput Biol, 17(1): e1008208, (2021), Choudhury et al., Nat Mach Intell, 4, 710-719, (2022); Hu et al., Metab Eng, 82, 123-133 (2024), Narayanan et al., Nat Commun, 15:723, (2024). 

      We have computed the responsiveness of 215 models generated by the MASSpy package (Haiman et al, 2021). Several model realizations showed a strong responsiveness, i.e. a broader distribution of the response coefficient (Fig. S8), which is mentioned in Lines 339-341.

      We would like to mention that the three models studied in the present manuscript have limited overlap in terms of kinetic rate law and, accordingly, parameter values. In the Khodayari model, all reactions are bi-uni or uni-uni reactions implemented by mass-action kinetics, while the Boecker and Chassagnole models use generalized Michaelis-Menten type rate laws. Also, the relationship between the response coefficients of the original model and the linearized model highlights the differences between the models (Fig. S1). If the models were effectively similar, the scatter plots of the response coefficients of the original and linearized models should look similar among the three models. However, the three panels show completely different trends. Thus, the three models have little similarity even when they are linearized around the steady states.

      (4) Can the authors share their insights on what could be the underlying reasons for the bimodal distribution in Figure 1E? Even after adding random reactions, the distribution still has two modes - why is that?  

      We have not yet resolved why only the Khodayari model shows the bimodal distribution of the response coefficient. However, judging from the time courses, the dynamics of the Khodayari model look like those of excitable systems. This feature may contribute to the bimodal distribution of the response coefficient. In the future, we would like to show whether the system is indeed an excitable system and which reactions contribute to such dynamics.

      (5) Considering the effects of the sparsity of the networks on the perturbation responses (from line 223 onwards), when we compare the three analyzed models, it is clear that the Khodayari et al. model is a superset of the other two models. Therefore, this model can be considered as, e.g., the Chassagnole model with Nadd reactions (though not randomly added). Based on Figures 1b and S2, one can observe that the Khodayari model shows stronger responses, which is exactly opposite to the authors' conclusion that adding the reactions weakens the responses.

      The authors should comment on this.  

      The sparsity of the network is defined by the ratio of the number of metabolites to the number of reactions. Note that the Khodayari model is a superset of the Boecker and Chassagnole models not only in terms of the number of reactions, but also in terms of the number of metabolites (Boecker does not have the pentose phosphate pathway, Chassagnole does not have the TCA cycle, and neither has oxidative phosphorylation). Thus, even if we manually add reactions to the Boecker model, for example, we cannot obtain a network that is equivalent to the Khodayari model. We added one sentence to clarify the point (Lines 254-255).

      Recommendations for the authors: 

      (1) Some typos: Line 57, remove ?; Line 134, correct "relaxation". 

      Thank you for pointing these out. We fixed the typos.

      (2) Lines 510-515, please rewrite/clarify, it is confusing what are you doing. 

      We rewrote the sentences (Lines 529-532). We are sorry for the confusion.

      (3) Line 522, where are the expressions above Leq and K*? 

      Leq appears in the original paper of the Boecker model, but we decided not to use Leq. We apologize for not removing Leq from the present manuscript. The * in K* is the wildcard for representing the subscripts. We added a description for the role of “*”. 

      (4) Lines 525-530, based on the wording, it seems like you test first for 128 initial concentrations if the models converge back to the steady state and then you generate another set of 128 initial concentrations - is this what you are doing, or you simply use the 128 initial concentrations that have passed the test? 

      We apologize for the confusion. We did the former. We have rewritten the sentence to make it clearer. 

      (5) Figure 3, caption, by "broken line," did the authors mean "dashed line"? 

      We meant dashed line. We changed “broken line” to “dashed line”.

    1. eLife Assessment

      This study presents an important application of high-content image-based morphological profiling to quantitatively and systematically characterize induced pluripotent stem cell-derived mixed neural cultures cell type compositions. Exceptional evidence through rigorous experimental and computational validations support new potential applications of this cheap and simple assay.

    2. Joint Public Review:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in-depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The authors also propose a new nucleocentric phenotyping pipeline, where a convolutional neural network is trained on the nucleus and some margins around it. This nucleocentric approach improves classification performance at high densities because nuclear segmentation is less prone to errors in dense cultures.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell / nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review: 

      Summary: 

      The authors present a new application of the high-content image-based morphological profiling Cell Painting (CP) to single cell type classification in heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that the nucleus and its surroundings contain sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single cell types in heterogeneous mixed cell populations holds great promise to characterize mixed cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including in-depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell/nucleus; (vii) generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) application to multiple classification tasks.

      Comments on latest version:

      I have consulted with Reviewer #3 and both of us were impressed by the revised manuscript, especially by the clear and convincing evidence regarding the nucleocentric model's use of the nuclear periphery and its benefit for the case of dense cultures. However, there are two issues that are incompletely addressed (see below). Until these are resolved, the "strength of evidence" was elevated to "compelling".

      First, the analysis of the patch size is not clearly indicating that the 12-18um range is a critical factor (Fig. 4E). On the contrary, the performance seems to be not very sensitive to the patch size, which is actually a desired property for a method. Still, Fig. 4B convincingly shows that the nucleocentric model is not sensitive to the culture density, while the other models are. Thus, the authors can adjust their text saying that the nucleocentric approach is not sensitive to the patch size and that the patch size is selected to capture the nucleus and some margins around it, making it less prone to segmentation errors in dense cultures.

      We agree that there is a significant tolerance to different patch sizes, and have therefore reformulated the conclusion as suggested in the results and the discussion sections (pages 10 and 16). As very large patch sizes (>40µm) do increase the variability of the predictions and the imbalance between recall and precision, we have left this observation in the results section, as it also motivates the use of smaller patch sizes.  

      Second, the GitHub does not contain sufficient information to reproduce the analysis. Its current state is sparse with documentation that would make reproducing the work difficult. What versions of the software were used? Where should data be downloaded? The README contains references to many different argparse CLI arguments, but sparse details on what these arguments actually are, and which parameters the authors used to perform their analyses. Links to images are broken. Ideally, all of these details would be present, and the authors would include a step-by-step tutorial on how to reproduce their work. Fixing this will lead to an "exceptional" strength of evidence.

      We have added additional information to the GitHub to increase the reproducibility of the analysis.  

      • The README now contains additional documentation and more extensive explanations. A flowchart has been added, making the dataflow and order of analyses more clear.  

      • The accompanying dataset is 20GB in size and can be downloaded as a .zip-file from https://figshare.com/articles/dataset/Nucleocentric-Profiling/27141441?file=49522557. This file contains 2x480 raw images and a layout file.  

      • The used software versions are included in the manuscript in table 4. To increase the reproducibility, a Conda environment file (.yaml) has been added to the GitHub. This can be installed and contains the correct package versions.

      • For each script and its arguments, the README now contains a short description of its meaning, whether it is required or optional, and its default setting.  

      • A step-by-step tutorial on the use of the test dataset has been included. This tutorial includes the arguments used to run the code from the command line terminal.

      Recommendations for the authors:

      There are no references from the text to Fig. 2D and to Fig. 3C.

      This has been changed. The text has been added to the manuscript at page 6 (fig. 2D) and the reference to Fig. 3C has been included at page 8.

    1. eLife Assessment

      This important study reveals a new mechanism for gene regulation in neurons by an RNA binding protein called RBM20 previously studied in the heart. The methods used are compelling, including the generation of new mouse knockout strains and leading edge sequencing methods for identification of gene regulatory mechanisms. The study shows that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins in specific neuronal cell types, but the functional consequences of this regulation remain questions for the future.

    2. Reviewer #1 (Public review):

      Summary:

      The authors of this study set out to find RNA binding proteins in the CNS in cell-type specific sequencing data and discover that the cardiomyopathy-associated protein RBM20 is selectively expressed in olfactory bulb glutamatergic neurons and PV+ GABAergic neurons. They make an HA-tagged RBM20 allele to perform CLIP-seq to identify RBM20 binding sites and find direct targets of RBM20 in olfactory bulb glutamatergic neurons. In these neurons, RBM20 binds intronic regions. RBM20 has previously been implicated in splicing, but when they selectively knock out RBM20 in glutamatergic neurons they do not see changes in splicing, but they do see changes in RNA abundance, especially of long genes with many introns, which are enriched for synapse-associated functions. These data show that RBM20 has important functions in gene regulation in neurons, which was previously unknown, and they suggest it acts through a mechanism distinct from what has been studied before in cardiomyocytes.

      Strengths:

      The study finds expression of the cardiomyopathy-associated RNA binding protein RBM20 in specific neurons in the brain, opening new windows into its potential functions there.

      The study uses CLIP-seq to identify RBM20 binding RNAs in olfactory bulb neurons.

      Conditional knockout of RBM20 in glutamatergic or PV neurons allows the authors to detect mRNA expression that is regulated by RBM20.

      The data include substantial controls and quality control information to support the rigor of the findings.

      Weaknesses:

      The authors do not fully identify the mechanism by which RBM20 acts to regulate RNA expression in neurons, though they do provide data suggesting that neuronal RBM20 does not regulate alternative splicing in neurons, which is an interesting contrast to its proposed mechanism of function in cardiomyocytes. Discovery of the RNA regulatory functions of RBM20 in neurons is left as a question for future studies.

      The study does not identify functional consequences of the RNA changes in the conditional knockout cells, so this is also a question for the future.

    3. Reviewer #2 (Public review):

      Summary:

      The group around Prof. Scheiffele has made seminal discoveries regarding alternative splicing, as reflected by a current ERC advanced grant and landmark papers in eLife (2015), Science (2016), and Nature Neuroscience (2019). Recently, the group investigated proteins that contain an RRM motif in the mouse cortex. One of them, termed RBM20, was originally thought to be muscle-specific and involved in alternative splicing in cardiomyocytes. However, upon close inspection, RBM20 is expressed in a particular set of interneurons (PV positive cells of the somatosensory cortex) in the cortex as well as in mitral cells of the olfactory bulb (OB). Importantly, they used CLIP to identify targets in the OB and heart. Next and quite importantly, they generated a knock-in mouse line with a His-biotin acceptor peptide and a HA epitope to perform specific biochemistry. Not surprisingly, this allowed them to specifically identify transcripts with long introns; however, most of the intronic binding sites were very distant to the splice sites. Closer GO term inspection revealed that RBM20 specifically regulates synapse-related transcripts. In order to get in vivo insight into its function in the brain, the authors generated both global as well as conditional KO mice. Surprisingly, there were no significant differences in RBM20 ΔPV interneurons; however, 409 transcripts were deregulated in OB glutamatergic neurons. Here, CLIP sites were mostly found to be very distant from differentially expressed exons. Furthermore, loss-of-function RBM20 primarily yields loss of transcripts, whereas upregulation appears to be indirect. Together, these results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Strengths:

      The quality of the data and the figures is high, impressive and convincing. The reported results strongly suggest a role of RBM20 in the inclusion of cryptic exons thereby promoting target degradation.

      Weaknesses:

      I would not use the term weakness here.

      The description of the results is sometimes too dense and technical. As eLife does not have a size limit, there is no reason for the results section to be less than three pages. Especially the last paragraph of the results part (p4) does not do justice to the high importance of Fig. 5, which I consider of high importance and originality. Here are a few suggestions from a person that is not working on splicing, to improve the text part of this important manuscript.

      (1) Introduction: too short, include a paragraph on splicing and cryptic exons

      (2) Results:

      - shortly describe the phenotypes of the mice mentioned

      - expand the section on Fig. 5 and cryptic exons especially

      (3) Discussion: too short, expand on the possible new role of RBM20 and target degradation, possibly by adding a scheme?

    4. Reviewer #3 (Public review):

      Summary:

      The authors identified RBM20 expression in neural tissues using cell type-specific transcriptomic analysis. This discovery was further validated through in vitro and in vivo approaches, including RNA fluorescent in situ hybridization (FISH), open-source datasets, immunostaining, western blotting, and gene-edited RBM20 knockout (KO) mice. CLIP-seq and RiboTRAP data demonstrated that RBM20 regulates common targets in both neural and cardiac tissues, while also modulating tissue-specific targets. Furthermore, the study revealed that neuronal RBM20 governs long pre-mRNAs encoding synaptic proteins.

      Strengths:

      • Utilization of a large dataset combined with experimental evidence to identify and validate RBM20 expression in neural tissues.

      • Global and tissue-specific RBM20 KO mouse models provide robust support for RBM20 localization and expression.

      • Employing heart tissue as a control highlights the unique findings in neural tissues.

      Weaknesses:

      • Lack of physiological functional studies to explore RBM20's role in neural tissues.

      • Data quality requires improvement for stronger conclusions.

      • Western blot sample size should be increased for enhanced reliability.

    5. Author response:

      We thank the reviewers for the constructive suggestions made in the Public Reviews and the Recommendations to Authors. We intend to address these comments in a revised manuscript as follows:

      (1) We will revise the text according to the reviewer suggestions with regards to specific RBM20-dependent mRNAs and providing more detailed explanations in results and discussion.

      (2) We will upload higher resolution images of several figures (resolution had been reduced to achieve lower file sizes) to address the comment regarding “data quality”.

      (3) We will include data on eCLIP control experiments.

      (4) We will add information on replication and new data for the western blot analysis.

    1. eLife Assessment

      This valuable study shows a surprising scale-invariance of the covariance spectrum of large-scale recordings in the zebrafish brain in vivo. A solid analysis demonstrates that a Euclidean random matrix model of the covariance matrix recapitulates these properties. The results provide several new and insightful approaches for probing large-scale neural recordings.

    2. Joint Public Reviews:

      Summary:

      The authors examine the eigenvalue spectrum of the covariance matrix of neural recordings in the whole-brain larval zebrafish during hunting and spontaneous behavior. They find that the spectrum is approximately power law, and, more importantly, exhibits scale-invariance under random subsampling of neurons. This property is not exhibited by conventional models of covariance spectra, motivating the introduction of the Euclidean random matrix model. The authors show that this tractable model captures the scale invariance they observe. They also examine the effects of subsampling based on anatomical location or functional relationships. Finally, they briefly discuss the benefit of neural codes which can be subsampled without significant loss of information.

      Strengths:

      With large-scale neural recordings becoming increasingly common, neuroscientists are faced with the question: how should we analyze them? To address that question, this paper proposes the Euclidean random matrix model, which embeds neurons randomly in an abstract feature space. This model is analytically tractable and matches two nontrivial features of the covariance matrix: approximate power law scaling, and invariance under subsampling. It thus introduces an important conceptual and technical advance for understanding large-scale simultaneously recorded neural activity.

      Weaknesses:

      The downside of using summary statistics is that they can be hard to interpret. Often the finding of scale invariance, and approximate power law behavior, points to something interesting. But here caution is in order: for instance, most critical phenomena in neural activity have been explained by relatively simple models that have very little to do with computation (Aitchison et al., PLoS CB 12:e1005110, 2016; Morrell et al., eLife 12, RP89337, 2024). Whether the same holds for the properties found here remains an open question.

    3. Author response:

      We are grateful for the thorough and constructive feedback provided on our manuscript.

      Regarding the main concern about power law behavior and scale invariance, we would like to clarify that our study does not aim to establish criticality. Instead, we focus on describing and understanding a specific scale-invariant property: the collapsed eigenspectra in neural activity under random sampling. Indeed, we tested Morrell et al.’s latent-variable model (eLife 12, RP89337, 2024, [1]), where a slowly varying latent factor drives population activity. Although it produces a seemingly power-law-like spectrum, random sampling does not replicate the strict spectral collapse observed in our data (second row in Author response image 1). This highlights that simply adding latent factors does not fully recapitulate the scale invariance we measure, suggesting richer or more intricate processes may be involved in real neural recordings.

      Author response image 1.

      Morrell et al.’s latent variable model [1, 2]. A-D: Functional sampling (FSap) eigenspectra of the Morrell et al. model. E-H: Random sampling (RSap) eigenspectra of the same model. Briefly, in Morrell et al.’s latent variable model [1, 2], neural activity is driven by N_f latent fields and a place field. The latent fields are modeled as Ornstein-Uhlenbeck processes with a time constant τ. The parameters ϵ and η control the mean and variance of individual neurons’ firing rates, respectively. The following parameter values were used. A,E: Using the same parameters as in [1]: N_f = 10, ϵ = −2.67, η = 6, τ = 0.1. Half of the cells are also coupled to the place field. B,C,D,F,G,H: Using parameters from [2]: N_f = 5, ϵ = −3, η = 4. There is no place field. The time constant τ = 0.1, 1, 10 for B,F, C,G, and D,H, respectively.
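      As a rough illustration of what "latent fields modeled as Ornstein-Uhlenbeck processes with a time constant τ" means, the following is a minimal sketch of a single latent field only. It is not the authors' or Morrell et al.'s implementation: the function names, the time step `dt`, the unit stationary variance, and the seed are assumptions made for this example, and the coupling of neurons to the field through ϵ and η is not modeled.

```python
import math
import random


def simulate_ou(n_steps, dt, tau, seed=0):
    """Simulate a zero-mean, unit-variance Ornstein-Uhlenbeck process.

    Uses the exact discrete-time update
        x[t + dt] = exp(-dt / tau) * x[t] + sqrt(1 - exp(-2 dt / tau)) * noise,
    so the stationary variance is 1 regardless of dt (an assumed normalization).
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)
    noise_scale = math.sqrt(1.0 - decay * decay)
    x = rng.gauss(0.0, 1.0)  # start from the stationary distribution
    trace = [x]
    for _ in range(n_steps - 1):
        x = decay * x + noise_scale * rng.gauss(0.0, 1.0)
        trace.append(x)
    return trace


def lag1_autocorrelation(trace):
    """Sample autocorrelation at lag one time step."""
    mean = sum(trace) / len(trace)
    num = sum((a - mean) * (b - mean) for a, b in zip(trace, trace[1:]))
    den = sum((a - mean) ** 2 for a in trace)
    return num / den


# A slowly varying field: tau is much longer than the (assumed) time step.
field = simulate_ou(n_steps=20000, dt=0.1, tau=1.0, seed=1)
rho = lag1_autocorrelation(field)  # should be close to exp(-dt / tau)
```

      Larger values of τ make the field vary more slowly, which is the knob varied across panels B-D and F-H in the caption above.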

      We decided to make 5 key revisions.

      • As mentioned, we have evaluated the latent variable model proposed by Morrell et al. and found that it fails to reproduce the scale-invariant eigenspectra observed in our data; these results will be presented in the Discussion section and supported by a new Supplementary Figure.

      • We will include a discussion on the findings of Manley et al. (2024, [2]) regarding the issue of saturating dimensionality in the Discussion section, highlighting the methodological differences and their implications.

      • We will add a new mathematical derivation in the Methods section, elucidating the bounded dimensionality using the spectral properties of our model.

      • We will elaborate in the Discussion section to further emphasize the robustness of our findings by demonstrating their consistency across diverse datasets and experimental techniques.

      • We will incorporate into the Discussion section a brief discussion of the implications for neural coding. In particular, Fisher information can become unbounded when the slope of the power-law rank plot is less than one, as highlighted in the recent work by Moosavi et al. (bioRxiv 2024.08.23.608710, Aug, 2024 [3]).

      We believe these revisions will address the concerns raised by you and collectively strengthen our manuscript to provide a more comprehensive and robust understanding of the geometry and dimensionality of brain-wide activity.

      References

      (1) M. C. Morrell, A. J. Sederberg, I. Nemenman, Latent dynamical variables produce signatures of spatiotemporal criticality in large biological systems. Physical Review Letters 126, 118302 (2021).

      (2) M. C. Morrell, I. Nemenman, A. Sederberg, Neural criticality from effective latent variables. eLife 12, RP89337 (2024).

    1. eLife Assessment

      The authors use a range of techniques to examine the role of Aurora Kinase A (AurA) in trained immunity. The study is hypothesis driven, it uses solid experimental approaches, and the data are presented in a logical manner. The findings are valuable to the trained immunity field because they provide an in-depth look at a common inducer of trained immunity, beta-glucan.

    2. Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as:

      (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649)

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative.

      (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc).

      (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      (7) Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      (8) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      (9) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      (10) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      (11) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      (12) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.
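      Points (2a) and (2b) above can be illustrated with a minimal sketch. The numbers below are hypothetical and are not data from the manuscript; the function names are likewise made up for this example. The sketch shows that SEM is simply SD divided by the square root of n (so it understates spread at n = 3), and that treating the same measurements as paired or unpaired can give very different t statistics when animals differ strongly from one another.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM), where SEM = SD / sqrt(n)."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))


def unpaired_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def paired_t(a, b):
    """Paired t statistic computed on the per-animal differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical readouts from three animals (same animal at the same index).
control = [10.0, 20.0, 30.0]
treated = [12.0, 23.0, 32.0]

sd, sem = sd_and_sem(control)
t_unpaired = unpaired_t(treated, control)
t_paired = paired_t(treated, control)
```

      In this made-up example the between-animal variation dominates the unpaired statistic, while the paired statistic reflects the consistent within-animal shift. Which analysis is appropriate depends on the actual experimental design, which is the question the reviewer raises.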

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo data on the antitumor effect of trained immunity are insufficient to support the final claim, as Alisertib could inhibit Aurora A in cell types other than myeloid cells.

    4. Author response:

      Reviewer #1 (Public review):

      This work regards the role of Aurora Kinase A (AurA) in trained immunity. The authors claim that AurA is essential to the induction of trained immunity. The paper starts with a series of experiments showing the effects of suppressing AurA on beta-glucan-trained immunity. This is followed by an account of how AurA inhibition changes the epigenetic and metabolic reprogramming that are characteristic of trained immunity. The authors then zoom in on specific metabolic and epigenetic processes (regulation of S-adenosylmethionine metabolism & histone methylation). Finally, an inhibitor of AurA is used to reduce beta-glucan's anti-tumour effects in a subcutaneous MC-38 model.

      Strengths:

      With the exception of my confusion around the methods used for relative gene expression measurements, the experimental methods are generally well-described. I appreciate the authors' broad approach to studying different key aspects of trained immunity (from comprehensive transcriptome/chromatin accessibility measurements to detailed mechanistic experiments). Approaching the hypothesis from many different angles inspires confidence in the results (although not completely - see weaknesses section). Furthermore, the large drug-screening panel is a valuable tool as these drugs are readily available for translational drug-repurposing research.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      (1) The manuscript contains factual inaccuracies such as: (a) Intro: the claim that trained cells display a shift from OXPHOS to glycolysis based on the paper by Cheng et al. in 2014; this was later shown to be dependent on the dose of stimulation and actually both glycolysis and OXPHOS are generally upregulated in trained cells (pmid 32320649).

      We thank the reviewer for pointing out this inaccuracy, and we will revise our statement to give an accurate and up-to-date description. We are aware that trained immunity involves different metabolic pathways, including both glycolysis and oxidative phosphorylation[1, 2]. We also measured the oxygen consumption rate (OCR, as detailed in comment#8) but observed no increase in oxygen consumption in trained BMDMs, whereas a previous study reported decreased oxidative phosphorylation[3]. We will discuss the potential reasons underlying these differing results.

      (b) Discussion: Trained immunity was first described as such in 2011, not decades ago.

      We are sorry for the inaccurate description, and we will correct the statement in our revised manuscript as “Despite the fact that the concept of “trained immunity” has been proposed since 2011, the mechanisms that regulate trained immunity are still not completely understood.”

      (2) The authors approach their hypothesis from different angles, which inspires a degree of confidence in the results. However, the statistical methods and reporting are underwhelming.

      (a) Graphs depict mean +/- SEM, whereas mean +/- SD is almost always more informative. (b) The use of 1-tailed tests is dubious in this scenario. Furthermore, in many experiments/figures the case could be made that the comparisons should be considered paired (the responses of cells from the same animal are inherently not independent due to their shared genetic background and, up until cell isolation, the same host factors like serum composition/microbiome/systemic inflammation etc). (c) It could be explained a little more clearly how multiple testing correction was done and why specific tests were chosen in each instance.

      Thank you for the suggestions. We will revise all data presented as mean ± SEM in the manuscript to mean ± SD, describe in detail how multiple comparisons were performed, and explain the rationale behind the choice of each test. Previous studies have shown that knockdown of GNMT increases the intracellular SAM level, and GNMT knockdown is commonly used as a method to upregulate SAM[4-6]. Because the direction of the effect was therefore expected in advance, we used a one-tailed test in Figure 3J.
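      The distinction the reviewer draws between SD, SEM, and paired comparisons can be illustrated with a minimal sketch. The readout values and function names below are hypothetical and are not taken from the manuscript; the sketch only shows that SEM is SD scaled by the square root of n, and that a paired statistic is computed on within-animal differences.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    sd = statistics.stdev(values)        # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(len(values))    # SEM shrinks as n grows; SD does not
    return sd, sem


def paired_t_statistic(control, treated):
    """t statistic for a paired design (same animal measured in two conditions)."""
    diffs = [t - c for c, t in zip(control, treated)]
    _, sem_diff = sd_and_sem(diffs)
    return statistics.mean(diffs) / sem_diff


# Hypothetical cytokine readouts (arbitrary units) from three animals.
control = [10.0, 12.0, 14.0]
treated = [13.0, 15.0, 18.0]

sd, sem = sd_and_sem(control)
t_paired = paired_t_statistic(control, treated)
print(f"SD={sd:.3f} SEM={sem:.3f} paired t={t_paired:.3f}")
```

      In this made-up example the between-animal spread is large relative to the within-animal differences, which is the situation in which a paired analysis is most informative.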

      (d) Most experiments are done with n = 3, some experiments are done with n = 5. This is not a lot. While I don't think power analyses should be required for simple in vitro experiments, I would be wary of drawing conclusions based on n = 3. It is also not indicated if the data points were acquired in independent experiments. ATAC-seq/RNA-seq was, judging by the figures, done on only 2 mice per group. No power calculations were done for the in vivo tumor model.

      We are sorry for the confusion caused by our figure legends. For the in vitro studies, we performed at least three independent experiments (BMs isolated from different mice), but the manuscript displays only the technical replicates from one experiment. For the sequencing data, we acknowledge the reviewer's concern regarding the small sample size (n=2) in our RNA-seq/ATAC-seq experiment. We regard the sequencing experiment mainly as an exploratory approach, and we performed rigorous quality control and normalization of the sequencing data to ensure the reliability of our findings. While we understand that a larger sample size would be ideal for drawing more definitive conclusions, we believe that the current data offer valuable preliminary insights that can inform future studies with larger cohorts. As a complementary method, we conducted ChIP-PCR to detect enrichment of active histone modifications at the Il6 and Tnf regions, which further verifies the increased accessibility of trained-immunity-induced inflammatory genes and supports the reliability of our conclusions despite the small sample size. We hope this clarifies our approach, and we would be happy to further acknowledge and discuss the limitations of the current study.

      For the in vivo experiments, we determined the sample size by referring to the animal numbers used for similar experiments in the literature. According to a reported resource equation approach for calculating sample size in animal studies[7], n=5-7 is suitable for most of our mouse experiments. We will describe the details in the revised methods section.
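      For readers unfamiliar with the resource equation approach, the sketch below shows the commonly quoted form for a simple group comparison, in which the error degrees of freedom are kept between 10 and 20 and the per-group size is n = DF/k + 1 for k groups. The function name and the choice of k = 3 are assumptions made for illustration; the response does not state how many groups were entered into the calculation.

```python
import math


def resource_equation_group_size(num_groups, df_min=10, df_max=20):
    """Per-group n range for a one-way group comparison: n = DF / k + 1."""
    n_min = math.ceil(df_min / num_groups + 1)    # round the minimum up
    n_max = math.floor(df_max / num_groups + 1)   # round the maximum down
    return n_min, n_max


# Illustrative only: three groups is an assumption, not a detail from the study.
print(resource_equation_group_size(3))
```

      The range depends on the number of groups, so the calculation would need to be repeated for each experimental design.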

      (e) Furthermore, the data spread in many experiments (particularly BMDM experiments) is extremely small. I wonder if these are true biological replicates, meaning each point represents BMDMs from a different animal? (disclaimer: I work with human materials where the spread is of course always much larger than in animal experiments, so I might be misjudging this.).

      We are sorry for the confusion caused by our figure legends. In the in vivo experiments, individual mice serve as biological replicates; the exact values of n are reported in the figure legends, and each point represents data from a different animal (Figure 1I and Figure 6). The in vitro cell assays were performed in triplicate, each experiment was independently replicated at least three times, and the points represent technical replicates.

      (3) Maybe the authors are reserving this for a separate paper, but it would be fantastic if the authors would report the outcomes of the entire drug screening instead of only a selected few. The field would benefit from this as it would save needless repeat experiments. The list of drugs contains several known inhibitors of training (e.g. mTOR inhibitors) so there must have been more 'hits' than the reported 8 Aurora inhibitors.

      Thank you for your suggestion and we will report the outcomes of the entire drug screening in the revised manuscript.

      (4) Relating to the drug screen and subsequent experiments: it is unclear to me in supplementary figure 1B which concentrations belong to secondary screens #1/#2 - the methods mention 5 µM for the primary screen and "0.2 and 1 µM" for secondary screens, is it in this order or in order of descending concentration?

      Thank you for your comments, and we are sorry for the unclear labelling in supplementary figure 1B. We performed the secondary drug screen at two concentrations: secondary screen #1 used 0.2 μM and secondary screen #2 used 1 μM. That is, the concentrations are listed in this order, not in descending order.

      (a) It is unclear if the drug screen was performed with technical replicates or not - the supplementary figure 1B suggests no replicates and quite a large spread (in some cases lower concentration works better?)

      Thank you for your question. The drug screen was performed without technical replicates. Indeed, we observed that a lower concentration worked better in some cases. This might be because a drug's effect correlates positively with its concentration only within a specific range (as seen in comment#4).

      (5) The methods for (presumably) qPCR for measuring gene expression in Figure 1C are missing. Which reference gene was used and is this a suitably stable gene?

      We are sorry for omitting the qPCR method. The mRNA expression of Il6 and Tnf in trained BMDMs was normalized to that in untrained BMDMs, and β-actin served as the reference gene. We will describe this in detail in our revised manuscript.
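      The normalization described here is consistent with the widely used 2^-ΔΔCt calculation, although the response does not name the method. The sketch below assumes that method and roughly 100% amplification efficiency; the Ct values and the function name are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by 2^-ddCt (assumes ~100% primer efficiency)."""
    delta_sample = ct_target_sample - ct_ref_sample      # e.g. Il6 minus beta-actin, trained
    delta_control = ct_target_control - ct_ref_control   # e.g. Il6 minus beta-actin, untrained
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values: target amplifies 3 cycles earlier in trained cells,
# while the reference gene is unchanged.
print(fold_change_ddct(22.0, 16.0, 25.0, 16.0))
```

      The calculation is only meaningful if the reference gene is stable across conditions, which is the point of the reviewer's question.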

      (6) From the complete unedited blot image of Figure 1D it appears that the p-Aurora and total Aurora are not from the same gel (discordant number of lanes and positioning). This could be alright if there are no/only slight technical errors, but I find it misleading as it is presented as if the actin (loading control to account for aforementioned technical errors!) counts for the entire figure.

      Thank you for this comment. In the original data, p-Aurora and total Aurora were from different gels. In this experiment, membrane stripping and reprobing after the p-Aurora antibody did not work well, so we could not obtain all results from one gel and had to run another gel with the same samples to blot with the anti-Aurora antibody. We agree that we should have provided separate actin blots as loading controls for this experiment. We will repeat the experiment and provide original data from three biological replicates to confirm the result.

      Figure 2: This figure highlights results that are by far not the strongest ones - I think the 'top hits' deserve some more glory. A small explanation on why the highlighted results were selected would have been fitting.

      We appreciate the valuable suggestion. We will add a discussion of this point to our revised manuscript.

      (7) Figure 3 incl supplement: the carbon tracing experiments show more glucose-carbon going into TCA cycle (suggesting upregulated oxidative metabolism), but no mito stress test was performed on the seahorse.

      We appreciate this question raised by the reviewer. We previously performed Seahorse XF analysis to measure mitochondrial stress in β-glucan-trained BMDMs in combination with alisertib (data not shown in our submitted manuscript). The results showed no increase in oxidative phosphorylation under β-glucan stimulation.

      Author response image 1.

      (8) Inconsistent use of an 'alisertib-alone' control in addition to 'medium', 'b-glucan', 'b-glucan + alisertib'. This control would be of great added value in many cases, in my opinion.

      Thank you for your comment. We appreciate that including an “alisertib-alone” group throughout all the experiments may add more value to the findings. The aim of the current study was to investigate the role of Aurora kinase A in trained immunity. Therefore, in most settings, we did not focus on the role of Aurora kinase A without β-glucan stimulation. Initially, we showed in Figure 1B and 1C that alisertib alone at concentrations up to and including 1 μM does not affect the response to the secondary stimulus. A previous report also showed that an Aurora A inhibitor alone did not affect trained immunity[8]. Thus, we did not include this control group in all of the experiments.

      (9) Figure 4A: looking at the unedited blot images, the blot for H3K36me3 appears in its original orientation, whereas other images appear horizontally mirrored. Please note, I don't think there is any malicious intent but this is quite sloppy and the authors should explain why/how this happened (are they different gels and the loading sequence was reversed?)

      Thank you for pointing out this error. After checking the original data, we found that we indeed misassembled the orientation of several blots. We went through the assembling process and figured out that some orientations were assembled according to the loading sequences but not saved, so that the orientations in Figure 4A were not consistent with the unedited blot image. We are sorry for the careless mistake, and we will double check to make sure all the blots are correctly assembled in the revised manuscript.

      (10) For many figures, for example prominently figure 5, the text describes 'beta-glucan training' whereas the figures actually depict acute stimulation with beta-glucan. While this is partially a semantic issue (technically, the stimulation is 'the training-phase' of the experiment), this could confuse the reader.

      Thanks for the reviewer’s suggestion and we will reorganize our language to ensure clarity and avoid any inconsistencies that might lead to misunderstanding.

      (11) Figure 6: Cytokines, especially IL-6 and IL-1β, can be excreted by tumour cells and have pro-tumoral functions. This is not likely in the context of the other results in this case, but since there is flow cytometry data from the tumour material it would have been nice to see also intracellular cytokine staining to pinpoint the source of these cytokines.

      Thank you for the suggestion. To address the reviewer's concern, we will perform intracellular cytokine staining in tumor experiments with mice trained with β-glucan alone or in combination with alisertib, followed by MC38 inoculation.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the inhibition of Aurora A and its impact on β-glucan-induced trained immunity via the FOXO3/GNMT pathway. The study demonstrates that inhibition of Aurora A leads to overconsumption of SAM, which subsequently impairs the epigenetic reprogramming of H3K4me3 and H3K36me3, effectively abolishing the training effect.

      Strengths:

      The authors identify the role of Aurora A through small molecule screening and validation using a variety of molecular and biochemical approaches. Overall, the findings are interesting and shed light on the previously underexplored role of Aurora A in the induction of β-glucan-driven epigenetic change.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses:

      Given the established role of histone methylations, such as H3K4me3, in trained immunity, it is not surprising that depletion of the methyl donor SAM impairs the training response. Nonetheless, this study provides solid evidence supporting the role of Aurora A in β-glucan-induced trained immunity in murine macrophages. The in vivo data on the antitumor effect of trained immunity are insufficient to support the final claim, as alisertib could also inhibit Aurora A in cell types other than myeloid cells.

      We appreciate the question raised by the reviewer. Although SAM generally acts as a methyl donor, whether the epigenetic reprogramming in trained immunity is directly linked to SAM metabolism is not known. In our study, we provide evidence suggesting that SAM maintenance is necessary to support trained immunity. As for the in vivo tumor model, tumor cells were subcutaneously inoculated 24 h after oral administration of alisertib. Previous studies showed that orally administered alisertib had a half-life of 10 h and a 90% reduction in serum concentration after 24 h [9, 10]. Therefore, we suppose that tumor cells are affected by the long-term effects of the drug on the immune system rather than directly by alisertib. To further address the reviewer's concern, we will perform bone marrow transplantation (trained mice as donors and naïve mice as recipients) to clarify the mechanistic contribution of trained immunity versus off-target effects.
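      As a rough plausibility check on the clearance argument, the sketch below computes the fraction of drug remaining under a simple single-compartment, first-order elimination model. That model is an assumption made here for illustration and is not taken from the cited pharmacokinetic studies, whose measured values need not follow it; the function name is hypothetical.

```python
def fraction_remaining(hours, half_life_hours):
    """Fraction of drug left after `hours`, assuming first-order elimination."""
    return 0.5 ** (hours / half_life_hours)


# With a 10 h half-life, roughly 0.19 of the drug remains at 24 h under this model.
remaining = fraction_remaining(24.0, 10.0)
print(f"remaining at 24 h: {remaining:.2f}")
```

      Under this simplified model most, but not all, of the drug would have been cleared at the time of tumor inoculation, so the proposed bone marrow transplantation experiment remains the more direct test.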

      Cited references

      (1) Ferreira, A.V., et al., Metabolic Regulation in the Induction of Trained Immunity. Semin Immunopathol, 2024. 46(3-4): p. 7.

      (2) Keating, S.T., et al., Rewiring of glucose metabolism defines trained immunity induced by oxidized low-density lipoprotein. J Mol Med (Berl), 2020. 98(6): p. 819-831.

      (3) Li, X., et al., Maladaptive innate immune training of myelopoiesis links inflammatory comorbidities. Cell, 2022. 185(10): p. 1709-1727.e18.

      (4) Luka, Z., S.H. Mudd, and C. Wagner, Glycine N-methyltransferase and regulation of S-adenosylmethionine levels. J Biol Chem, 2009. 284(34): p. 22507-11.

      (5) Hughey, C.C., et al., Glycine N-methyltransferase deletion in mice diverts carbon flux from gluconeogenesis to pathways that utilize excess methionine cycle intermediates. J Biol Chem, 2018. 293(30): p. 11944-11954.

      (6) Simile, M.M., et al., Nuclear localization dictates hepatocarcinogenesis suppression by glycine N-methyltransferase. Transl Oncol, 2022. 15(1): p. 101239.

      (7) Arifin, W.N. and W.M. Zahiruddin, Sample Size Calculation in Animal Studies Using Resource Equation Approach. Malays J Med Sci, 2017. 24(5): p. 101-105.

      (8) Benjaskulluecha, S., et al., Screening of compounds to identify novel epigenetic regulatory factors that affect innate immune memory in macrophages. Sci Rep, 2022. 12(1): p. 1912.

      (9) Yang, J.J., et al., Preclinical drug metabolism and pharmacokinetics, and prediction of human pharmacokinetics and efficacious dose of the investigational Aurora A kinase inhibitor alisertib (MLN8237). Drug Metab Lett, 2014. 7(2): p. 96-104.

      (10) Palani, S., et al., Preclinical pharmacokinetic/pharmacodynamic/efficacy relationships for alisertib, an investigational small-molecule inhibitor of Aurora A kinase. Cancer Chemother Pharmacol, 2013. 72(6): p. 1255-64.

    1. eLife assessment

      This paper makes a valuable contribution to the area of balancing selection at the Major histocompatibility complex (MHC), including trans-species polymorphism between humans and other primates, by incorporating a large evolutionary range of species and genes and by using newer methodological approaches to characterize the depth and extent of the trans-species polymorphism across an expanded range of primate taxa. While the presented results solidly support the authors' conclusions, additional analyses would be needed to firmly exclude modes of evolution that could mimic trans-specific polymorphism.

    2. Reviewer #1 (Public Review):

      HLA genes have long been known to harbor trans-species polymorphism (TSP). This manuscript aimed to use state-of-the-art analyses and updated genotyping data to rigorously test for the presence of TSP in HLA genes, quantify the timescales associated with HLA TSP, and relate HLA disease associations to evolutionary rates. To do this, the authors chose HLA alleles across great apes, old world monkeys, and new world monkeys on which to perform phylogenetic analyses, alongside non-parametric tests that compare patterns of synonymous diversity. Finally, HLA genetic associations with the disease were correlated with evolutionary rate.

      Strengths:

      The manuscript is well written and neatly organized, the figures are clear, and there are many supplementary analyses that will make this paper a great resource for MHC phylogenetics at allelic resolution.

      Deployment of modern methodology such as BEAST2 can also test if the hypothesis of TSP is supported while accounting for uncertainties in tree topology and evolutionary rates, necessary additions to analyses of the MHC.

      Weaknesses:

      Because TSP has already been convincingly demonstrated to occur in the MHC, the primary benefit of the current study is to ensure these previous observations are still supported by the wealth of genetic data that is now available and modern phylogenetic approaches. However, the benefit of using the robust BEAST2 method comes with the weakness of not using all available data. Focusing on single gene trees with only a small subset of alleles may bias results, and inclusion/exclusion criteria should be better defined.

      One major point that is somewhat overlooked is the presence of multiple copy numbers for the MHC genes through classic birth and death evolution. For example, MHC-B in new world monkeys is duplicated many times (up to 10; PMID: 23715823). This duplication is naturally accompanied by gene loss and pseudogene formation. All of these things muddy the waters considerably yet are not addressed here. A good example is MHC-A, where it has been very difficult to apportion orthologs, even amongst closely related species, due to alternative or incomplete duplication/loss across the species, or region configuration polymorphism (e.g. PMID: 26371256). An example is chimpanzee Patr-AL which shares similarities with human HLA-A*02 lineage, but is a separate locus, could this show up as TSP under the current analysis?

      Similarly, an alternative hypothesis for TSP is convergent gene conversion mutations: intergenic gene conversion has been repeatedly observed in HLA genes and the possibility of it occurring with the same two genes becomes more realistic over 45 million years. If the same two MHC genes recombined in humans and in an NWM, each on their own lineages, this would appear as TSP and would cause an overlap of pairwise synonymous divergence between human-human and human-NWM allele comparisons. This might be especially possible in MHC-DR and MHC-DQ genes presented in Figure 2 since both humans and NWM have multiple MHC-DRB and DQB genes (unless e.g. were genes besides HLA/MHC-DRB1 such as DRB3,4,5 included in the DRB phylogenies?). While BEAST2 may be a good way of robustly modeling and identifying TSP, and I understand these analyses cannot support many more sequences, the authors should consider adding an analysis that rules out gene conversion as an explanation for their results (especially the often repeated claim of 45 million year TSP). For example, can the authors use BLAST to ensure that the alleles that underlie 45 million years of TSP do not share close similarities to other HLA genes present in their respective human and NWM genomes? This seems like it could be fairly quickly performed for all genes, and even if it argued against TSP, it would be an interesting finding.

      Finally, the authors have limited themselves to a small subset of HLA/MHC alleles and do not provide sufficient information in the methods to understand how these were chosen nor sufficient discussion surrounding how inclusion/exclusion criteria could bias results. For example, the authors say the alleles were chosen at 2-digit (i.e. 1 field) resolution, but in the phylogenies of Fig. 2, I see variable numbers of alleles chosen for each 2-digit allele family - what metric was used to decide on these alleles?

      "We also collected associations between amino acids and TCR phenotypes". It is not clear either what was analyzed, or the results for this part of the analysis. This is a topic of much debate and none of the previous work has been discussed (PMID: 18304006, PMID: 29636542 as primers for this contentious subject)

      MHC class I also interact with NK cell receptors, including polymorphic KIR. Through their interactions during infection control and reproduction, the two complexes co-evolve across primates, contributing to the maintenance of MHC diversity. Interaction with KIR likely has a greater impact on HLA polymorphism than interactions with TCR, yet this is not factored into any of the models, or indeed mentioned in the text.

      One additional reason inclusion of the KIR binding is important relates to the point above about gene conversion, where it is established that gene conversion reproducibly swaps KIR-binding motifs among MHC class I alleles and genes. HLA-A*23, *24, and *25, *32, for example, are characterized by the acquisition of the 'Bw4' motif from HLA-B (PMID: 26284483), likely followed by positive natural selection. For exon 2 (which encodes the motif), these alleles turn up in a clade distinct from other human HLA-A (Fig 2-S1). What is the impact of the Bw4 motif on this phylogeny? Could this shuffling of motifs be interpreted as indirect TSP?

      The analysis that shows the most rapidly evolving sites occur in the peptide binding domain brings little new to the field. This has been established by the Hughes and Nei (cited) and Parham, Lawlor, etc of 1988 (e.g. PMID: 3375250), and replicated multiple times across human populations and many other species.

      Likewise, the disease association part. It is nice to have a summary of the known associations, but there are others out there and this one is far from thorough. Here, 50% of the information about infectious diseases appears to be taken from one reference, leaving out some major bodies of work; for example identifying specific peptide binding residues or peptides that associate with HIV (PMID: 22896606) or malaria control (PMID: 1280333). It is also missing some major concepts, such as the DRB1 'shared epitope' of peptide binding residues that predispose to Rheumatoid Arthritis and protects from Parkinson's disease (35 years of work from PMID: 2446635 through PMID: 30910980). The nasopharyngeal carcinoma and EBV story (e.g. PMID: 23209447). Another huge gap here is the pregnancy syndromes, for example associations of specific HLA C and NK cell receptor allotypes with preeclampsia. There are thousands of HLA associations not considered in this section, and to do them justice would likely require an enormous amount of work.

      Thus, neither the idea that HLA/MHC polymorphism is focused on peptide binding nor that this binding drives resistance to infection and associations with the disease are new concepts. The previous work in these areas is inadequately acknowledged.

      The paper is written in a very approachable language, which is nice to read and friendly to non-experts, but perhaps a little too much so in places. I find that the paper follows a very non-traditional format with respect to for example the results section, which seems a mixture of Introduction/methods/figure legends/discussion with no real solid result description.

    3. Reviewer #2 (Public Review):

      Fortier and Pritchard investigated the breadth and depth of trans-species polymorphism (TSP) within six primate classical (antigen-presenting) major histocompatibility complex (MHC) genes (three MHC class I and three MHC class II). The MHC is of wide interest because of its unique evolutionary patterns within the genomes of jawed vertebrates and for its extensive and consistent associations with disease phenotypes. The findings of the paper are:

      1) Trans-species polymorphism (TSP) within major histocompatibility complex (MHC) genes, whereby some alleles are more similar between rather than within species, occurs between humans and non-human primates despite rapid allelic turnover.
      2) Highly polymorphic/rapidly evolving sites are mostly involved in peptide binding.
      3) The identified, rapidly-evolving sites are associated with disease.

      However, because these general findings have been previously demonstrated to varying extents by numerous other studies, these are not the strength of this paper. The strength and importance of this paper are in its utilization of a large evolutionary range of species and genes and its methodological approach and the extent of analyses undertaken to characterize the depth and extent of the TSP among primates. The major contribution of this paper is showing that TSP in the MHC is widespread among diverse primate taxa, and, depending on the particular MHC gene, TSP can be detected between humans and non-human primates as distantly diverged from the human lineage as new world monkeys of the Americas, ~45 million years ago. The paper, overall, made good methodological choices to account for the fascinating but challenging nature of the MHC, which includes its extensive allelic polymorphism (much of which is only characterized for the peptide-binding domain, encoded by exons 2 and 3), the difficulty in assessing phylogenetic relationships (particularly due to recombination and/or interallelic gene conversion), and differentiating convergence from conservation. There is no single analysis that can perfectly account for all these factors. This paper used two methods to test for TSP, Bayesian evolutionary analysis and synonymous nucleotide distances (dS), each with their respective strengths and limitations articulated. TSP, to varying degrees, is supported by both analyses. The paper further identifies rapidly evolving positions within the MHC molecules (predominantly located in the MHC peptide-binding domain), quantitatively shows that they are more likely to be in proximity to the bound peptide within the peptide binding domain, and shows, via a literature review of HLA fine-mapping studies, that those positions are associated with both infectious and autoimmune disease.

      The conclusions of the paper, therefore, are supported and appropriate with the most important caveats noted, but the paper would benefit from:

      1) Addressing how copy number variation of MHC class I genes among primate species might have affected their analyses and results (only single representative genes of the class II MHC, which also exhibit copy number variation, were used for this study).
      2) Considering the differences between class I and class II MHC roles in immune function and how those might relate to the observed patterns.

    1. eLife Assessment

      The useful manuscript presents interesting findings in the field of neurodegenerative diseases by highlighting the dual role of phosphorylated ubiquitin (pUb) in cellular proteostasis and neurotoxicity. However, some claims for discovery are supported by unconvincing and incomplete evidence that requires further validation. The poor quality of key immunofluorescent images and questionable quantification analysis raise technical concerns.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provided some interesting data, several important points should be addressed before being further considered.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.
      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Weaknesses:

      (1) PINK1 has been reported as a kinase capable of phosphorylating Ubiquitin, hence the expected outcome of increased p-Ub levels upon PINK1 overexpression. Figures 5E-F do not demonstrate a significant increase in Ub levels upon overexpression of PINK1 alone, whereas the evident increase in Ub expression upon overexpression of S65A is apparent. Therefore, the notion that increased Ub phosphorylation leads to protein aggregation in mouse hippocampal neurons is not yet convincingly supported.
      (2) The specificity of PINK1 and p-Ub antibodies requires further validation, as a series of literature indicate that the expression of the PINK1 protein is relatively low and difficult to detect under physiological conditions.
      (3) In Figure 6, relying solely on Western blot staining and golgi staining under high magnification is insufficient to prove the impact of PINK1 overexpression on neuronal integrity and cognitive function. The authors should supplement their findings with immunostaining results for MAP2 or NeuN to demonstrate whether neuronal cells are affected.
      (4) The authors should provide more detailed figure captions to facilitate the understanding of the results depicted in the figures.
      (5) While the study proposes that pUb promotes neurodegeneration by affecting proteasomal function, the specific molecular mechanisms and signaling pathways remain to be elucidated.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript makes the claim that pUb is elevated in a number of degenerative conditions including Alzheimer's Disease and cerebral ischaemia. Some of this is based on antibody staining which is poorly controlled and difficult to accept at this point. They confirm previous results that a cytosolic form of PINK1 accumulates following proteasome inhibition and that this can be active. Accumulation of pUb is proposed to interfere with proteostasis through inhibition of the proteasome. Much of the data relies on over-expression and there is little support for this reflecting physiological mechanisms.

      Weaknesses:

      The manuscript is poorly written. I appreciate this may be difficult in a non-native tongue, but felt that many of the problems are organisational. Less data of higher quality, better controls and concision would be preferable. Overall the referencing of past work is lamentable.

      Methods are also very poor and difficult to follow.

      Until technical issues are addressed I think this would represent an unreliable contribution to the field.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to explore the role of phosphorylated ubiquitin (pUb) in proteostasis and its impact on neurodegeneration. By employing a combination of molecular, cellular, and in vivo approaches, the authors demonstrate that elevated pUb levels contribute to both protective and neurotoxic effects, depending on the context. The research integrates proteasomal inhibition, mitochondrial dysfunction, and protein aggregation, providing new insights into the pathology of neurodegenerative diseases.

      Strengths:

      - The integration of proteomics, molecular biology, and animal models provides comprehensive insights.

      - The use of phospho-null and phospho-mimetic ubiquitin mutants elegantly demonstrates the dual effects of pUb.

      - Data on behavioral changes and cognitive impairments establish a clear link between cellular mechanisms and functional outcomes.

      Weaknesses:

      - While the study discusses the reciprocal relationship between proteasomal inhibition and pUb elevation, causality remains partially inferred.

      - The role of alternative pathways, such as autophagy, in compensating for proteasomal dysfunction is underexplored.

      - The immunofluorescence images in Figure 1A-D lack clarity and transparency. It is not clear whether the images represent human brain tissue, mouse brain tissue, or cultured cells. Additionally, the DAPI staining is not well-defined, making it difficult to discern cell nuclei or staging. To address these issues, lower-magnification images that clearly show the brain region should be provided, along with improved DAPI staining for better visualization. Furthermore, the Results section and Figure legends should explicitly indicate which brain region is being presented. These concerns raise questions about the reliability of the reported pUb levels in AD, which is a critical aspect of the study's findings.

      - Figure 4B should also indicate which brain region is being presented.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of phosphorylated ubiquitin (pUb) by PINK1 kinase in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provided some interesting data, several important points should be addressed before being further considered.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative diseases such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Weaknesses:

      (1) PINK1 has been reported as a kinase capable of phosphorylating Ubiquitin, hence the expected outcome of increased p-Ub levels upon PINK1 overexpression. Figures 5E-F do not demonstrate a significant increase in Ub levels upon overexpression of PINK1 alone, whereas the evident increase in Ub expression upon overexpression of S65A is apparent. Therefore, the notion that increased Ub phosphorylation leads to protein aggregation in mouse hippocampal neurons is not yet convincingly supported.

      Indeed, overexpression of sPINK1* alone caused little change in Ub levels in the soluble fraction (Figure 5E), which is expected. Ub in the soluble fraction is in a relatively stable, buffered state. However, overexpression of sPINK1* resulted in an increase in Ub levels in the insoluble fraction, indicating protein aggregation. The molecular weight of Ub in the insoluble fraction was predominantly below 70 kDa, implying that phosphorylation inhibits Ub chain elongation.

      To further examine this, we used the Ub/S65A mutant to antagonize Ub phosphorylation, and found that the aggregation at low molecular weight was significantly reduced, indicating a partial restoration of proteasomal activity. The increase in Ub levels in both the soluble and insoluble fractions likely results from the high rate of ubiquitination driven by the elevated levels of Ub. Notably, the overexpressed Ub/S65A was detected in the Western blot using the wild-type Ub antibody, which accounts for the apparently increased Ub level.

      When overexpressing Ub/S65E, we again saw an increase in Ub levels in the insoluble fraction (but no increase in the soluble fraction), with low molecular weight bands even more prominent than those observed with sPINK1* transfection. These findings collectively support the conclusion that sPINK1* promotes protein aggregation through Ub phosphorylation.

      (2) The specificity of PINK1 and p-Ub antibodies requires further validation, as a series of literature indicate that the expression of the PINK1 protein is relatively low and difficult to detect under physiological conditions.

      We acknowledge the challenges in achieving optimal specificity for commercially available and custom-generated antibodies targeting PINK1 and pUb, particularly given the low endogenous levels of these proteins under physiological conditions. Despite these limitations, we observed robust immunofluorescent staining for PINK1 (Figures 1A, 1C, and 1G) and pUb (Figures 1B, 1D, and 1G) in human brain samples from Alzheimer's disease (AD) patients, as well as in mouse brains from models of AD and cerebral ischemia. The significant elevation of PINK1 and pUb under these pathological conditions likely accounts for the clear visualization. To validate antibody specificity, we have included images from pink1-/- mice as negative controls in the revised manuscript (Figure 1C and 1D, third panel).

      In addition, we detected a significant increase in pUb levels in aged mouse brains compared to young ones (Figures 1E and 1F). Notably, in pink1-/- mice, pUb levels remained unchanged between young and aged groups, despite some background signal, further supporting the conclusion that pUb accumulation during aging is PINK1-dependent.

      In HEK293 cells, pink1-/- cells served as a negative control for PINK1 (Figure 2B and 2C) and for pUb (Figure 2D and 2E). While the Western blot using the pUb antibody displayed some nonspecific background, pUb levels in pink1-/- cells remained unchanged across all MG132 treatment conditions (Figures 2D and 2E), further attesting to the reliability of our findings.

      (3) In Figure 6, relying solely on Western blot staining and Golgi staining under high magnification is insufficient to prove the impact of PINK1 overexpression on neuronal integrity and cognitive function. The authors should supplement their findings with immunostaining results for MAP2 or NeuN to demonstrate whether neuronal cells are affected.

      Thank you for raising this important point. We included NeuN immunofluorescent staining in Figure 5—figure supplement 2 of the original manuscript. The results demonstrate a significant loss of NeuN-positive cells in the hippocampus following Ub/S65E overexpression, while no apparent change in NeuN-positive cells was observed with sPINK1* transfection alone. These findings provide evidence of neuronal loss in response to Ub/S65E, further supporting the impact of pUb elevation on neuronal integrity.

      While we did not perform MAP2 immunostaining, we included complementary analyses to assess neuronal integrity. Specifically, we performed Western blotting to determine MAP2 protein levels and used Golgi staining to study neuronal morphology and synaptic structure in greater detail. These analyses revealed that overexpression of sPINK1* or Ub/S65E decreased MAP2 levels and caused damage to synaptic structures (Figures 6F and 6H). Importantly, the deleterious effects of sPINK1* overexpression could be rescued by co-expression of Ub/S65A, further underscoring the role of pUb in mediating these changes.

      Together, our NeuN immunostaining, MAP2 analysis, and Golgi staining provide strong evidence for the impact of PINK1 overexpression and pUb elevation on neuronal integrity and synaptic health. We believe these complementary approaches sufficiently address the reviewer’s concern and highlight the pathological consequences of elevated pUb levels.

      (4) The authors should provide more detailed figure captions to facilitate the understanding of the results depicted in the figures.

      Figure captions will be updated with more details in the revised manuscript.

      (5) While the study proposes that pUb promotes neurodegeneration by affecting proteasomal function, the specific molecular mechanisms and signaling pathways remain to be elucidated.

      The specific molecular mechanisms and signaling pathways through which pUb promotes neurodegeneration are likely multifaceted and interconnected. Mitochondrial dysfunction appears to be a central contributor to neurodegeneration following sPINK1* overexpression. This is supported by (1) an observed increase in full-length PINK1, indicative of impaired mitochondrial quality control, and (2) proteomic data revealing enhanced mitophagy at 30 days post-transfection and substantial mitochondrial injury by 70 days post-transfection. The progressive damage to mitochondria caused by protein aggregates can cause further neuronal injury and degeneration.

      In addition, reduced proteasomal activity may result in the accumulation of inhibitory proteins that are normally degraded by the ubiquitin-proteasome system. Our proteomics analysis identified a >54-fold increase in CamK2n1 (UniProt ID: Q6QWF9), an endogenous inhibitor of CaMKII activation, following sPINK1* overexpression. This is particularly significant because the accumulation of CamK2n1 could suppress CaMKII activation and, subsequently, inhibit the CREB signaling pathway (illustrated below). As CREB is essential for synaptic plasticity and neuronal survival, its inhibition may further amplify neurodegenerative processes.

      While our study identifies proteasomal dysfunction and mitochondrial damage as key initial triggers, downstream effects—such as disruptions in signaling pathways like CaMKII-CREB—likely contribute to a broader cascade of pathological events. These findings highlight the complexity of pUb-mediated neurodegeneration and suggest that further exploration of downstream mechanisms is necessary to fully elucidate the pathways involved.

      In the revised manuscript, we plan to include proteomics data from mouse brain tissue at 30 and 70 days post-transfection to further highlight these downstream effects of proteasomal dysfunction.

      Author response image 1.

      Reviewer #2 (Public review):

      Summary:

      The manuscript makes the claim that pUb is elevated in a number of degenerative conditions including Alzheimer's Disease and cerebral ischemia. Some of this is based on antibody staining which is poorly controlled and difficult to accept at this point. They confirm previous results that a cytosolic form of PINK1 accumulates following proteasome inhibition and that this can be active. Accumulation of pUb is proposed to interfere with proteostasis through inhibition of the proteasome. Much of the data relies on over-expression and there is little support for this reflecting physiological mechanisms.

      Weaknesses:

      The manuscript is poorly written. I appreciate this may be difficult in a non-native tongue, but felt that many of the problems are organisational. Less data of higher quality, better controls and concision would be preferable. Overall the referencing of past work is lamentable.

      Methods are also very poor and difficult to follow.

      Until technical issues are addressed I think this would represent an unreliable contribution to the field.

      (1) Antibody specificity and detection under pathological conditions

      We acknowledge the limitations of commercially available antibodies for detecting PINK1 and pUb. Despite these challenges, our findings demonstrate a significant increase in PINK1 and pUb levels under pathological conditions, such as Alzheimer's disease (AD) and ischemia. Additionally, we observed an increase in pUb level during brain aging, further highlighting its relevance in this particular physiological process. To ensure reliable quantification of PINK1 and pUb levels, we used pink1-/- mice and HEK293 cells as negative controls. For example, PINK1 levels were extremely low in control cells but increased dramatically after 2 hours of oxygen-glucose deprivation (OGD) and 6 hours of reperfusion (Figure 1H). Together, these controls validate that the observed elevations in PINK1 and pUb are specific and linked to pathological or certain physiological conditions.

      (2) Overexpression as a model for pathological conditions

      To investigate whether the inhibitory effects of sPINK1* on the ubiquitin-proteasome system (UPS) are dependent on its kinase activity, we utilized a kinase-dead version of sPINK1* as a negative control. Since PINK1 has multiple substrates, we further explored whether its effects on UPS inhibition were mediated specifically by ubiquitin phosphorylation. For this, we used Ub/S65A (a phospho-null mutant) to antagonize Ub phosphorylation by sPINK1*, and Ub/S65E (a phospho-mimetic mutant) to mimic phosphorylated Ub. These well-defined controls ensured the robustness of our conclusions.

      While overexpression does not perfectly replicate physiological conditions, it serves as a valuable model for studying pathological scenarios such as neurodegeneration and brain aging, where pUb levels are known to increase. For example, we observed a 30.4% increase in pUb levels in aged mouse brains compared to young brains (Figure 1F). Similarly, in our sPINK1* overexpression model, pUb levels increased by 43.8% and 59.9% at 30 and 70 days post-transfection, respectively, compared to controls (Figures 5A and 5C). Notably, co-expression of sPINK1* with Ub/S65A almost entirely prevented sPINK1* accumulation (Figure 5B), indicating that an active UPS can efficiently degrade sPINK1*. Collectively, these findings show that sPINK1* accumulation inhibits UPS activity, a defect that can be rescued by the phospho-null Ub mutant. Thus, this overexpression model closely mimics pathological conditions and offers valuable insights into pUb-mediated proteasomal dysfunction.

      (3) Organization of the manuscript

      We believe the structure of the manuscript is justified and systematically addresses the key aspects of the study in a logical flow:

      (a) Evidence for the increase of PINK1 and pUb in multiple pathological and physiological conditions.

      (b) Identification of the sources and consequences of sPINK1 and pUb elevation.

      (c) Mechanistic insights into how pUb inhibits UPS-mediated degradation.

      (d) Validation of these findings using pink1-/- mice and cells.

      (e) Evidence of the reciprocal relationship between proteasomal inhibition and pUb elevation, culminating in neurodegeneration.

      (f) Demonstration of elevated pUb levels and protein aggregation in the hippocampus following sPINK1* overexpression, supported by proteomic analyses, behavioral tests, Western blotting, and Golgi staining.

      Thus, this organization provides a clear and cohesive narrative, culminating in the demonstration that sPINK1* overexpression induces hippocampal neuron degeneration.

      (4) Revisions to writing, referencing, and methodology

      We will improve the clarity and flow of the manuscript, add more references to properly acknowledge prior work, and incorporate additional details into the Methods section to enhance readability and reproducibility. These improvements should address the organizational and technical concerns raised, while strengthening the overall quality of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      This study aims to explore the role of phosphorylated ubiquitin (pUb) in proteostasis and its impact on neurodegeneration. By employing a combination of molecular, cellular, and in vivo approaches, the authors demonstrate that elevated pUb levels contribute to both protective and neurotoxic effects, depending on the context. The research integrates proteasomal inhibition, mitochondrial dysfunction, and protein aggregation, providing new insights into the pathology of neurodegenerative diseases.

      Strengths:

      - The integration of proteomics, molecular biology, and animal models provides comprehensive insights.

      - The use of phospho-null and phospho-mimetic ubiquitin mutants elegantly demonstrates the dual effects of pUb.

      - Data on behavioral changes and cognitive impairments establish a clear link between cellular mechanisms and functional outcomes.

      Weaknesses:

      - While the study discusses the reciprocal relationship between proteasomal inhibition and pUb elevation, causality remains partially inferred.

      The reciprocal cycle between proteasomal inhibition and pUb elevation can be initiated by various factors that impair proteasomal activity. These factors include Aβ accumulation, ATP depletion, reduced expression of proteasome components, and covalent modifications of proteasomal subunits—all well-established contributors to the progressive decline in proteasome function. Once initiated, this cycle would become self-perpetuating, with the accumulation of sPINK1 and pUb driving a feedback loop of deteriorating proteasomal activity.

      In the current study, this reciprocal relationship between sPINK1/pUb elevation and proteasomal dysfunction is depicted in Figure 4A. Our results demonstrate that increased sPINK1 or PINK1 levels, such as through overexpression, can initiate this cycle. Crucially, co-expression of Ub/S65A effectively rescues the cells from this cycle, highlighting the pivotal role of pUb in driving proteasomal inhibition and establishing causality in this relationship. At the animal level, pink1 knockout could prevent protein aggregation upon aging and cerebral ischemia (Figures 1E and 1G).

      Mitochondrial injury is a likely source of elevated PINK1 and pUb levels. A recent study showed that efficient mitophagy is necessary to prevent pUb accumulation (bioRxiv 2023.02.14.528378), suggesting that mitochondrial damage can trigger this cycle. In another study (bioRxiv 2024.07.03.601901), the authors found that mitochondrial damage could enhance PINK1 transcription, further increasing cytoplasmic PINK1 levels and exacerbating the cycle.

      - The role of alternative pathways, such as autophagy, in compensating for proteasomal dysfunction is underexplored.

      Elevated sPINK1 has been reported to enhance autophagy (Autophagy 2016, 12: 632-647), potentially compensating for the impaired UPS. One mechanism involves the phosphorylation of p62 by sPINK1, which enhances autophagy activity. In our study, we did observe increased autophagic activity upon sPINK1* overexpression, as shown in Figure 2I (middle panel, without BALA). This increased autophagy may help degrade ubiquitinated proteins induced by puromycin, partially compensating for the proteasomal dysfunction.

      This compensation might explain why protein aggregation increased only slightly, though significantly, at 70 days post sPINK1* transfection (Figure 5F). Additionally, we observed a slight but statistically non-significant increase in LC3II levels in the hippocampus of mouse brains at 70 days post sPINK1* transfection (Figure 5—figure supplement 6), further supporting the notion of autophagy activation.

      However, while autophagy may provide some compensation, its effect is likely limited. Autophagy and UPS differ significantly in their roles and mechanisms of degradation. Autophagy is a bulk degradation pathway that is generally non-selective, targeting long-lived proteins, damaged organelles, and intracellular pathogens. In contrast, the UPS is highly selective, primarily degrading short-lived regulatory proteins, misfolded proteins, and proteins tagged for degradation.

      Together, we found that sPINK1* overexpression enhanced autophagy-mediated protein degradation while simultaneously impairing UPS-mediated degradation. This suggests that while autophagy may provide partial compensation for proteasomal dysfunction, it is not sufficient to fully counterbalance the selective degradation functions of the UPS.

      - The immunofluorescence images in Figure 1A-D lack clarity and transparency. It is not clear whether the images represent human brain tissue, mouse brain tissue, or cultured cells. Additionally, the DAPI staining is not well-defined, making it difficult to discern cell nuclei or staging. To address these issues, lower-magnification images that clearly show the brain region should be provided, along with improved DAPI staining for better visualization. Furthermore, the Results section and Figure legends should explicitly indicate which brain region is being presented. These concerns raise questions about the reliability of the reported pUb levels in AD, which is a critical aspect of the study's findings.

      We will include low-magnification images in the supplementary figures of the revised manuscript to provide a broader context for the immunofluorescence data presented in Figure 1. DAPI staining at higher magnifications will also be provided to improve visualization of cell nuclei and overall tissue structure. Additionally, we will indicate the brain regions examined in the corresponding figure legends, and incorporate more details in the Results section to provide clearer descriptions of the samples and brain regions analyzed.

      The human brain samples presented in Figure 1 are from the cingulate gyrus region of Alzheimer's disease (AD) patients. Our analysis revealed that PINK1 is primarily localized within cell bodies, while pUb is more abundant around Aβ plaques, likely in nerve terminals. These additional clarifications and supplementary figures should provide greater transparency and improve the reliability of our findings.

      - Figure 4B should also indicate which brain region is being presented.

      The images were taken from layers III-IV of the mouse neocortex; this information will be added to the figure legend of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with two lipid compositions similar to native viral membranes. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region. The revised manuscript demonstrates that these lipid interactions are robust to alterations in membrane composition and rigidity. However, it does not address the reverse: that phospholipids known experimentally not to associate with these antibodies (if any such lipids exist) also fail to interact in MD simulations.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. These simulations recapitulate lipid binding interactions solved in published crystallographic studies but also lead to the discovery of a novel lipid binding site the authors term the "Loading Site", which could guide future experiments with this antibody.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to the membrane. These CG simulations, which cannot resolve atomistic interactions, are nonetheless compelling because negative controls (ab 13h11, BSA) that should not associate with membrane indeed sample significantly less membrane.

      Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive, creative, and biologically inspired. Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. The simulations recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The use of multiscale MD simulations allows for a detailed exploration of the system at different time and length scales. The combination of MD simulations and structural bioinformatics provides a comprehensive approach to validate the identified binding sites. Finally, the steered MD simulations offer quantitative insights into the binding strength between the membrane and bnAbs.

      While the simulations and analyses provide qualitative insights into the binding interactions, they do not offer a quantitative assessment of energetics. The coarse-grained simulations exhibit artifacts and thus require careful analysis.

      This study contributes to a deeper understanding of the molecular mechanisms underlying bnAb recognition of the HIV-1 envelope. The insights gained from this work could inform the design of more potent and broadly neutralizing antibodies.

      Recommendations for the authors:

      Reviewing Editor:

      We recommend the authors remove the figure and section related to bnAb LN01, perform additional analysis (e.g., further expanding on the differences in antibody binding in the presence or absence of antigen), and present this as a separate manuscript in a follow-up study.

      We consider the analysis of a bnAb with a transmembrane antigen, and of LN01 in particular, to be essential to the manuscript and a source of novel results. Study of LN01 provides many insights distinct from those for the other MPER bnAbs in this study. We agree that further characterization of LN01 and of bnAbs with transmembrane antigen or full-length Env is intriguing and necessary for a full mechanistic understanding of lipid-associated antibodies. The LN01 section in this paper is novel in the field and provides preliminary evidence motivating further work, which we agree is beyond the scope of this already long and detailed study.

      Reviewer #1 (Recommendations for the authors):

      I appreciate the degree to which the authors responded to my previous points raised in the private review, including edits where I might have missed something in the manuscript or relevant literature. I imagine such a point-by-point response was quite onerous. Thank you also for balancing presentation/clarity with content/rigor considering the large information content of this manuscript; in silico results are inherently hard to present given the delicate balance between rigorous validation and novel information content. I apologize if I repeat points raised and addressed previously and commend the authors on their revised study, which is much improved in clarity; any additional revisions are of course entirely at your discretion.

      "...now having more diversity in lipid headgroup chemistries" references the wrong figure-it should be: Figure 2-figure supplement 2A-C. The incorrect figure is also referenced again several sentences down: "...relevant CDR and framework surface loops..."

      Thank you for pointing out this error. We have corrected figure references.

      "One shared conformational difference observed for these bnAbs the higher cholesterol bilayers was slightly more extensive and broader interaction profiles as well as modestly deeper embedding of the relevant CDR and framework surfaces loops" please rephrase

      Thank you for this suggestion.  We rephrased this for improved clarity and flow. 

      "These results bolster the feasibility for using all-atom MD as an in silico platform to explore differential phospholipid affinity at these sites (i.e., specificity studies) and influence on antibody preferred conformation as membrane composition and lipid chemistry are systematically varied" Please tone down these speculations-you have demonstrated that simulations are robust to different headgroup chemistries but have not provided evidence for the exclusion of lipids that are known not to associate with these antibodies.

      We rephrased this speculation to highlight the potential of this application. We also emphasize future studies that would be required to achieve this application in the following sentence.

      “These results motivate use of all-atom MD as an in silico approach for exploring differential phospholipid affinity at these sites…”

      Figure 2A: Specify which PDB entry corresponds to the displayed crystal structures in the main figure or caption.

      We clarified these PDB entries in the figure caption. 

      Check reference formatting in supplemental figures when generating VOR.

      I am not sure how relevant this might be to the claims of Figure 2-figure supplement 3, but AlphaFold3-based phospholigand docking might provide an additional orthogonal approach if relevant ligand(s) are available for such analysis (particularly for the newly proposed 10E8 POPC complex).

      Thank you for this suggestion. AI/ML-based prediction methods such as AF3 and RoseTTAFold All-Atom (RFAA) are interesting new methods that have emerged since our initial submission. We have decided that these experiments are beyond the scope of this already long and detailed study. We have added a sentence suggesting use of these methods in future work.

      "We next studied bnAb LN01 to interrogate differences" --> this transition still reads a bit unclear. Why shift gears and change antibodies? Also, while you do go into its interactions both +/- antigen, there's no lead into the simulation initialization with and without antigen to guide the reader into the comparisons you will draw in the figure. Also, the order of information presentation is a bit strange, where the rationale for choosing a single monomeric helix is brought up in the middle of the paragraph instead of at the beginning of the section. In the next paragraph, it goes back to the initialization of the membrane composition again, which feels a bit disorganized-I do appreciate the unique challenge of having to weave through so much quality data! In fact, if you were to conduct simulations of membrane + antigen vs. membrane + LN01 vs. membrane + LN01 + antigen, I am tempted to say that this could be removed from this manuscript and flow better as a paper in and of itself.

      We thank the reviewer for the suggestion to improve the writing style. We feel this section adds a lot of value to the manuscript, so we have kept it in the paper and improved the transition as well as the rationale.

      We chose to study the additional antibody LN01 and the monomeric MPER-TM antigen conformation because structural evidence was already available, without the need for additional creative model building. This rationale has been updated in the new text.

      We changed the order of information as suggested, moving the rationale for the antigen fragment earlier in the paragraph, followed by the background on the lipid sites from the crystal structure, which leads into the simulation set-up. In the opening sentence of the paragraph, we clarified that the simulation initialization was similar for systems with and without antigen.

      "previously observed snorkeling and hydration of TM Arg686" --> Is this R696 (numbering could be different based on the particular Env)?

      Thank you for noting this typo, we have corrected the numbering.

      Potential font color issue with Figure 3-Figure supplement 1 B and part of A text; this could be fixed in typesetting.

      The discussion reads very well. Is it possible to direct antibody maturation, even in an engineered context, towards membrane affinity without increasing immunogenic polyreactivity? This is mentioned very briefly and cited with ref 36, but I would be interested in the author's thoughts on this topic.

      We thank the reviewer for the insightful idea to explore in future work. Our conclusion alludes to the possibility of artificially evolving membrane affinity, studied by MD, as done in vitro by Nieva and co-workers. Because of their hypothetical nature, we have chosen not to elaborate on those ideas in this manuscript.

      Reviewer #2 (Recommendations for the authors):

      To ensure reproducibility and facilitate further research, the authors should publicly deposit the code for running the MD simulations and analyses (e.g., on GitHub) along with the underlying data used in the study (e.g., on Zenodo.org).

      We appreciate the consideration for open-source code and analysis. Representative code and simulation trajectories were uploaded to the following repositories:

      https://github.com/cmaillie98/mper_bnAbs.git

      https://zenodo.org/records/13830877

      ---

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with a lipid composition similar to the native virion. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. Additional contacts and conformational restraints imposed by ectodomain regions of the envelope glycoprotein, however, remain unaddressed; the size of such simulations likely runs into technical limitations including sampling and compute time.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to the membrane. Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive. However, given the large amount of data presented within the manuscript, the text would benefit from clearer subsections that delineate discrete mechanistic discoveries, particularly for experimentalists interested in antibody discovery and design. One area the paper does not address involves the polyreactivity associated with membrane-binding antibodies: MD simulations and/or pulling velocity experiments with model membranes of different compositions, with and without model antigens, would be needed. Finally, given the challenges in initializing these simulations and their limitations, the text regarding their generalized use for discovery, rather than mechanism, could be toned down.

      Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

      Reviewer #2 (Public Review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. The simulations recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The conclusions from the paper are mostly well supported by the simulations, however, they remain very descriptive and the key findings should be better described and validated. In particular:

      It has been shown that the lipid composition of HIV membrane is rich in cholesterol [1], which accounts for almost 50% molar ratio. The authors use a very different composition and should therefore provide a reference. It has been shown for 4E10 that the change in lipid composition affects dynamics of the binding. The robustness of the results to changes of the lipid composition should also be reported.

      The real advantage of the multiscale approach (coarse-grained (CG) simulation followed by a back-mapped all-atom simulation) remains unclear. In most cases, the binding mode in the CG simulations seems to be an artifact.

      The results reported in this study should be better compared to available experimental data. For example, how does the approach angle compare to cryo-EM structures of the bnAbs engaging with the MPER region, e.g. [2-3]? How do the results from this study compare to previous molecular dynamics studies, e.g. [4-5]?

      References

      (1) Brügger, Britta, et al. "The HIV lipidome: a raft with an unusual composition." Proceedings of the National Academy of Sciences 103.8 (2006): 2641-2646.

      (2) Rantalainen, Kimmo, et al. "HIV-1 envelope and MPER antibody structures in lipid assemblies." Cell Reports 31.4 (2020).

      (3) Yang, Shuang, et al. "Dynamic HIV-1 spike motion creates vulnerability for its membrane-bound tripod to antibody attack." Nature Communications 13.1 (2022): 6393.

      (4) Carravilla, Pablo, et al. "The bilayer collective properties govern the interaction of an HIV-1 antibody with the viral membrane." Biophysical Journal 118.1 (2020): 44-56.

      (5) Pinto, Dora, et al. "Structural basis for broad HIV-1 neutralization by the MPER-specific human broadly neutralizing antibody LN01." Cell Host & Microbe 26.5 (2019): 623-637.

      Considering the reviewers' suggestions, we slightly reorganized the results section into specific sub-sections with headings and changed the order in which key results are presented to make the subsequent analysis more accessible for readers. Supplemental materials were redistributed into eLife format, with each supplemental item grouped under a corresponding main figure. Many slight modifications of detail were made to figures (mostly supplemental items) without changing their character, such as clearer axis labels or revised annotations within panels.

      The major additions within the results sections based on the reviews were:

      (1) An expanded comparison of our simulation analyses with previous simulations and with existing cryo-EM structural evidence for MPER antibodies’ membrane orientation in the context of full-length antigen, resulting in new supplemental figure panels.

      (2) New atomistic simulations of 10E8, PGZL1, and 4E10 evaluating the phospholipid binding predictions in a different lipid composition more closely modeling HIV membranes.

      Minor edits to the analyses and interpretations include:

      (1) Outlining the geometric components contributing to variance in substates after clustering the atomistic 10E8, 4E10, and PGZL1 simulations.

      (2) Better defining the variance and durability of membrane interactions within and across systems in the coarse grain methods section.

      (3) Removed interpretations in the original results sections regarding polyreactivity and energetics for MPER bnAbs that were not explicitly supported by data.   

      (4) More context on the prevalence of bnAb loop geometries in the structural informatics section.

      (5) Rationale for the choice of the continuous helix MPER-TM conformation in LN01-antigen conformations, and citations to previous gp41 TM simulations.

      (6) Removed language on the novelty of the coarse grain and steered pulling simulations as newly developed approaches; tempering the potential discriminating power and applications of those approaches, in light of their limitations.

      The discussion was revised to provide more context for the novelty of the results within the field, including discussing the direct relevance of the simulation methods for evaluating immune tolerance mechanisms and for antibody engineering. We have shared custom scripts used for molecular dynamics analysis on GitHub (https://github.com/cmaillie98/mper_bnAbs.git) and uploaded trajectories to a public repository hosted on Zenodo (https://zenodo.org/records/13830877).

      Recommendations for the authors:

      Below, I provide an extensive list of minor edits associated with the text and figures for the authors to consider. I provide these with the hope of increasing the accessibility of the manuscript to broader audiences but leave changes to the discretion of the authors.

      Text/clarity

      Figure 1 main text

      The main text discussing Figure 1 is disorganized, making the analysis difficult to follow. I would suggest the following: moving the sentence, "4E10 and PGZL1 are structurally homologous" immediately after the paragraph discussing the simulation initiation. Then, add a sentence that directly compares the experimental affinity, neutralization, and polyreactivity of 4E10 and PGZL1 (later, an unintroduced idea pops up, "These patterns may in part explain 4E10's greater polyreactivity"). Next, lead into the discussion of the MD simulation data with something to the effect of: "Given these similarities, we first compared mechanisms of membrane insertion between 4E10 and PGZL1 to bolster confidence in our predictions". Later, the sentence "Across 4E10 and PGZL1 simulations, the bound lipid phosphates"

      We thank the reviewer for the suggestion and we have restructured the beginning of the results to implement this style: to first introduce then discuss the comparative PGZL1 & 4E10 results, i.e. Figure 1 plus associated supplements.

      In the background and the introduction text leading up to Figure 1, CDR-H3 is discussed at length, however, the first figure focuses almost entirely on how CDR-H1 coordinates a lipid phosphate headgroup. Are there experimental mutations in this loop that do not affect affinity (e.g., to a soluble gp41 peptide), but do affect neutralization (like the WAWA mutation for CDR-H3, discussed later)?

      We have altered the Introduction (para 2) and Results (4E10/PGZL1 sub-section) to give a more balanced discussion of CDRs H1 & H3. That includes referencing experimental data addressing the reviewer’s question: in the PGZL1 clone H4K3, mutations introduced into CDR-H1 were shown to have minimal impact on affinity for MPER peptide via ELISA and BLI, but those mutant bnAbs had significantly reduced neutralization efficacy (PMC6879610).

      The sentence "These phospholipid binding events were highly stable, typically persisting for hundreds of nanoseconds" should be moved down to immediately precede, "[However], in a PGZL1 simulation, we observed a". This would be a good place for a paragraph break following, "Thus, these bnABs constitutively", since this block of text is very long.

      Similarly, the sentence and parts of the section, "Likewise, the interactions coordinating the lipid phosphate oxygens at CDR-H1" more appropriately belongs immediately before or after the sentence, "Our simulations uncover the CDR-lipid interactions that are the most feasible".

      Thank you for the detailed guidance in reorganizing the Figure 1 results.  We followed the advice to directly compare 4E10 and PGZL1 results separately from 10E8, moving those sections of text appropriately.  New paragraph breaks were added to improve accessibility and flow of concepts throughout the Results.

      In the sentence, "our simulations uncover CDR-lipid interactions that are the most feasible and biologically relevant in the context of a full [HIV] lipid bilayer... validation to which of the many possible ions" --> have you confidently determined lipid binding and positioning outside of the site validated in figure 1? Which site(s) are these referencing? The next two sentences then introduce two new ideas on the loop backbone stability then lead into lipid exchange, which is a bit jarring.

      We have adjusted the language concerning the putative ions/lipids electron density across the many PGZL1 and 4E10 crystal structures, and additionally make the explicit point that we confidently determined the lack of lipid binding outside of the site focused on in Figure 1.

      “… both bnAbs showed strong hotspots for a lipid phosphate bound within the CDR-H1 loops, with minimal phospholipid or cholesterol ordering around the proteins elsewhere.  The simulated lipid phosphates bound within CDR-H1 have exceptional overlap with electron densities and atomic details of modelled headgroups from respective lipid-soaked co-crystal structures…”

      Figure 2 main text

      "We similarly investigated bnAb 10E8" - Please make this a separate subheader, the block text is very long up to this point.

      Thank you for the suggestion. We introduced a sub-header to separate work on 10E8 all-atom simulations.

      "we observed a POPC complexed with... modelled as headgroup phosphoglycerol anions..." - please cite the references within the text.

      Thank you for pointing out this missing reference, we added the appropriate reference.

      "One striking and novel observation" - please remove the phrase "striking" throughout, for following best practices in scientific writing (PMC10212555)-this is generally well-done throughout.

      We removed “striking” from our text per your suggestion.

      "This CDR-L1 site highlights... (>500 fold) across HIV strains" - How much do R29 and Y32 also contribute to antigen binding and the conformation of this loop? These mutants also decreased Kd by approximately 20X, and based on the co-crystal structure with the TM antigen (PDB: 4XCC), seem to play a more direct role in antigen contact. Additionally, these residues should be highlighted on a figure, otherwise it's difficult to understand why they are important for membrane association.

      We thank the reviewer for the deep engagement with these supporting experimental details. The R29A+Y32A 10E8 mutant referenced in the text showed only a 4-fold Kd increase, a modest change for an SPR binding experiment, whereas the R29E+Y32E 10E8 mutant showed a 40x Kd increase, the change the reviewer refers to as “20x”. Both 10E8 mutants showed similarly drastic reductions in breadth and potency, of over 2 orders of magnitude on average.

      These mutated CDR-L1 residues are not directly involved in antigen contact and adopt the same loop helix conformation when antigen is bound. A minor impact on antigen binding affinity could be due to altered pre-organization of the CDR loops upon losing interactions from the Tyr & Arg sidechains, particularly Tyr31 in contact with CDR-H3.

      As per the suggestion, a more clearly annotated figure panel denoting these sidechains has been added to Figure 2-Figure Supplement 1 for the 10E8 analysis.

      "Structural searches querying... identified between 10^5 and 2*10^6..." - why is this value represented as such a large range? Does this depend on the parameters used for analysis? Please clarify.

      Additionally, how prevalent are any random loop conformations compared to the ones you searched? It's otherwise difficult to attribute number of occurrences within the 2 A cutoff to biological significance, as this number is not put in context.

      We appreciate the reviewer's comment on contextualizing the range and relative frequency of the bnAb loop conformations. RMSD and loop length are the key parameters, and loop length can be controlled for by searching reference loops of similar length. The main point of the backbone-level searching is simply to show that the bnAb loops are not particularly rare when compared with loops of similar length.

      We did as was suggested and added comparison to random loops of the same length to the main text, including a new Supplementary Table 4.   

      “…identified between 10^5 and 2∙10^6 geometrically similar sub-segments within natural proteins (<2 Å RMSD; ref. 40), reflecting they are relatively prevalent (not rare) in the protein universe, comparing well with frequency of other surface loops of similar length in antibodies (Supplementary Table 3).”
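
      To make the "<2 Å RMSD" criterion concrete, the following is a minimal sketch of a backbone-level similarity count, assuming each loop is supplied as an (N, 3) array of backbone atom coordinates with identical atom ordering. The function names `kabsch_rmsd` and `count_similar` and the default cutoff argument are illustrative assumptions; this is not the search tool or database used in the manuscript.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD (same units as input) between two (N, 3) coordinate sets
    after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ P0.T).T - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def count_similar(query, candidates, cutoff=2.0):
    """Count candidate segments within `cutoff` RMSD of the query loop.
    The 2.0 default mirrors the <2 A threshold quoted in the text."""
    return sum(1 for c in candidates if kabsch_rmsd(query, c) < cutoff)
```

      Under these assumptions, running the same count for randomly chosen surface loops of equal length gives the baseline frequency against which the bnAb loop counts can be judged.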

      "We next examined the geometries" could start after its own new subheading. Moreover, while there's an emphasis on tilt for neutralization, there is not a figure clearly modelling the proposed Env tilt compared to the relatively planar bilayer. It would be helpful to have an additional panel somewhere that shows the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for outlining an interesting element to consider in our analysis of a multi-step binding mechanism for MPER antibodies. We added additional figure panels in the supplement to outline the similarities and differences between our simulations and Fabs with the inferred membranes in cryo-EM experiments of full-length HIV Env.  The simulated Fabs’ angles are very similar with only minor tilting to match the cryo-EM antibody-membrane geometries. 

      We added Figure 1-figure supplement 1A & Figure 2-figure supplement 2A, and altered the text to reflect this:

      “The primary difference is Env-bound Fabs in cryo-EM adopt slightly more shallow approach angles (~15°) relative to the bilayer normal.  The simulated bnAbs in isolation prefer orientations slightly more upright, but presenting CDRs at approximately the same depth and orientation.  Thus, these bnAbs appear pre-disposed in their membrane surface conformations, needing only a minor tilt to form the membrane-antibody-antigen neutralization complex.”

      Env tilt dynamics and membrane curvature of natural virions may reconcile some of these differences. Recent in situ tomography of full-length Env in pseudo-virions corroborates our approximation of flat bilayers over the short length scales around Env.
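
      The comparison of approach angles can be sketched as below, assuming a flat bilayer with its normal along +z and a Fab long axis defined as the vector from the variable-domain centroid to the constant-domain centroid. Both the axis definition and the helper names (`centroid`, `fab_axis`, `angle_to_normal`) are assumptions made for illustration; the manuscript's own angle definitions may differ.

```python
import math


def centroid(coords):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(coords)
    if n == 0:
        raise ValueError("coords must not be empty")
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def fab_axis(variable_coords, constant_coords):
    """Hypothetical Fab long axis: variable-domain centroid to
    constant-domain centroid, pointing away from the membrane."""
    v = centroid(variable_coords)
    c = centroid(constant_coords)
    return tuple(c[i] - v[i] for i in range(3))


def angle_to_normal(axis, normal=(0.0, 0.0, 1.0)):
    """Angle in degrees between an axis vector and the bilayer normal."""
    dot = sum(a * n for a, n in zip(axis, normal))
    len_a = math.sqrt(sum(a * a for a in axis))
    len_n = math.sqrt(sum(n * n for n in normal))
    if len_a == 0.0 or len_n == 0.0:
        raise ValueError("axis and normal must be non-zero vectors")
    # Clamp to [-1, 1] to absorb floating-point round-off before acos.
    cos_angle = max(-1.0, min(1.0, dot / (len_a * len_n)))
    return math.degrees(math.acos(cos_angle))
```

      With such a definition, the difference between the angle measured for a simulated Fab and that measured for an Env-bound Fab against the inferred cryo-EM membrane plane gives the tilt needed to reach the bound geometry.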

      The sentence "we next examined the geometries" mentions "potential energy cost, if any, for reorienting...". However, there's no further discussions of geometry or energy cost within this section. Please rephrase, or move this figure to main and increase discussion associated with the various conformational ensembles, their geometry, and their phospholipid association.

      As the reviewer highlights, the unbiased simulations and our analysis do not explicitly evaluate energetics.  We removed this phrase, and now only allude to the minimal energy barrier between the similar geometric conformations, relative to the tilting & access requirements for antigen binding mechanism.

      “The apparent barrier for re-orientation is likely much less energetically constraining than shielding glycans and accessibility of MPER”

      ".. describing the spectrum of surface-bound conformations" cites the wrong figure.

      Thank you for noticing this error; we corrected the figure reference to (Figure 2-figure supplement 4).

      Please comment on the significance of how global clustering (Fig. S5A-C) was similar for 4E10 and PGZL1, but different for 10E8 (e.g., blue, orange, and yellow clusters for 4E10 and PGZL1 versus cyan, red, and green clusters for 10E8). As the cyan cluster seems to be much closer in Euclidean space to the 4E10/PGZL1 clusters, it might warrant additional analysis. What do these clusters represent in terms of structure/conformation? How do these clusters differ in membrane insertion as in (A)?

      We are grateful that you identified analysis in the geometric clustering section that may be of interest to other readers. We have added an additional supplementary table (Table 2) detailing the CDR loop membrane insertion and global Fab angles that describe each cluster, to demonstrate their similarities and differences. We also better describe, in the relevant results section, how global clustering was similar for 4E10 and PGZL1 but different for 10E8.

      The cyan cluster is not close in structure to the 4E10/PGZL1 clusters. We note that the original figure panel had an error. The updated Figure 2-supplement 4B shows the correct Euclidean distance hierarchy, with an early split between the 4E10/PGZL1 and 10E8 clusters.

      Figure 3 main text

      The start of this section, "We next studied bnAb LN01...", is a good place for a new subheader.

      We have added an additional subheader here: Antigen influence on membrane bound conformations and lipid binding sites for LN01

      There should be a sentence in the main text defining the replicate setup and production MD run time. Is the apo and complex based on a published structure? How do you embed the MPER? Is the apo structure docked to membrane like in 4E10? The MD setup could also be better delineated within the methods.

      The first two paragraphs in this section have been updated to clarify the relevant simulation configuration and the Fab membrane docking prediction details.

      The procedure for predicting an initial membrane insertion was the same, except that we now use the LN01-TM complex, so the calculation accounts for the membrane burial of the TM domain and MPER fragment. As mentioned, LN01 is predicted to insert with its CDR loops positioned similarly with or without the TM-MPER fragment. The geometry differs from that of PGZL1/4E10 and 10E8, as noted in the text.

      Please comment on the oligomerization state of the antigen used in the MD simulation: how does the simulation differ from a crossed MPER as observed in an MPER antibody-bound Env cryo-EM structure (PMID: 32348769), a three-helix bundle (PMC7210310), or single transmembrane helix (PMC6121722)? How does the model MPER monomer embed in the membrane compared to simulations with a trimeric MPER (PMC6035291, PMID: 33882664), namely with respect to key arginine residues such as R696?

      We thank the reviewer for pointing out critical underlying rationale for modeling this TM-MPER-LN01 complex which we have corrected in the revised draft. The range of potential conformations and display of MPER based on TM domain organization could easily be its own paper – we in fact have a manuscript in preparation on the topic.  

      The updated text expands the rationale for choosing the monomeric uninterrupted helix form of the MPER-TM model antigen (para 1 of LN01 section). The alternative conformations we did not explore are called out, with the references provided by the reviewer.

      The discussion now qualifies that the MPER presentation is likely oversimplified here, noting that MPER display in the full-length Env trimer will vary across conformational states and membrane environments. However, the only cryo-EM structures of full-length Env with TM domains resolved have this continuous helix MPER-TM conformation, seen both within crossing TM dimers and in dissociated TM monomers.

      Are there additional analyses that can validate the dynamics of the MPER monomer in the membrane and relative to LN01? Such as key contacts you would expect to maintain over the duration of the MD simulation?

      We also increased description of this TM domain’s behavior, dynamics (tilt, orientation, Arg696 snorkeling, and complex w LN01) to provide a clearer picture of the simulation results – which aligns with past MD of the gp41 TM domain as a monomer (para 2 of LN01 section).  As well, we noted key LN01-MPER contacts that were maintained.

      How does the model MPER modulate membrane properties like lipid density and lipid proximities near LN01?

      We checked and didn’t notice differences for the types of lipids (chol, etc) proximal to the MPER-TM or the CDR loops versus the bulk lipid bilayer distributions.  Due to the already long & detailed nature of this manuscript, we elect not to include discussion on this topic.

      Supplemental figure 1H-I would be better positioned as a figure 3-associated supplemental figure.

      We rearranged to follow the eLife format and have paired supplemental panels with their most relevant main figures.

      Figure 3F/H reference a "loading site" but this site is defined much later in the text, which was confusing.

      Thank you for pointing out this source of confusion, we rearranged our discussion to reflect the order in which we present data in figures.

      What evidence suggests that lipids "quickly exchange from the Loading site into the X-ray site by diffusion"? I do not gather this from Figure S1H/I.

      We have rearranged the loading site and X-ray site RMSD maps in Figure 3-Figure supplement 1 to better illustrate how a lipid exchanges between these sites.

      Figure 4 main text

      The authors assert that in the CG simulations, restraints, "[maintain] Fab tertiary and quaternary structure". However, backbone RMSD does not directly support this claim; an additional analysis of the key interfacial residues between chains, or geometric analysis between the chains, would better support it.

      Thank you for pointing this out. We rephrased to add that the major sidechain contacts between heavy and light chains persist, in addition to backbone RMSD, to describe how these Fabs stably maintain their fold in CG representation.
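
      One way such a contact check could be set up is sketched below in Python: for a reference list of heavy-light chain sidechain contacts, it reports the fraction of frames in which each contact is present. The residue labels and per-frame contact sets are hypothetical placeholders, not values from the manuscript, and extracting contacts from the CG trajectory is assumed to happen upstream.

```python
from typing import Dict, Iterable, List, Set, Tuple

Contact = Tuple[str, str]  # (heavy-chain residue, light-chain residue)


def contact_persistence(
    reference: Iterable[Contact],
    frames: List[Set[Contact]],
) -> Dict[Contact, float]:
    """For each reference contact, return the fraction of frames containing it."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    return {c: sum(c in f for f in frames) / n for c in reference}


# Hypothetical residue labels and per-frame contact sets, for illustration only.
reference = [("H:W47", "L:Y96"), ("H:L45", "L:P44")]
frames = [
    {("H:W47", "L:Y96"), ("H:L45", "L:P44")},
    {("H:W47", "L:Y96")},
    {("H:W47", "L:Y96"), ("H:L45", "L:P44")},
    {("H:W47", "L:Y96"), ("H:L45", "L:P44")},
]
persistence = contact_persistence(reference, frames)
for contact, fraction in persistence.items():
    print(contact, f"{fraction:.2f}")
```

      A persistence value near 1.0 for the key interfacial pairs would complement backbone RMSD as evidence that the quaternary arrangement is retained; what counts as "persistent" is left to the analyst.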

      In several cases, CG models sample and then dissociate from the membrane. In the text, the authors mention, "course-grained models can distinguishing unfavorable and favorable membrane-bound conformations". Is there a particular orientation that causes/favors membrane association and dissociation? This analysis could look at conformations immediately preceding association and dissociation to give clues as to what orientation(s) favor each state.

      Thank you for suggesting this interesting analysis.  Clustering analysis of associated states are presented in Figure 5, Figure 5-Figure Supplement 1, and Figure 6, which show all CDR and framework loop directed insertion.  This feature is currently described in the main text.  

      We did not find a strong correlation of specific orientations with “pre-dissociation” states or ineffective non-inserting “scanning” events. We revised the key sentence to reflect the major takeaway: non-CDR alternative conformations did not insert, and most of those with CDRs inserted in a different manner than in all-atom simulations were also prone to dissociate:

      “Given that non-CDR directed and alternative CDR-embedded orientations readily dissociate, we conclude that course-grained models can distinguish unfavorable and favorable membrane-bound conformations to an extent that provides utility for characterizing antibody-bilayer interaction mechanisms.”

      Figure 6 main text

      "For 4E10, trajectories initiated from all three geometries..." only two geometries are shown for each antibody. Please include all three on the plot.

      The plots include markers for all three geometries for 4E10, highlighted with stars or letters on the density plots of angles sampled (Figure 6B,C).

      "Aligning a full-length IgG... unlikely that two Fabs simultaneously..." Are there theoretical conformations in which two Fabs could simultaneously associate with membrane? If this was physiological or could be designed rationally, could an antibody benefit further from avidity?

      Our modeling suggests the theoretical conformations having two Fabs on the membrane are infeasible.  It’s even less likely multiple Env antigens could be engaged by one IgG.  We have revised the text to express this more clearly.

      Figure 7 main text

      "An intermediate... showed a modest reduction in affinity..." what affinity does PGZL1 have for this antigen?

      The preceding sentence provides this information: “Mature PGZL1 has relatively high affinity to the MPER epitope peptide (Kd = 10 nM) and demonstrates great breadth and potency, neutralizing 84% of a 130 strain panel.”

      Figures

      Figure 1

      It would be helpful to have an additional panel at the top of this figure further zoomed out showing the orientation of the antibody (e.g., a representative pose) in the simulations relative to an appropriately curved membrane, Env, the binding conformation of the antibody to Env, and apo Env, given the tilting observed in PMID: 32348769 and theorized in PMC5338832. What additional conformational changes or tilting need to occur between the antibodies and Env to accomplish binding to their respective epitopes?

      Thank you for the suggestion to include this analysis. We have added text reflecting this information, and made new supplemental panels that compare simulated 4E10 and 10E8 Fab conformations to cryoEM density maps of Fabs bound to full-length HIV Env (Figure 1-figure supplement 1A and Figure 2-figure supplement 2A).

      In Figure 1, space permitting, it would be helpful to annotate the distances between the phosphates and side chains (similarly, for Figure S1A).

      To avoid overloading the main figure panels with text, the relevant distances are listed in the methods section. Those distances are used to define the “bound” lipid phosphate state. Generally, we note that the interactions are within hydrogen bonding distance.

      Annotating "Replicate 1" and "Replicate 2" on the left side of Figure 1C/D would make this figure immediately intuitive.

      We have added these labels.

      Figure caption 1C: Please clarify the threshold/definition of a contact used to binarize "bound" versus "unbound" (for example, "mean distance cutoff of 2A between the phosphate oxygen and the COM of CDR-H1") [on further reading of the methods section, this criterion is quite involved and might benefit from: a sentence that includes "see methods"]. Additionally, C could use a sentence explaining the bar such as in E, "Phosphate binding is mapped to above each MD trajectory" Please define FR-H3 in the figure caption for E/F.

      We have added these details to the figure caption.
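
      To make the "bound" versus "unbound" binarization concrete, a minimal Python sketch follows. It assumes a single distance per frame and a single cutoff; the manuscript's actual criterion is more involved (see methods), and both the distances and the 0.35 nm cutoff below are placeholders chosen only for illustration.

```python
from typing import List, Sequence


def binarize_bound(distances_nm: Sequence[float], cutoff_nm: float) -> List[bool]:
    """Mark a frame as bound when its distance is at or below the cutoff."""
    return [d <= cutoff_nm for d in distances_nm]


def occupancy_percent(bound: Sequence[bool]) -> float:
    """Percentage of frames classified as bound."""
    if not bound:
        raise ValueError("empty trajectory")
    return 100.0 * sum(bound) / len(bound)


# Synthetic distances (nm) and a placeholder cutoff; neither comes from the manuscript.
distances = [0.28, 0.31, 0.55, 0.90, 0.30, 0.29, 0.33, 0.70]
bound = binarize_bound(distances, cutoff_nm=0.35)
print(f"occupancy: {occupancy_percent(bound):.1f}%")
```

      The boolean series is what a bar mapped above each trajectory would display, and the percentage is the per-site occupancy.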

      Because Figure 1 is aggregated simulation time, it would be helpful to also represent the data as individual replicates or incorporate this information to calculate standard deviations/statistics (e.g., 1 microsecond max using the replicates to compute a standard deviation).

      We believe the current quantification and display of data, sharing all trajectories, is sufficient to convey the major point of how often each CDR-phospholipid binding site is occupied. Further tracking and statistics of inter-atomic distances would likely be too tedious and add minimal value. The phosphate oxygens show some dynamics between the polar groups within the CDR site, but our “bound” state definitions sufficiently capture when the key participating interactions are made.

      Figure 2

      For A, it would be helpful to annotate the yellow and blue mesh on the figure itself.

      We have defined the orange phosphate and blue choline densities.

      Also, where are R29 and Y32 relative to this site? In the X-ray panels, Y38 is not shown, and the box delineating the zoom-in is almost imperceptible.

      Thank you for this suggestion to include those amino acids, which are referenced in the text as critical sites where mutation impacts function. To clarify, Y32 is the PDB numbering for residue Y38 in IMGT numbering. We have added a panel to Figure 2-Figure Supplement 1 with a cartoon graphic of the 10E8 loop groove showing sidechains and annotating R29 and Y38, staying consistent with our use of IMGT numbering in the manuscript.

      Figure 3

      It might read clearer to have "LN01+MPER-TM" and "LN01-Apo" in the middle of A/B and C/D, respectively, and a dotted line delineating the left and right side of the figure panels.

      We have added these details to the figure for clarity for readers.

      It would be helpful to show some critical interactions that are discussed in the text, such as the salt bridge with K31, by labeling these on the figure (e.g., in E-H).

      We drafted figure panels with dashed lines to indicate those key interactions.  However, they became almost imperceptible and overloaded with annotations that distracted from the overall details.  For K31, the interaction occurs in LN01 crystal structures readers can refer to.

      Why are axes cut off for J?

      We corrected this.

      Please re-define K/L plots as in Figure 1, and explain abbreviations.

      We updated the figure caption to reflect these changes.

      Figure 4

      The caption for panel A states that the Fab begins in solvent 1-2 nm above the bilayer, but the main text states 0.5-2 nm.

      We have reconciled this difference and listed the correct distances: 0.5-2 nm.

      Please label the y-axis as "Replicate" for relevant figure panels so that they are more immediately interpretable.

      This label has been added.

      A legend with "membrane-associated" and "non-associated" within the figure would be helpful. Additionally, the average percent membrane associated, with a standard deviation, should be shown (Similar to 1C, albeit with the statistics).

      This legend has been added.  We also added the additional statistical metrics requested to strengthen our analysis.

      The text references "10, 14, and 12 extended insertion events" for the three antibody-based simulations. How do you define "extended insertion events"? Would breaking this into average insertion time and standard deviation better highlight the association differences between MPER antibodies and controls, in addition to the variability due to different random initializations?

      We thank the reviewer for the insightful suggestion on how to better organize quantitative analysis to support the method. Supplemental Table 3 includes these numbers.
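
      The per-event statistics discussed here can be sketched as follows in Python. The sketch assumes a per-frame boolean "membrane-associated" series and treats a stretch as an event once it lasts at least `min_frames` frames; that threshold, the frame spacing, and the example trace are all hypothetical and are not the manuscript's definition of an extended insertion event.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def event_lengths(associated: Sequence[bool], min_frames: int = 1) -> List[int]:
    """Lengths (in frames) of contiguous associated stretches of at least min_frames."""
    lengths: List[int] = []
    run = 0
    for flag in associated:
        if flag:
            run += 1
        else:
            if run >= min_frames:
                lengths.append(run)
            run = 0
    if run >= min_frames:  # close a stretch that runs to the final frame
        lengths.append(run)
    return lengths


def summarize(lengths: Sequence[int], ns_per_frame: float) -> Tuple[int, float, float]:
    """Return (event count, mean duration in ns, sample standard deviation in ns)."""
    durations = [n * ns_per_frame for n in lengths]
    if not durations:
        return 0, 0.0, 0.0
    sd = stdev(durations) if len(durations) > 1 else 0.0
    return len(durations), mean(durations), sd


# Hypothetical association trace: T = associated, F = in solvent.
T, F = True, False
trace = [F, T, T, T, F, F, T, F, T, T, T, T, T]
lengths = event_lengths(trace, min_frames=2)
count, avg_ns, sd_ns = summarize(lengths, ns_per_frame=10.0)
print(count, avg_ns, round(sd_ns, 2))
```

      Reporting the count together with the mean and standard deviation per event separates brief scanning encounters from long-lived insertions, which a single aggregated total time cannot do.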

      Figure 5

      The analysis in Fig. S6C could be included here as a main figure.

      The drafted revised figure adding S6C to Figure 5 made for too much information.  Likewise, putting this panel S6C separated it from the parent clustering data of S6B, so we decided to keep these figures separated.  The S6 figure is now Figure 5-figure supplement 1.

      Figure 6

      Please annotate membrane insertion on E as %.

      These are phosphate binding RMSD/occupancy vs time.  The panels are now too small to annotate by %.  The qualitative presentation is sufficient at this stage.  The quantitative % are listed in-line within text when relevant to support assertions made. 

      Please use the figure caption to explain why certain clusters (e.g., 10E8 cluster A, artifact, Fig. S6E) are not included in panel E.

      We have added this information in the figure caption.

      Figure 7

      Please show all points on the box and whisker plots (panels E and F), and perform appropriate statistical tests to see if means are significantly different (these are mentioned in the text, but should be annotated on the graph and mentioned within the figure caption).

      We have changed these plots to show all data points along with relevant statistical comparisons. The figure captions describe the unpaired t-tests used.
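
      For reference, an unpaired comparison of two groups can be sketched in Python as below. This uses Welch's unequal-variance form, which is an assumption: the response does not say whether the Student or Welch variant was used. The sketch returns only the t statistic and degrees of freedom; a p-value would require a t-distribution CDF from a statistics library. The input values are synthetic.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    if va + vb == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Synthetic example values, not data from the manuscript.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 4), round(dof, 2))
```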

      Figure S1

      G, H, and I do not belong here; they should be moved to accompany their relevant text section, which associates with Figure 3. It would be helpful to associate this with Figure 3 in the eLife format, "Figure 3-Supplemental Figure 1" or its equivalent.

      It's very difficult to distinguish the green and blue circles on panel G.

      We darkened the shading and added an outline for better visualization.

      Subfigure I is missing a caption, could be included with H: "(H,I) Additional replicates for LN01+TM (H) and LN01 (I)".

      We corrected this as suggested.

      Why is H only 3 simulations and not 4? Does it not have a lipid in the x-ray site? Also, the caption states "(top, green)" and "(bottom, cyan)", but the green vs. cyan figures are organized on the left and right. Additional labels within the figure would help make this more intuitive.

      If the point of H and I is to illustrate that POPC exchanges between the X-ray and loading sites, this is unclear from the figure. Consider clarifying these figures.

      Thank you for describing the confusion in this figure, we have added labels to clarify.

      Figure S2 (panels split between revised Figure 4 associated figure supplements)

      The LN01 figures should likely follow later so that they can associate with Figure 3, despite being a similar analysis.

      We corrected supplements to eLife format so supplements are associated with relevant main figures.

      Figure S3 (panels split between revised Figure 1 & 2 associated figure supplements)

      As hydrophobicity is discussed as a driving factor for residue insertion, it would be helpful to have a rolling hydrophobicity chart underneath each plot to make this claim obvious.

      We prefer the current format, out of concern for adding too much information to these already data-rich panels. In addition, some residues that are not apolar are nonetheless deeply inserted.

      Figure S4 (panels split between revised Figure 1 & 2 associated figure supplements)

      It would be helpful to label the relevant loops on these figures.

      We have labeled loops for clarity.

      Do any of these loops have minor contacts with Env in the structure?

      The CDR-H1 loops of 4E10 and PGZL1 do not directly contact the MPER peptides bound in crystal structures.

      FRL-3 and CDR-H1 in 10E8 do not contact the MPER peptide antigen component based on x-ray crystal structures.

      Do motif contacts with lipid involve minor contacts with additional loops other than those displayed in this figure?

      The phosphate-loop interactions in motifs used as query bait here are mediated solely by the backbone and side chain interactions of the loops displayed. We visually inspected most matches and did not see any “consensus” additional peripheral interactions common across each potential instance in the unrelated proteins.  The supplied Supplemental Table 2 contains the information if a reader wanted to conduct a detailed search. 

      Why is there such a difference between the loop conformation adopted in the X-ray structure and that in the MD simulation, and why does this lead to the large observed differences in ligand-binding structure matches?

      We thank the reviewer for carefully noting our error in labeling of CDR loop and framework region input queries. We revised the labeling to clarify the issue.

      There is minimal structural difference between the loops in X-ray and MD.

      Figure S5 (Figure 2-Figure supplement 4)

      This figure is not colorblind friendly. It would be helpful to change to a colorblind-friendly palette, as the data are interesting but uninterpretable to some.

      We have left this figure the same.

      "Susbstates" - "Substates"

      Corrected, thank you.

      Panel B is uninterpretable-please break the axis so that the Euclidian distances can be represented accurately but the histograms can be interpreted.

      We have adjusted axis for this plot to better illustrate the cluster thresholds.

      The clusters in D-H should be analyzed in greater depth. What is the structural relevance of these clusters other than differences in phospholipid occupancy in (I)? Snapshots of representative poses for each cluster could help clarify these differences.

      We have adjusted the text to describe the geometric differences in each of those clusters that result in their exceptionally lower propensities for forming the key phospholipid interaction.

      The figure caption should make it clear that 3 μS of aggregate simulation time is being used here instead of 4 μS to start with unique tilt initializations. E.g., "unique starting membrane-bound conformations (0 degrees, -15 degrees, 15 degrees initialization relative to the docked pose)". Further, why was the particular 0-degree replicate chosen while the other was thrown out? Or was this information averaged? Why is the full 4 μS then used for D-I?

      We thank the reviewer for noting these details.  We didn’t want to bias the differential between 10E8 and 4E10/PGZL1 by including the replicate simulations.  The analysis was mainly intended to achieve more coarse resolution distinction between 10E8 and the similar PGZL1/4E10.  

      In the subsequent clustering of individual bnAb simulation groups, the replicate 0-degree simulations had sufficiently different geometric sampling and unique lipid binding behavior that we thought they should be used (4 μs total) to achieve finer conformational resolution for each bnAb.

      Figure S6 (now Figure 5-Figure Supplement 1)

      Please label the CDRs in C and provide a color key like in other figures. Also, please label the y-axes. This figure could move to main below 5B with the clusters "A,B,C" labeled on 5B.

      We have added the axis labels and color key legend. We retained a minimal CDR loop labeling scheme for the higher-throughput interaction profiles here, where colored sections in the residue axes denote CDR loop regions.

      Figure S7 (Figure 7 Figure Supplement 1)

      Panels A and B would likely read better if swapped.

      We have swapped these panels for a better flow.

      For panel C, please display mean and standard deviation, and compare these values with an appropriate statistical test.

      This is already displayed in main figure, we have removed it from supplement.

      For E and F, please clarify from which trajectory(s) you are extracting this conformation from. Are these the global mean/representative poses? How do they compare to other geometrically distinct clusters?

      The requested information was added to the supplemental figure caption. These are phosphate-bound frames selected from 2 distinct time points of the 0-degree tilt replicates for both 4E10 and 10E8, representing at least 2 distinct macroscopic substates that differ in global light chain and heavy chain orientation towards the membrane.

      Table S2 (now Supplementary Table 3)

      Please add details for the 13h11 simulation.

      Additionally, please add average contact time and their standard deviation to the table, rather than just the aggregated total time. This will highlight the variability associated with the random initializations of each simulation.

      We have added the details for 13h11 and the requested analysis (average aggregated time ± standard deviation and average time per association event ± standard deviation) to supplement our summary statistics for this method.

      Reviewer #2 (Recommendations For The Authors):

      (1) The structure of the manuscript should be improved. For example, almost half of the introduction (three paragraphs) summarize the results. I found it hard to navigate all the data and specific interactions described in the result section. Furthermore, the claims at the end of several sections seem unsupported. Especially for the generalization of the approach. This should be moved to the discussion section. The discussion is pretty general and does not provide much context to the results presented in this study.

      We have significantly reorganized the results section to improve the flow of the manuscript and accessibility for readers, especially the first sections of all-atom simulations. We also removed claims not directly supported by data from our results, and expanded on some of these concepts in the discussion to provide more context for the results.

      (2) The author should cite more rigorously previous work and refrain from using the term "develop" to describe the simple use of a well established method. E.g. Several studies have investigated membrane protein interactions e.g. [1], membrane protein-bilayer self-assembly [2], steered molecular dynamics [3], etc.

      Thank you for identifying relevant work for the simulations that set precedent for our novel application to antibody-membrane interactions.  We have removed language about development of simulation methods from the text and now better reference the precedent simulation methods used here.

      (3) Have the authors considered estimating the PMF by combining the steered MD simulation through the application of Jarzynski's equality?

      We performed some preliminary PMF calculations for Fab-membrane binding, but saw that they were taking upward of 40 μs to reach convergence. Steered simulations focused on a key lipid may be easier.

      Although PMFs are beyond the scope of this work, we added proposals & allusion to their utility as the next steps for more rigorous quantification of fab-membrane interactions.
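
      For context, Jarzynski's equality relates the free energy difference to nonequilibrium work values from repeated pulls via ΔF = -kT ln⟨exp(-W/kT)⟩. A minimal Python sketch of the estimator follows, assuming work values in kJ/mol and a temperature of 310 K; both the temperature and the example work values are placeholders, not numbers from the manuscript.

```python
import math
from typing import Sequence

R_KJ_PER_MOL_K = 0.008314462618  # molar gas constant in kJ/(mol K)


def jarzynski_free_energy(work_kj_mol: Sequence[float], temperature_k: float = 310.0) -> float:
    """Estimate dF = -kT ln< exp(-W/kT) > from nonequilibrium work values."""
    if not work_kj_mol:
        raise ValueError("need at least one work value")
    kt = R_KJ_PER_MOL_K * temperature_k
    exponents = [-w / kt for w in work_kj_mol]
    shift = max(exponents)  # log-sum-exp shift for numerical stability
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -kt * (shift + math.log(mean_exp))


# Synthetic work values (kJ/mol) from hypothetical repeated pulls.
works = [120.0, 135.0, 150.0]
estimate = jarzynski_free_energy(works)
print(round(estimate, 2))
```

      Because the exponential average is dominated by the rare low-work trajectories, the estimate lies between the minimum and the mean of the work values and converges slowly when only a few pulls are available, which is consistent with the sampling cost noted above.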

      Minor

      (4) The term "integrative modeling" is usually used for computational pipelines which incorporate experimental data. Multiscale modeling would be more appropriate for this study.

      We altered descriptions throughout the manuscript to reflect this comment.

      (5) Units to report the force in the steered molecular dynamics are incorrect. They should be 98.

      We changed axes and results to correctly report this unit.

      (6) Labels for axes of several graphs are missing.

      We added labels to all axes of graphs, except for a few where stacked labels can be easily interpreted to save space and reduce complexity in figures.

      (7) Figure 3 K & L is this really < 1% of total? The term "total" should also be clarified.

      Thank you for pointing this out, we changed the % labels to be correct with axes from 0-100%. We clarified total in the figure caption.

      (8) The font size in figures should be uniformized.

      This suggestion has been applied.

      (9) Time needed for steered MD should be reported in CPUh and not hours (page 17).

      We removed comments on explicit time measurements for our simulations.

      (10) Version of Martini force field is missing in methods section

      We used Martini 2.6 and added this to the methods.

      References

      (1) Prunotto, Alessio, et al. "Molecular bases of the membrane association mechanism potentiating antibiotic resistance by New Delhi metallo-β-lactamase 1." ACS infectious diseases 6.10 (2020): 2719-2731.

      (2) Scott, Kathryn A., et al. "Coarse-grained MD simulations of membrane protein-bilayer self-assembly." Structure 16.4 (2008): 621-630.

      (3) Izrailev, S., et al. "Computational molecular dynamics: challenges, methods, ideas. Chapter 1. Steered molecular dynamics." (1997).

    2. eLife Assessment

      This valuable study reports multi-scale molecular dynamics simulations to investigate a class of highly potent antibodies that simultaneously engage with the HIV-1 Envelope trimer and the viral membrane. The work provides insights into how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization. After extensive revision, the level of evidence is considered solid, although a quantitative assessment of the underlying energetics remains difficult to obtain.

    3. Reviewer #1 (Public review):

      Previous experimental studies demonstrated that membrane association drives avidity for several potent broadly HIV-neutralizing antibodies and its loss dramatically reduces neutralization. In this study, the authors present a tour de force analysis of molecular dynamics (MD) simulations that demonstrate how several HIV-neutralizing membrane-proximal external region (MPER)-targeting antibodies associate with a model lipid bilayer.

      First, the authors compared how three MPER antibodies, 4E10, PGZL1, and 10E8, associated with model membranes, constructed with two lipid compositions similar to native viral membranes. They found that the related antibodies 4E10 and PGZL1 strongly associate with a phospholipid near heavy chain loop 1, consistent with prior crystallographic studies. They also discovered that a previously unappreciated framework region between loops 2-3 in the 4E10/PGZL1 heavy chain contributes to membrane association. Simulations of 10E8, an antibody from a different lineage, revealed several differences from published X-ray structures. Namely, a phosphatidylcholine binding site was offset and includes significant interaction with a nearby framework region. The revised manuscript demonstrates that these lipid interactions are robust to alterations in membrane composition and rigidity. However, it does not address the reverse: that phospholipids known experimentally not to associate with these antibodies (if any such lipids exist) also fail to interact in MD simulations.

      Next, the authors simulate another MPER-targeting antibody, LN01, with a model HIV membrane either containing or missing an MPER antigen fragment within. Of note, LN01 inserts more deeply into the membrane when the MPER antigen is present, supporting an energy balance between the lowest energy conformations of LN01, MPER, and the complex. These simulations recapitulate lipid binding interactions solved in published crystallographic studies but also lead to the discovery of a novel lipid binding site the authors term the "Loading Site", which could guide future experiments with this antibody.

      The authors next established coarse-grained (CG) MD simulations of the various antibodies with model membranes to study membrane embedding. These simulations facilitated greater sampling of different initial antibody geometries relative to the membrane. These CG simulations, which cannot resolve atomistic interactions, are nonetheless compelling because negative controls (ab 13h11, BSA) that should not associate with membrane indeed sample significantly less membrane.

      Distinct geometries derived from CG simulations were then used to initialize all-atom MD simulations to study insertion in finer detail (e.g., phospholipid association), which largely recapitulate their earlier results, albeit with more unbiased sampling. The multiscale model of an initial CG study with broad geometric sampling, followed by all-atom MD, provides a generalized framework for such simulations.

      Finally, the authors construct velocity pulling simulations to estimate the energetics of antibody membrane embedding. Using the multiscale modelling workflow to achieve greater geometric sampling, they demonstrate that their model reliably predicts lower association energetics for known mutations in 4E10 that disrupt lipid binding. However, the model does have limitations, namely in its ability to predict more subtle changes along a lineage: intermediate mutations that reduce lipid binding are indistinguishable from mutations that completely ablate lipid association. Thus, while large/binary differences in lipid affinity might be predictable, the use of this method as a generative model is likely more limited.

      The MD simulations conducted throughout are rigorous and the analyses are extensive, creative, and biologically inspired. Overall, these analyses provide an important mechanistic characterization of how broadly neutralizing antibodies associate with lipids proximal to membrane-associated epitopes to drive neutralization.

    4. Reviewer #2 (Public review):

      In this study, Maillie et al. have carried out a set of multiscale molecular dynamics simulations to investigate the interactions between the viral membrane and four broadly neutralizing antibodies that target the membrane-proximal external region (MPER) of the HIV-1 envelope trimer. The simulations recapitulated in several cases the binding sites of lipid head groups that were observed experimentally by X-ray crystallography, as well as some new binding sites. These binding sites were further validated using a structural bioinformatics approach. Finally, steered molecular dynamics was used to measure the binding strength between the membrane and variants of the 4E10 and PGZL1 antibodies.

      The use of multiscale MD simulations allows for a detailed exploration of the system at different time and length scales. The combination of MD simulations and structural bioinformatics provides a comprehensive approach to validate the identified binding sites. Finally, the steered MD simulations offer quantitative insights into the binding strength between the membrane and bnAbs.

      While the simulations and analyses provide qualitative insights into the binding interactions, they do not offer a quantitative assessment of energetics. The coarse-grained simulations exhibit artifacts and thus require careful analysis.

      This study contributes to a deeper understanding of the molecular mechanisms underlying bnAb recognition of the HIV-1 envelope. The insights gained from this work could inform the design of more potent and broadly neutralizing antibodies.

    1. eLife Assessment

      This valuable study presents findings on the mode of action of MOTS-c (mitochondrial open reading frame from the twelve S rRNA type-c), and its impact on monocyte-derived macrophages. The authors present solid evidence for its increased expression in stimulated monocytes/macrophages, its direct bactericidal functions, as well as its role in the modulation of monocyte differentiation into macrophages. Since most of the data were generated from a cell line (THP1), future work is required to validate observations in primary cells and to further support the claims of this work.

    2. Reviewer #1 (Public review):

      In this work, the authors examine the mechanism of action of MOTS-c and its impact on monocyte-derived macrophages. In the first part of the study, they show that MOTS-c acts as a host defense peptide with direct antibacterial activity. In the second part of the study, the authors aim to demonstrate that MOTS-c influences monocyte differentiation into macrophages via transcriptional regulation.

      Major strengths. Methods used to study the bactericidal activity of MOTS-c are appropriate and the results convincing.

      Major weaknesses. Methods used to study the impact on monocyte differentiation are inappropriate and the conclusions not fully supported by the data shown. A major issue is the use of the THP-1 cell line, a transformed monocytic line which does not mimic physiological monocyte biology. In particular, THP-1 differentiation is induced by PMA, which is a completely artificial system and conclusions from this approach cannot be generalized to monocyte differentiation. The authors would need to perform this series of experiments using freshly isolated monocytes, either from mouse or human. The read-out used for macrophage differentiation (adherence to plastic) is also not very robust, and the authors would need to analyze other parameters such as cell surface markers. It is also not clear whether MOTS-c could act in a cell-intrinsic fashion, as the authors have exposed cells to exogenous MOTS-c in all their experiments. The authors have also analyzed the transcriptomic changes induced by MOTS-c exposure in macrophages derived from young or old mice. While the results are potentially interesting, the differences observed seem independent from MOTS-c and mainly related to age, therefore the conclusions from this figure are not clear. The physiological relevance of this study is also unclear.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work, the authors examine the mechanism of action of MOTS-c and its impact on monocyte-derived macrophages. In the first part of the study, they show that MOTS-c acts as a host defense peptide with direct antibacterial activity. In the second part of the study, the authors aim to demonstrate that MOTS-c influences monocyte differentiation into macrophages via transcriptional regulation.

      Major strengths.

      Methods used to study the bactericidal activity of MOTS-c are appropriate and the results are convincing.

      Major weaknesses.

      Methods used to study the impact on monocyte differentiation are inappropriate and the conclusions are not supported by the data shown. A major issue is the use of the THP-1 cell line, a transformed monocytic line which does not mimic physiological monocyte biology. In particular, THP-1 differentiation is induced by PMA, which is a completely artificial system and conclusions from this approach cannot be generalized to monocyte differentiation. The authors would need to perform this series of experiments using freshly isolated monocytes, either from mouse or human. The read-out used for macrophage differentiation (adherence to plastic) is also not very robust, and the authors would need to analyze other parameters such as cell surface markers. It is also not clear whether MOTS-c could act in a cell-intrinsic fashion, as the authors have exposed cells to exogenous MOTS-c in all their experiments. The authors did not perform complementary experiments using MOTS-c deficient monocytes. The authors have also analyzed the transcriptomic changes induced by MOTS-c exposure in macrophages derived from young or old mice. While the results are potentially interesting, the differences observed seem independent from MOTS-c and mainly related to age, therefore the conclusions from this figure are not clear. Another concern is the reproducibility of the experiments, as the authors do not indicate the number of biological replicates analyzed nor the number of independent experiments performed.

      In this study, we employed the THP-1 cell line as a proof-of-principle to elucidate the existence of a first-in-class mitochondrial-encoded host defense peptide. This peptide is expressed in monocytes and serves dual functions: i) direct targeting of bacteria, and ii) regulation of monocyte differentiation. It is noteworthy that THP-1 cells differentiated by PMA have been widely utilized as a model for monocyte differentiation by numerous research groups. While we acknowledge the significance of utilizing primary monocytes to fully comprehend the translational implications of our findings, conducting a complete replication of our experiments in primary monocytes falls beyond the scope of this study. However, we have conducted several pivotal experiments in primary monocytes, including:

      i) Demonstration of the induction of endogenous MOTS-c in primary human monocytes during differentiation by M-CSF (Fig 3A).

      ii) Observation of an increased number of adhered monocytes during monocyte differentiation following MOTS-c treatment (Fig 5A).

      iii) Examination of the transcriptional regulation in mouse primary bone marrow-derived macrophages (BMDMs) by MOTS-c, seven days after a single treatment at the onset of differentiation (Fig 6).

      In addition to assessing adherence to plastic, we performed RNA-seq of THP-1 cells during early differentiation with MOTS-c as a measure of accelerated differentiation (Fig 4). The positive correlation between the effects of PMA and PMA+MOTS-c suggests that MOTS-c accelerates the transcriptional changes that occur during differentiation (Fig 4G). We consider this method a more comprehensive evaluation of differentiation as it encompasses the expression of thousands of genes rather than relying on a limited selection of cell surface markers. Future investigations should explore additional indicators of differentiation, including potential epigenetic effects of MOTS-c.

      Our findings indicate that endogenous MOTS-c is induced during monocyte stimulation and translocates into the nucleus (Figs 3-4), implying a cell-intrinsic role for MOTS-c during monocyte differentiation. Although examining MOTS-c deficient monocytes would offer valuable insights, technical limitations currently hinder the production of such monocytes due to the mitochondrial genomic encoding of MOTS-c within the 12S rRNA.

      Furthermore, our study reveals that MOTS-c alters gene expression in macrophages similarly across age and sex groups. This observation, illustrated in Fig 6E where the fold changes in clusters 5 and 6 in response to MOTS-c were consistent across all groups, suggests that MOTS-c modulates macrophage gene expression in an age-related manner. We postulate this to be an adaptive response to age-related alterations in the monocyte and macrophage microenvironment.

      The number of biological replicates performed for each experiment is indicated.

      The different parts of the manuscript do not appear well connected and it is not clear what the main message from the manuscript would be. The physiological relevance of this study is also unclear.

      The main message of our manuscript is that the mitochondrial genome encodes for a previously unknown host defense peptide that has physiological roles in modulating immune responses during infection and during aging. We have edited the ‘introduction’ to clarify this.

      Reviewer #2 (Public Review):

      The research study presented by Rice et al. set out to further profile the host defense properties of the mitochondrial protein MOTS-c. To do this they studied i. the potential antimicrobial effects of MOTS-c on common bacterial pathogens E. coli and MRSA, ii. the effects of MOTS-c on the stimulation and differentiation of monocytes into macrophages. This is a well performed study that utilizes relevant methods and cell types to base their conclusions on. However, there appear to be a few weaknesses to the current study that hold it back from more broad application.

      Comment 1: From reading the manuscript methods and results, it is unclear exactly what the synthetic MOTS-c source is. Therefore it is hard to determine whether there may be any impurities in the production of this synthetic protein that may interfere with the results presented throughout the manuscript. Though, the data presented in Supplemental Figure 4F, where E. coli expressing intracellular MOTS-c inhibited bacterial growth certainly support MOTS-c specific effects. Similarly with the experiments showing endogenous MOTS-c levels rising in stimulation and differentiated macrophages (Figure 3).

      We have edited our manuscript to include the source and purity of our synthetic MOTS-c peptide. The MOTS-c peptide used was synthesized by New England Peptides (now Biosynth) with a purity >95% by mass spectrometry.

      Comment 2: It is interesting that the mice receiving bacteria coupled with MOTS-c lost about 10% of their body weight. It would have been interesting to demonstrate the cause of this weight loss since the effect appears to be separate from mere PAMPs as shown by using heat-killed MRSA in Supplemental Figure 5. Was inflammation changed? Is this due to changes in systemic metabolism? Would have been interesting to have seen CRP levels or circulating liver enzymes.

      As suggested, we repeated this experiment to include both the heat-killed and MOTS-c-MRSA groups in the same controlled experiment for comparison (Fig 2; see below). Blood was collected from these mice for evaluation of cytokine levels and markers of organ damage. While only 1/6 controls survived, all MOTS-c- and heat-killed MRSA-treated mice survived. However, compared to the heat-killed group, the MOTS-c-MRSA group lost more weight and had a higher inflammatory profile, but still significantly less than in the control group. We hypothesize that this is due to only partial killing of MRSA by MOTS-c, as suggested by the CFU plated after overnight incubation, leading to a non-lethal infection in these mice. Others have shown that in this peritonitis model, α-hemolysin production by live MRSA is a key factor in toxicity, rather than PAMP-induced shock (PMID: 8975909; 22802349), which is consistent with the absence of death following heat-killed MRSA inoculation.

      Despite these concerns, the data are well suited to answering their research question, and they open up the door to studying how mitochondrial peptides like MOTS-c could have roles outside of the mitochondria.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improvement

      (1) The authors need to indicate in each legend the number of biological replicates analyzed and the number of independent experiments performed. This is essential.

      We have included the number of biological replicates analyzed.

      (2) The authors need to repeat the key experiments using freshly isolated monocytes, either human or mouse. THP-1 cells are abnormal cells and findings from these cells cannot be generalized to monocytes. For instance, in Figures 3A and B, it is clear that the kinetics of MOTS-c expression are different between THP-1 cells and human blood monocytes.

      The kinetics of THP-1 cells compared to human monocytes are slightly different, as expected by using different cells and different differentiation cues (M-CSF vs PMA). However, our findings collectively demonstrate the same effect, that each stimulus transiently induces the expression of MOTS-c within 24 hours in monocytes.

      In Figure 3A, the authors should show what happens in the absence of MCSF. Is MOTS-c expression upregulated by culture alone?

      There is some degree of baseline expression of MOTS-c in a resting state, and MOTS-c expression is significantly increased upon stimulation. This expression may be higher in primary monocytes than THP-1 cells, given that these monocytes are inevitably stressed by being removed from the native environment and put through the purification process.

      (3) In Figure 4A, a control for cytoplasmic contamination in the nuclear fraction is missing.

      We now include GAPDH detection in the nuclear fraction.  

      Author response image 1.

      (4) The RNA-seq analysis shown in Figure 4 is not very informative. What genes are differentially expressed? The authors should provide a list of these genes as supplementary information and highlight some key genes in the figure and text.

      The complete list of these genes is provided in Tables S1 and S2. We chose not to highlight specific genes in this paper due to the lack of sufficient evidence identifying any particular genes as key factors at this time.

      (5) In Figure 5A, a control is missing: the authors should treat the monocytes with the same volume of 'vehicle' (presumably it is water).

      In all experiments with MOTS-c treatment, the controls were treated with the same volume of vehicle (water). We have edited legends to state this.

      (6) In Figure 6, the differences observed seem independent of MOTS-c. The conclusions from this figure are overstated and need to be rephrased and clarified.

      MOTS-c shifted gene expression in macrophages in a similar manner regardless of age and sex, as shown in Fig 6E where the fold changes in clusters 5 and 6 in response to MOTS-c were similar in all groups. Independently, aging alone increases the expression of these same genes related to antigen presentation and interferon signaling, suggesting that MOTS-c shifts macrophage gene expression in an age-related manner – the expression of antigen presentation and interferon-related genes have been shown to be highly age-related (PMID: 36040389, 32669714, 36622281, 31754020). We hypothesize this to be an adaptive response to age-related changes in the monocyte and macrophage microenvironment.

      (7) Adherence to plastic is not a robust read-out for monocyte differentiation into macrophages. The authors need to examine other parameters, for instance characteristic cell surface markers for macrophages.

      As a read-out of accelerated differentiation, in addition to adherence to plastic we performed RNA-seq of THP-1 cells during early differentiation with MOTS-c (Fig 4). The positive correlation between the effects of PMA and the effects of PMA+MOTS-c suggests MOTS-c is accelerating the transcriptional changes that occur during differentiation (Fig 4G). We believe this to be a more robust assessment of differentiation as it relies on the expression of thousands of genes rather than a limited selection of cell surface markers. Further studies are needed to assess other read-outs of differentiation, including possible epigenetic effects of MOTS-c.

      (8) It is not clear whether MOTS-c could have a cell-intrinsic effect in monocytes. The results should be strengthened by examining the differentiation of monocytes deficient for MOTS-c (without addition of exogenous MOTS-c).

      We have shown that endogenous MOTS-c is induced during monocyte stimulation and translocates into the nucleus (Figs 3-4), suggesting that MOTS-c does have a cell-intrinsic role during monocyte differentiation.

      While having MOTS-c deficient monocytes would certainly be insightful, because MOTS-c is encoded within the mitochondrial genome in the 12S rRNA there are currently technical limitations in producing these monocytes.

      Other points

      (1) The paper would benefit from a more extended discussion to understand the physiological relevance of these findings. What cells would release MOTS-c in vivo, and how would that affect monocytes? Is there a cell-intrinsic role of MOTS-c in monocytes, and if so what would be the signals inducing its expression during differentiation? These aspects should be discussed by the authors so that the readers can understand their views.

      We thank the reviewer for their suggestion and have edited the discussion in our revised manuscript.  

      MOTS-c has been detected in various tissue and cell types, including the liver, muscle, T cells, monocytes/macrophages, and epithelial cells. This aligns with MOTS-c being referred to in literature as a cytokine, which are typically expressed by a broad range of cell types. Consistent with this, we also propose that MOTS-c would be expressed in cells known to express HDPs.

      We hypothesize that MOTS-c acts in both a cell-intrinsic and extrinsic manner in vivo, consistent with known HDPs, to both target bacteria directly and modulate immune cell responses. In vitro, M-CSF, PMA, LPS, and IFNγ each induced MOTS-c expression. In vivo, monocytes respond to a range of stimuli that influence their differentiation, and these stimuli may induce MOTS-c as well. We have previously published that MOTS-c acts primarily under conditions of cell stress, such as nutrient deprivation and oxidative stress, to help restore homeostasis. While MOTS-c did regulate macrophage gene expression in resting “M0-like” macrophages, we hypothesize that the physiological role of MOTS-c is to regulate cell adaptation to stress, therefore the context under which monocytes differentiate will be an important factor determining the functional effects of MOTS-c. In future studies, we plan to test whether the immuno-modulatory effects of MOTS-c are dependent on the environment during differentiation.

      (2) The scale bar appears to be missing from Figure 1G.

      We apologize for the poor resolution of the scale bar. We have made it easily recognizable in the revised figure.  

      (3) It is not very clear what is shown in Figure S2. The authors should better explain what the images represent.

      Figure S2 is related to Figure 1D and Figure S1. In this experiment, E. coli, S. typhimurium, and P. aeruginosa cultures were treated with MOTS-c (100 μM). We observed that only E. coli aggregated immediately, while S. typhimurium and P. aeruginosa did not show aggregation. This suggests that MOTS-c exhibits specificity in targeting certain types of bacteria, although the underlying basis of this specificity is currently unknown.

      We have revised the legend as follows: 'MOTS-c exhibits specificity in bacterial targeting. MOTS-c (100 μM) treatment causes immediate aggregation of E. coli but not S. typhimurium or P. aeruginosa (n=6). Representative image shown. See Figure 1D'.

      Reviewer #2 (Recommendations For The Authors):

      This is a beautifully executed study and a well written manuscript. I generally don't have much critical feedback to give based on my reading. The only recommendation I have to improve the completeness of the data would be in relation to Figure 5E and F. The metabolic phenotype of LPS stimulated monocytes/macrophages is more typically the Warburg effect where oxidative phosphorylation is reduced (as you show with a lowered OCR), but with a concomitant elevation in lactate production. It would have been nice to see either i. the ECAR levels from your seahorse data, or ii. separate lactate measurements on your supernatants. This would go a long way to further explaining the phenotype described in the figure.

      We greatly appreciate the reviewer's positive feedback. The data provided below are ECAR measurements obtained from the Seahorse assay. However, it's important to note that the assays were originally designed for OCR measurement (e.g. buffered media unsuitable for ECAR measurements, use of mitochondrial complex inhibitors, etc.), thus rendering the ECAR data unreliable for accurately assessing glycolysis. Consequently, while we share this data with the reviewer, we believe it is inappropriate to include it in the manuscript (hence omitted in the original submission).

      Author response image 2.

      Furthermore, we are currently engaged in a separate manuscript focusing on elucidating the immunometabolic mechanisms of MOTS-c in macrophages. We intend for this manuscript to stand alone, providing a comprehensive exploration of metabolic pathways, including a detailed untargeted metabolomics map spanning multiple time-points.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the data are interesting and the authors have tried to exclude multiple confounding factors, many patterns cannot clearly be ascribed to one cause or another.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each. Many analyses and simulations to check analyses have been carried out.

      Weaknesses:

      The strength of conclusions that can be drawn from the analyses was low, partly because there are many strange patterns. The authors have done a good job of adding caveats, but clearly, these species do not meet many assumptions of our methods.

      Thank you for the comments. We appreciate the multiple rounds of revision the manuscript has undergone and the work has improved as a consequence. Overall we disagree that the patterns are strange, and have made considerable efforts to explain in the text and in our responses why the patterns make sense based on what we know about the history of Zea mays from previous research. We agree that currently available methods are not capable of answering all questions we propose adequately. This reflects both limitations with the available data for these populations (i.e. phenotypes and spatially explicit sampling), and limitations in available methods tailored to the questions at hand (spatially explicit inference of the range over which an allele is adaptive). We have made considerable effort to point out the places where our inferences are likely to have low accuracy or limited resolution. These limitations are in many ways inherent to all inferential based science and should not be considered a weak point specific to this work, nor do they take away from the fundamental conclusions, which have changed quantitatively but not qualitatively over the course of peer review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      -The manuscript should say something about the fact that range-wide PSMC does not show a decline.

      We did not use PSMC methods but instead mushi as described in the methods. On line 356 we described how the lower sample size and strong regularization are the most likely explanations for the lack of a population size decline in the rangewide samples.

      - The manuscript should explain how rdmc was run and what "overlapping" means.

      We described how sweep intervals were inferred starting on line 823 (Methods subsection “Identifying Selective Sweeps”). Sweep regions were defined as the outermost coordinates from all populations that shared any overlap in their respectively defined sweep intervals. The details of how we ran rdmc, including all of the parameters, are described starting on line 895 (Methods subsection “Inferring modes of convergent adaptation”).
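      The region definition described above can be illustrated with a minimal sketch. It assumes that "sharing any overlap" is applied transitively, so chains of overlapping intervals collapse into one region. The population names and coordinates are hypothetical, and this is not the code used in the manuscript.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) coordinates in base pairs, inclusive


def merge_sweep_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping intervals into regions spanning their outermost coordinates."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the current region: extend it to the outermost end.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap with the current region: start a new one.
            merged.append((start, end))
    return merged


# Hypothetical sweep intervals called separately in three populations.
per_population = {
    "pop_A": [(100, 250), (900, 1000)],
    "pop_B": [(200, 400)],
    "pop_C": [(380, 500), (2000, 2100)],
}
pooled = [iv for ivs in per_population.values() for iv in ivs]
regions = merge_sweep_intervals(pooled)
print(regions)
```

      In this toy input, the first intervals of pop_A, pop_B, and pop_C form one shared region, while the remaining two intervals stay private to a single population.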

      - Figure 4: "Negative log10" is messed up

      Thank you. This has been fixed for the Version Of Record.

      - Line 318: "accruacy"

      Thank you. We have edited this typo for the Version Of Record.

      - New Table S3: why don't the proportions add to 1?

      These values represent what proportion of fixed differences at 0 fold sites are unique to each population. The denominator is the total number of fixed differences for each population separately, so each proportion is distinct for each population and thus should not sum to one across them. The table caption has been reworded in efforts to clarify for the Version Of Record.
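      A small sketch may make the denominator point concrete. It uses hypothetical sets of fixed-difference site IDs for three made-up populations; each proportion is computed against that population's own total, so nothing constrains the values to sum to one.

```python
# Hypothetical fixed-difference site IDs per population (illustrative only).
fixed = {
    "pop_A": {1, 2, 3, 4},
    "pop_B": {3, 4, 5},
    "pop_C": {4, 6, 7, 8, 9},
}


def unique_proportions(fixed_sites):
    """Proportion of each population's fixed differences found in no other population."""
    out = {}
    for pop, sites in fixed_sites.items():
        # Union of fixed differences across every other population.
        others = set().union(*(s for p, s in fixed_sites.items() if p != pop))
        # Denominator is this population's own count, not a shared total.
        out[pop] = len(sites - others) / len(sites)
    return out


props = unique_proportions(fixed)
print(props, sum(props.values()))
```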


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the patterns are interesting, the strength of evidence in support of the conclusions drawn from these patterns is weak overall. Most of the main conclusions are not supported by convincing analyses.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each.

      Weaknesses:

      There were issues with many parts of the paper, especially with the strength of conclusions that can be drawn from the analyses. I list the major issues in the order in which they appear in the paper.

      (1) Gene flow and demography.

      The f4 tests of introgression (Figure 1E) are not independent of one another. So how should we interpret these: as gene flow everywhere, or just one event in an ancestral population? More importantly, almost all the significant points involve one population (Crucero Lagunitas), which suggests that the results do not simply represent gene flow between the sub-species. There was also no signal of increased migration between sympatric pairs of populations. Overall, the evidence for gene flow presented here is not convincing. Can some kind of supporting evidence be presented?

      We agree that the standard approach to f4 tests that we employed here is not without limitations, namely, that the tests are conducted independently, while the true evolutionary history is not. While a joint demographic inference across all populations would be useful, it did not seem tractable to perform over all of our populations with currently available methods, given the number of populations being analyzed, nor does it directly address the question of interest. Our purpose for including the f4 tests was to test whether there was more gene flow between sympatric pairs than in other comparisons (we have made that point clearer in the text near line 174). As described in the text, the distribution of Z scores is generated by pairing focal populations with all other non-focal populations across both subspecies, which means the gene flow signal of interest is marginalized over the effects of gene flow in the other non-focal populations. This is not nearly as rich as inferring the full history, but it gives us some sense of the average amount of gene flow experienced between populations and allows us to address one of our primary questions of interest when conceiving this paper: do sympatric pairs show more gene flow than other pairs? We agree with the reviewer that that answer is largely no, and the writing reflects this.

      Overall, we think both points mentioned by the reviewer here (that most but not all tests involved Crucero Lagunitas maize, and that sympatric pairs don’t show higher gene flow) contribute nicely to the overall theme of the paper: the history of both subspecies is idiosyncratic and impacted by humans in ways that do not reflect geographic proximity and that we did not anticipate (see expectations near line 110). We have emphasized the connection between f4 tests and the revised rdmc results near line 653.
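      For readers unfamiliar with the statistic, the following is a minimal sketch of a single f4 computation from per-site allele frequencies, using the standard definition f4(A, B; C, D) = mean over sites of (pA - pB)(pC - pD). The frequencies are hypothetical. The Z scores mentioned above additionally need a standard error, typically obtained with a block jackknife, which is not shown here.

```python
def f4(p_a, p_b, p_c, p_d):
    """Mean over sites of (pA - pB) * (pC - pD) for four populations."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(terms) / len(terms)


# Hypothetical allele frequencies at four sites in four populations.
p_a = [0.1, 0.5, 0.9, 0.3]
p_b = [0.1, 0.4, 0.8, 0.3]
p_c = [0.2, 0.7, 0.6, 0.5]
p_d = [0.2, 0.5, 0.4, 0.5]

# Correlated frequency differences between the (A, B) and (C, D) pairs give a
# non-zero value; under a tree with no gene flow between pairs it is expected to be zero.
value = f4(p_a, p_b, p_c, p_d)
print(value)
```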

      The paper also estimates demographic histories (changes in effective population sizes) for each population, and each sub-species together. The text (lines 191-194) says that "all histories estimated a bottleneck that started approximately 10 thousand generations ago" but I do not see this. Figure 2C (not 2E, as cited in the text) shows that teosinte had declines in all populations 10,000 generations ago, but some of these declines were very minimal. Maize has a similar pattern that started more recently, but the overall species history shows no change in effective size at all. There's not a lot of signal in these figures overall.

      I am also curious: how does the demographic model inferred by mushi address inbreeding and homozygosity by descent (lines 197-202)? In other words, why does a change in Ne necessarily affect inbreeding, especially when all effective population sizes are above 10,000?

      All maize populations show a decline beginning 10,000 generations ago. The smallest decline for maize is from 100,000 to 30,000. All teosinte populations show a reduction in population size. The smallest of these drops more than 70% from around 300,000 to 100,000. Three of the teosinte populations showed a reduction in population size from ~10^5 to ~10^3, which is well below 10,000. Thus all populations show declines.

      These large reductions should lead to inbreeding and increased homozygosity by descent. Mushi does not specifically model these features of the data, yet as we show, simulations under the model estimated by Mushi matched the true HBD levels fairly well (Figure 2D).

      The rangewide sample does not show declines, likely because there is enough isolation between populations that the reduction in variation at any given locus is not shared, and is maintained in the populations that did not experience the population decline.

      (2) Proportion of adaptive mutations.

      The paper estimates alpha, the proportion of nonsynonymous substitutions fixed by positive selection, using two different sampling schemes for polymorphism. One uses range-wide polymorphism data and one uses each of the single populations. Because the estimates using these two approaches are similar, the authors conclude that there is little local adaptation. However, this conclusion is not justified.

      There is little information as to how the McDonald-Kreitman test is carried out, but it appears that polymorphism within either teosinte or maize (using either sampling scheme) is compared to fixed differences with an outgroup. These species might be Z. luxurians or Z. diploperennis, as both are mentioned as outgroups. Regardless of which is used, this sampling means that almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte, and on the branch leading to the outgroup. Therefore, it should not be surprising that alpha does not change based on the sampling scheme, as this should barely change the number of fixed differences (no numbers are reported).

      The lack of differences in results has little to do with range-wide vs restricted adaptation, and much more to do with how MK tests are constructed. Should we expect an excess of fixed amino acid differences on very short internal branches of each sub-species tree? It makes sense that there is more variation in alpha in teosinte than maize, as these branches are longer, but they all seem quite short (it is hard to know precisely, as no Fst values or similar are reported).

      The section “Genetic Diversity” in the methods provides details about how luxurians and diploperennis were used as outgroups. The section “Estimating the Rate of Positive Selection, α”, in the methods includes the definition of α and full joint non-linear regression equation and the software used to estimate it (brms), and the relevant citations crediting the authors of the original method. However, some of the relevant information about the SFS construction is provided in the previous section entitled, “Genetic Diversity”. We added reference to this in results near line 800.
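      As a point of reference for the discussion of α, the sketch below shows the classic point estimator from McDonald-Kreitman counts, α = 1 - (Ds × Pn) / (Dn × Ps). This is the simplest textbook form, not the joint non-linear regression fitted with brms in the manuscript, and the counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic MK estimate of the proportion of adaptive nonsynonymous substitutions.

    dn, ds: nonsynonymous and synonymous fixed differences (divergence)
    pn, ps: nonsynonymous and synonymous polymorphisms
    """
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for one population.
alpha = mk_alpha(dn=60, ds=100, pn=40, ps=100)
print(round(alpha, 3))
```

      Because Dn and Ds enter the estimate directly, α can only differ between sampling schemes to the extent that the counts of fixed differences or polymorphisms differ between them.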

      While we appreciate the concern that “almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte”, this is only a problem if there aren’t enough fixed differences that are unshared between populations. This is more of a concern for maize than teosinte, which we make clear as a caveat in the manuscript in several places already. The fact that there is variation in alpha among teosinte populations is evidence that these counts do differ among pops. As we can see in the population trees in Figure 1, there is a considerable amount of terminal branch length for all the populations. Indeed if we look at the number of fixed differences at 0 fold sites across populations:

      The variation in the number of fixed differences, particularly across teosinte means that a large number cannot be shared between populations. We can estimate the fixed differences unique to each subpopulation (and total count) demonstrating that, in general, there are a large number of substitutions unique to each population. This is good evidence the rangewide estimates do not reflect a lack of variation within populations, at least not for teosinte. This is now included in the supplement (Table S3).

      Finally, we note that the branches leading to outgroups are likely not substantially longer than those among populations. Given our estimates of Ne, the coalescent within maize and teosinte should be relatively deep (with Ne of 30K it should be ~120K years). The divergence time between Zea mays and these outgroup taxa has been estimated at ~150K years (Chen et al. 2022). This is now mentioned in the text on line 407.
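      The arithmetic behind the ~120K figure can be sketched as follows, assuming the standard expectation of roughly 4Ne generations to the most recent common ancestor of a large diploid sample and one generation per year. Both the generation time and the use of the 4Ne approximation are assumptions made here for illustration.

```python
ne = 30_000                    # effective population size quoted in the response
years_per_generation = 1       # assumption: annual generations
divergence_years = 150_000     # outgroup divergence estimate quoted in the response

# Expected time to the most recent common ancestor, about 4 * Ne generations.
tmrca_years = 4 * ne * years_per_generation
ratio = tmrca_years / divergence_years
print(tmrca_years, ratio)
```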

      We have added a caveat about the reviewer’s concern for the non-independence of fixed differences for maize near line 386.

      (3) Shared and private sweeps.

      In order to make biological inferences from the number of shared and private sweeps, there are a number of issues that must be addressed.

      One issue is false negatives and false positives. If sweeps occur but are missed, then they will appear to be less shared than they really are. Table S3 reports very high false negative rates across much of the parameter space considered, but is not mentioned in the main text. How can we make strong conclusions about the scale of local adaptation given this? Conversely, while there is information about the false positive rate provided, this information doesn't tell us whether it's higher for population-specific events. It certainly seems likely that it would be. In either case, we should be cautious saying that some sweeps are "locally restricted" if they can be missed more than 85% of the time in a second population or falsely identified more than 25% of the time in a single population.

      The reviewer brings up a worthwhile point. The simulation results indeed call into question how many of the sweeps we claim are exclusive to one population actually are. This caveat is already made, but we now make clearer the reviewer’s concern regarding the high false negative rate (near line 299). However, if anything this suggests sweeps are shared even more often than what is reported. One of the major takeaways from the paper is that convergent adaptation is more common than we expected. The most interesting part about the unique sweeps is the comparison between maize and teosinte. While the true proportions may vary, the relatively higher proportion of sweeps exclusive to one population in teosinte compared to maize is unlikely to be affected by false negatives, since the accuracy of sweep identification is fairly similar across subspecies (though perhaps with some exceptions for the populations with stronger bottlenecks). Further, these criticisms are specific to the raisd results. All sweeps shared across multiple populations were analyzed using rdmc. After adjustments made to the number of proposed sites for selection (see response below), there is good agreement between the raisd and rdmc results: the regions we proposed as selective sweeps with raisd all show evidence of convergence using rdmc. Recall too that rdmc uses a quite different approach to inference: all populations are used jointly, labelling those that did and did not experience the sweep. If sweeps were present in populations that were labeled as neutral (or vice versa), this would weaken the power to infer selection at the locus. Much of the parameter space we explored is for quite weak selection, and the simulated analysis shows we are likely to miss those instances, often entirely. For strong sweeps, however, our simulations show we have appreciable accuracy.

      Together, there is reason to be optimistic about our detection of strong shared sweeps and that the main conclusions we make are sound.

      Finally, we note that we are unaware of any other empirical study that has performed similar estimates of the accuracy of the sweep calling in their data (as opposed to using simulations). We thus see these analyses as a significant contribution towards transparency that is completely lacking from most papers.

      A second, opposite, issue is shared ancestral events. Maize populations are much more closely related than teosinte (Figure 2B). Because of this, a single, completed sweep in the ancestor of all populations could much more readily show a signal in multiple descendant populations. This is consistent with the data showing more shared events (and possibly more events overall). There also appear to be some very closely (phylogenetically) related teosinte populations. What if there's selection in their shared ancestor? For instance, Los Guajes and Palmar Chico are the two most closely related populations of teosinte and have the fewest unique sweeps (Figure 4B). How do these kinds of ancestrally shared selective events fit into the framework here?

      The reviewer brings up another interesting point and one that likely impacts some of our results.

      As the reviewer describes, this is an issue that is of more concern to the more closely related populations and is less likely to explain results across the subspecies. We have added this as a caveat (near line 456). As is clear in the writing, sharing across subspecies is our primary interest for the rdmc results.

      These analyses of shared sweeps are followed by an analysis of sweeps shared by sympatric pairs of teosinte and maize. Because there are not more events shared by these pairs than expected, the paper concludes that geography and local environment are not important. But wouldn't it be better to test for shared sweeps according to the geographic proximity of populations of the same sub-species? A comparison of the two sub-species does not directly address the scale of adaptation of one organism to its environment, and therefore it is hard to know what to conclude from this analysis.

      We did not intend to conclude that local adaptation is not important. Especially for teosinte, we report and interpret evidence that many sweeps are happening exclusively to one population, which is consistent with the action of local adaptation and consistent with some of our expectations.

      More directly, this is another instance of us having clear hypotheses going into the paper and constructing specific analyses to test them. As we explain in the paper, we expected the scale of local adaptation to be very small, such that subspecies growing next to each other have more opportunities to exchange alleles that are locally adapted to their shared environment. The analysis we conducted makes sense in light of this expectation. We considered conducting tests regarding geographic proximity, but there is limited power with the number of populations we have within subspecies, and the meaning of the tests is unclear if all populations of both subspecies are naively included together. This analysis shows that, at least for sweeps and fixations, adaptation is larger than a single location. While it may not be a complete description on its own, the work here does provide information about the scale of adaptation and is useful to our overall claims and objectives of the paper. As mentioned in the paper, the story might be very different if we were to study through a lens of polygenic adaptation. We also now include in the discussion in several places mention of where broader sampling could improve inference.

      (4) Convergent adaptation

      My biggest concern involves the apparent main conclusion of the paper about the sources of "convergent adaptations". I believe the authors are misapplying the method of Lee and Coop (2017), and have not seriously considered the confounding factors of this method as applied. I am unconvinced by the conclusions that are made from these analyses.

      The method of Lee and Coop (referred to as rdmc) is intended to be applied to a single locus (or very tightly linked loci) that shows adaptation to the same environmental factor in different populations. From their paper: "Geographically separated populations can convergently adapt to the same selection pressure. Convergent evolution at the level of a gene may arise via three distinct modes." However, in the current paper, we are not considering such a restricted case. Instead, genome-wide scans for sweep regions have been made, without regard to similar selection pressures or to whether events are occurring in the same gene. Instead, the method is applied to large genomic regions not associated with known phenotypes or selective pressures.

      I think the larger worry here is whether we are truly considering the "same gene" in these analyses. The methods applied here attempt to find shared sweep regions, not shared genes (or mutations). Even then, there are no details that I could find as to what constitutes a shared sweep. The only relevant text (lines 802-803) describes how a single region is called: "We merged outlier regions within 50,000 Kb of one another and treated as a single sweep region." (It probably doesn't mean "50,000 kb", which would be 50 million bases.) However, no information is given about how to identify overlap between populations or sub-species, nor how likely it is that the shared target of selection would be included in anything identified as a shared sweep. Is there a way to gauge whether we are truly identifying the same target of selection in two populations?

      The question then is, what does rdmc conclude if we are simply looking at a region that happened to be a sweep in two populations, but was not due to shared selection or similar genes? There is little testing of this application here, especially its accuracy. Testing in Lee and Coop (2017) is all carried out assuming the location of the selected site is known, and even then there is quite a lot of difficulty distinguishing among several of the non-neutral models. This was especially true when standing variation was only polymorphic for a short time, as is estimated here for many cases, and would be confused for migration (see Lee and Coop 2017). Furthermore, the model of Lee and Coop (2017) does not seem to consider a completed ancestral sweep that has signals that persist into current populations (see point 3 above). How would rdmc interpret such a scenario?

      Overall, there simply doesn't seem to be enough testing of this method, nor are many caveats raised in relation to the strange distributions of standing variation times (bimodal) or migration rates (opposite between maize and teosinte). It is not clear what inferences can be made with confidence, and certainly the Discussion (and Abstract) makes conclusions about the spread of beneficial alleles via introgression that seem to outstrip the results.

      We have fixed the “50,000 Kb” typo.
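
For readers unfamiliar with this merging step, a minimal sketch follows. It assumes the intended threshold is 50,000 bp (50 kb) and that "within" means a gap of at most that size; the function name, coordinate convention, and example windows are hypothetical and are not taken from the paper's pipeline.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start_bp, end_bp) on a single chromosome


def merge_outlier_regions(regions: List[Region], max_gap_bp: int = 50_000) -> List[Region]:
    """Merge regions whose gap to the previous merged region is <= max_gap_bp."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap_bp:
            # Close enough to (or overlapping) the previous region: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical outlier windows on one chromosome.
example = [(100_000, 120_000), (150_000, 160_000), (400_000, 410_000)]
sweep_regions = merge_outlier_regions(example)
```

In this example the first two windows are 30 kb apart and collapse into one sweep region, while the third remains separate. This sketch only covers merging within a population; it does not address the reviewer's separate question of how overlap between populations is defined.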

      There are several important points the reviewer makes here worth considering. First and most importantly, the method of Lee and Coop (2017) actually does include sites as part of the composite likelihood calculation. For computational feasibility, the number of positions we initially considered was 20 (20 different positions along the input sequence were proposed as the site of the shared beneficial mutation). In efforts to further address the reviewer’s concern about adaptive mutations at distinct loci, we have increased the number of proposed selected sites to 200. This fact should greatly diminish the reviewer’s concern that we are picking up independent sweeps that happened at different nucleotide positions in the same region - evidence for a beneficial mutation must be shared by the selected populations at a proposed site. As the revisions show, this has modified the results of our paper in a number of ways, including changing all of the previous neutral regions to shared via standing variation or migration. Despite these changes, our previous conclusions are intact, including the pattern that migration rates are high when maize populations share the sweep. Relatedly, we disagree with the reviewer’s characterization of the migration results. The pattern is quite clear and makes sense - when a maize population is involved in the sweep, migration rate is inferred to be high. Sweeps exclusive to teosinte are rarer and are inferred to have a low migration rate. This relates directly to the idea that humans have moved maize relatively rapidly across the landscape.

      We have now included a plot showing how the difference in composite likelihood between the maximum composite likelihood (CLE) site and the next highest site varies across our inferences (Figure S8), which strongly suggests that patterns are not muddled across multiple loci, but are centered at a focal region where the beneficial allele is inferred to be located. While there are too many to show in the manuscript across all sweeps, here is a nice example of what inference looks like for one of the proposed sweep regions.

      Author response image 1.
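
The kind of summary described for Figure S8 can be sketched as follows. This is an illustration only, not the rdmc code: the function name is hypothetical, and the composite log-likelihood values at each proposed site are made up for the example.

```python
from typing import Dict, Tuple


def top_two_gap(site_loglik: Dict[int, float]) -> Tuple[int, float]:
    """Return the best proposed site and its log-likelihood gap to the runner-up."""
    if len(site_loglik) < 2:
        raise ValueError("need at least two proposed sites")
    ranked = sorted(site_loglik.items(), key=lambda kv: kv[1], reverse=True)
    (best_site, best_ll), (_, second_ll) = ranked[0], ranked[1]
    return best_site, best_ll - second_ll


# Hypothetical composite log-likelihoods at five proposed positions (bp).
profile = {1_000: -520.0, 2_000: -498.5, 3_000: -455.0, 4_000: -470.0, 5_000: -515.0}
best_site, gap = top_two_gap(profile)
```

A large gap with a single clear peak is what one would expect if the populations share a signal centered on one position, whereas a flat or multi-peaked profile would be harder to interpret.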

      Furthermore, the situation the reviewer is describing would be selection acting on independent mutations (mutations at different loci), which would not create an increase in the amount of allele frequency covariance above and beyond what would be expected by drift under the migration and standing variation models.

      We also note that we are not alone in applying this approach to shared outlier signals in the absence of known genes; indeed the authors of the DMC method have applied it to regions of shared outlier signal themselves (e.g. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008593).

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled multiple populations of maize and teosinte across Mexico, aiming to characterise the geographic scale of local adaptation, patterns of selective sweeps, and modes of convergent evolution between populations and subspecies.

      Strengths & Weaknesses:

      The population genomic methods are standard and appropriate, including Fst, Tajima's D, α, and selective sweep scans. The whole genome sequencing data seems high quality. However, limitations exist regarding limited sampling, potential high false-positive sweep detection rates, and weak evidence for some conclusions, like the role of migration in teosinte adaptation.

      Aims & Conclusions:

      The results are interesting in supporting local adaptation at intermediate geographic scales, widespread convergence between populations, and standing variation/gene flow facilitating adaptation. However, more rigorous assessments of method performance would strengthen confidence. Connecting genetic patterns to phenotypic differences would also help validate associations with local adaptation.

      Impact & Utility:

      This work provides some of the first genomic insights into local adaptation and convergence in maize and teosinte. However, the limited sampling and need for better method validation currently temper the utility and impact. Broader sampling and connecting results to phenotypes would make this a more impactful study and valuable resource. The population genomic data itself provides a helpful resource for the community.

      Additional Context:

      Previous work has found population structure and phenotypic differences consistent with local adaptation in maize and teosinte. However, genomic insights have been lacking. This paper takes initial steps to characterise genomic patterns but is limited by sampling and validation. Additional work building on this foundation could contribute to understanding local adaptation in these agriculturally vital species.

      We appreciate the reviewer’s thoughtful reading of the paper and scrutiny. We hope that the added caveats made in response to reviewer 1 (as well as the previous rounds of peer review) will provide readers with the proper amount of skepticism in the accuracy of some of our initial sweep results, while also demonstrating that many of our conclusions are robust to the concerns raised over the various stages of review.

      We agree with the reviewer that better sampling and the incorporation of phenotypic data would be excellent additions, but the information is not available for the studied populations, and is outside the scope of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Sometimes alpha is described as a rate, and sometimes as a proportion. The latter is correct.

      We have updated this. Thanks.

      - Line 79: are they really "discrete" populations?

      The teosinte populations sampled are all clearly separated from each other and are physically discrete. The maize population samples came from individual farmer fields. Traditional maize is grown as open-pollinated (outcrossing) populations, and farmers save seed for subsequent generations. An individual farmer’s field thus behaves as a discrete population for our purposes, impacted of course by gene flow, selection, and other evolutionary processes.

      - Lines 418-420: "Large genomes may lead to more soft sweeps, where no single mutation driving adaptive evolution would fix (Mei et al. 2018)." I'm not sure I understand this statement. Why is this a property of genome size?

      Mei et al. 2018 lay out the logic, but essentially they present data arguing that the total number of functionally relevant base pairs increases with genome size (less than linearly). If true, genomes with a large number of potentially functional bp are more likely to undergo soft sweeps (see theory by Hermisson and Pennings cited in Mei et al. 2018).

      - Lines 500-1: selection does not cause one to underestimate effective population sizes. Selection directly affects Ne. I'm not sure what biases the sentences on lines 502-508 are trying to explain.

      We have simplified this section. Not accounting for linked selection (especially positive selection) results in a biased inference of demographic history. See Marsh and Johri (2024) for another example. https://doi.org/10.1093/molbev/msae118

      - Line 511-3: does Uricchio et al. (2019) show any difference in the estimate of alpha from Messer and Petrov (2013) when taking background selection into account?

      What we initially wrote was incorrect. The aMK method of Messer and Petrov (2013) accounts for weakly deleterious polymorphisms, but it does not account for positively selected ones. We have updated this text and suggested our method may underestimate alpha if positively selected segregating alleles are common (near line 539).
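
As background to why alpha is a proportion and why segregating non-neutral alleles matter, here is a minimal sketch of the classic McDonald-Kreitman estimator, alpha = 1 - (Ds * Pn) / (Dn * Ps). This is the simple point estimator, not the asymptotic aMK method used in the paper, and all counts below are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic McDonald-Kreitman estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous and synonymous fixed differences.
    pn, ps: nonsynonymous and synonymous polymorphisms.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts.
alpha_base = mk_alpha(dn=60, ds=100, pn=30, ps=100)
# Same data with extra nonsynonymous polymorphism that is not neutral
# (for example weakly deleterious or positively selected segregating alleles).
alpha_inflated_pn = mk_alpha(dn=60, ds=100, pn=45, ps=100)
```

In this toy case the estimate drops from 0.5 to 0.25 when nonsynonymous polymorphism is inflated, which illustrates the direction of the underestimation mentioned in the response.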

      - Lines 598-599: "which would limit the rate of new and beneficial mutations." I don't understand this - shouldn't a bottleneck only affect standing variation? Why would a bottleneck affect new mutations?

      This is simply to say that during the low Ne period of a bottleneck, fewer total mutations (and therefore beneficial mutations) will be generated since there are fewer individuals for mutations to occur in. We have changed “rate” to “amount” to clarify we do not mean the mutation rate itself.
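
The distinction between mutation rate and mutational input can be made concrete with a small worked example. It assumes a diploid population in which the expected number of new mutations per generation in a target of L base pairs is 2 * N * mu * L; the population sizes, the per-base mutation rate, and the target size below are illustrative values, not estimates from the paper.

```python
def expected_new_mutations(n_individuals: int, mu_per_bp: float, target_bp: int) -> float:
    """Expected new mutations per generation in a diploid population: 2 * N * mu * L."""
    if n_individuals < 0 or mu_per_bp < 0 or target_bp < 0:
        raise ValueError("arguments must be non-negative")
    return 2.0 * n_individuals * mu_per_bp * target_bp


MU = 3e-8        # illustrative per-bp, per-generation mutation rate (assumption)
TARGET = 1_000   # illustrative number of sites where a mutation would be beneficial

before_bottleneck = expected_new_mutations(100_000, MU, TARGET)
during_bottleneck = expected_new_mutations(5_000, MU, TARGET)
fold_reduction = before_bottleneck / during_bottleneck
```

The per-base rate is identical in both calls; only the number of individuals changes, and the supply of new mutations scales down with it.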

      Reviewer #2 (Recommendations For The Authors):

      Experiments/Analyses:

      (1) Consider simulating polygenic adaptation in addition to hard and soft sweeps to see if this improves the power to detect adaptive signatures shared between populations. This could involve simulating the coordinated change in allele frequencies across many loci to match a specified shift in trait value due to selection. The ability to detect shared polygenic adaptation between population replicates could be assessed using methods tailored to polygenic signals, such as the Polygenic Selection Score approach. Comparing the power to detect shared polygenic adaptation versus shared hard and soft sweeps would provide further insight into what adaptive modes current methods can uncover. If the power to detect shared polygenic adaptation is very low, the extent of shared adaptation between populations may be even more common than currently inferred. Adding simulations of polygenic adaptation would strengthen the study.

      While this would be a worthwhile undertaking in general, it would be a considerable amount of work outside of the scope and aims of this paper.

      (2) Explore using machine learning approaches like S/HIC to improve power over summary statistic methods potentially.

      We in fact put considerable effort into applying diplo S/HIC before switching to raisd for this project. While predictions on simulations had good power to detect sweeps, we found that applying it to our actual data classified a dubious number of windows as sweeps (e.g. >90% of the genome), which we believed to be false positives. We speculated that this may have to do with sensitivity to demographic or other types of misspecification in the simulations, such as our choice of window sizes compared to local recombination rates. It would likely be fruitful to put further effort into using machine learning methods for maize and teosinte, but a deeper exploration of the right hyperparameters and simulation choices is likely needed to apply them effectively.

      (3) Increase geographic sampling density, if possible, especially near population pairs showing high differentiation, to better understand the scale of local adaptation.

      We agree this would be valuable research. Hopefully this work inspires further efforts into the question of the spatial and temporal scales of local adaptation with more ambitious spatial sampling designed at the onset.

      Writing/Presentation:

      (1) Provide more intuition about the biological interpretation of the migration rates inferred under the migration model of convergence. What do the rates imply about the amount or timing of gene flow?

      We have expanded the discussion sections (starting near line 653) to elaborate on the migration results and connect the rdmc and f4 tests more explicitly. The timing of gene flow is more challenging to address directly with the approaches we used, but we agree it would be interesting to explore more in future papers.

      (2a) Expand the discussion of power limitations and the need for simulation tests. Consider adding ROC curves for sweep detection on simulated data. The relatively low proportion of shared selective sweeps between population replicates highlights limitations in the power to detect sweeps, especially incomplete or soft sweeps. I think it would be a good idea to expand the discussion of the power tradeoffs shown in the simulation analyses. In particular, the ROC curves in Figure S4 clearly show how power declines for weaker selection coefficients across the different sweep types. I suggest making these ROC curves part of the main figures to feature the issue of power limitations more prominently.

      (2b) The discussion would benefit from commenting on how power changes across the sweep simulation scenarios. Adding a summary figure to visualise the effects of sweep type, selection strength, and frequency on detectability could further clarify the power constraints. Stating the proportion of sweeps likely missed strengthens the argument that sharing adaptive alleles is likely even more common than inferred. Discussing power will also motivate the need for developing methods with improved abilities to uncover incomplete and soft sweeps.

      While these are useful suggestions (2a and 2b), the aim of this paper at its core is empirical, and was not intended to give an exhaustive analysis of the power to detect sweeps. We report what parts of the analysis may be impacted by low power and what aspects of our inferences have higher uncertainty due to power. We agree that there is more work to be done to improve methods to detect selection given our findings (see below concerning our efforts to use machine learning as well). While we do not highlight this in the paper, we also note that ours is one of extremely few empirical studies that actually perform power analyses on real data (as opposed to simulations). We think this extra transparency by itself is of substantial utility to the community in demonstrating that the results from simulation studies performed in publications describing a method do not necessarily translate well to empirical data.
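
For reference, the ROC summary discussed in points 2a and 2b can be computed from simulated sweep and neutral scores as in the sketch below. The function name, the score values, and the thresholds are hypothetical; this is not the procedure used to produce Figure S4.

```python
from typing import List, Sequence, Tuple


def roc_points(
    sweep_scores: Sequence[float],
    neutral_scores: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, false positive rate, true positive rate) triples.

    A window is called a sweep when its score is >= the threshold.
    """
    if not sweep_scores or not neutral_scores:
        raise ValueError("both score lists must be non-empty")
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in sweep_scores) / len(sweep_scores)
        fpr = sum(s >= t for s in neutral_scores) / len(neutral_scores)
        points.append((t, fpr, tpr))
    return points


# Hypothetical sweep-statistic values from simulated sweep and neutral windows.
sweep = [5.0, 7.0, 9.0, 3.0]
neutral = [1.0, 2.0, 4.0, 6.0]
curve = roc_points(sweep, neutral, thresholds=[2.0, 5.0, 8.0])
```

The false negative rate at a given threshold is one minus the true positive rate, so the same table links the ROC view to the false negative rates discussed in the response to reviewer 1.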

      (3) Improve clarity in describing f4 test results. Consider visualising results on a map to show spatial patterns.

      We have expanded the discussion concerning f4 tests (see several comments to reviewer 1). We are not clear on how to effectively visualize f4 spatially, but hope the updates have made the results more clear.
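
For readers less familiar with the statistic, a minimal sketch of the standard f4 definition is given here: f4(A, B; C, D) is the average over SNPs of (pA - pB) * (pC - pD), where p denotes the allele frequency in each population. The function name and the allele frequencies are hypothetical, and the sketch omits the significance testing (typically a block jackknife) that a real analysis would need.

```python
from typing import Sequence


def f4(p_a: Sequence[float], p_b: Sequence[float],
       p_c: Sequence[float], p_d: Sequence[float]) -> float:
    """Mean over SNPs of (pA - pB) * (pC - pD)."""
    n = len(p_a)
    if n == 0 or not all(len(p) == n for p in (p_b, p_c, p_d)):
        raise ValueError("all populations need the same non-zero number of SNPs")
    return sum((a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)) / n


# Hypothetical allele frequencies at four SNPs.
pop_a = [0.2, 0.6, 0.4, 0.8]
pop_b = [0.4, 0.4, 0.6, 0.6]

# Case 1: frequency differences between C and D are unrelated to those between A and B.
f4_unrelated = f4(pop_a, pop_b, [0.5, 0.5, 0.3, 0.3], [0.3, 0.3, 0.5, 0.5])

# Case 2: C tracks A and D tracks B, as could happen with gene flow between A and C.
f4_correlated = f4(pop_a, pop_b, [0.3, 0.5, 0.3, 0.5], [0.5, 0.3, 0.5, 0.3])
```

A value near zero is expected when the drift separating A from B is independent of the drift separating C from D, and a consistent departure from zero is the signal usually interpreted as evidence of shared drift or gene flow.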

      Minor:

      -  Increase the font size of figure axis labels for improved readability.

      We have looked over the figures and increased font sizes where possible.

      -  Add units to selection coefficient axis labels in Figure 5.

      Selection coefficients are derived in Lee and Coop (2017) from classical population genetics theory. They do not have units, but denote the relative fitness advantage of the heterozygous genotype carrying the beneficial mutation of interest.

      -  Fix the typo 'cophenetic' in Figure S3 caption.

      Fixed. Thank you.

    2. eLife Assessment

      This useful study examines patterns of diversity and divergence in two closely related sub-species of Zea mays, patterns that have bearings on local adaptation in maize and teosinte at intermediate geographic scales. The authors suggest that convergent evolution has been facilitated by both standing variation and gene flow, with independent selective sweeps in the two species. While the data themselves are solid, there are limitations concerning population sampling, false positive rates in sweep detection and integration of phenotypic data, which make it difficult to draw definitive conclusions. The work should in principle be of broad interest to colleagues studying the relationship between domesticated species and their progenitors, as well as those studying instances of parallel evolution.

    3. Reviewer #1 (Public review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the data are interesting and the authors have tried to exclude multiple confounding factors, many patterns cannot clearly be ascribed to one cause or another.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each. Many analyses and simulations to check analyses have been carried out.

      Weaknesses:

      The strength of conclusions that can be drawn from the analyses was low, partly because there are many strange patterns. The authors have done a good job of adding caveats, but clearly, these species do not meet many assumptions of our methods.

    1. eLife Assessment

      This work presents important findings that the human frontal cortex is involved in a flexible, dual role in both maintaining information in short-term memory, and controlling this memory content to guide adaptive behavior and decisions. The evidence supporting the conclusions is compelling, with a well-designed task, best-practice decoding methods, and careful control analyses. The work will be of broad interest to cognitive neuroscience researchers working on working memory and cognitive control.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Shao et al. investigate the contribution of different cortical areas to working memory maintenance and control processes, an important topic involving different ideas about how the human brain represents and uses information when no longer available to sensory systems. In two fMRI experiments, they demonstrate that human frontal cortex (area sPCS) represents stimulus (orientation) information during typical maintenance, and even more so when a categorical response demand is present. That is, when participants have to apply an added level of decision control to the WM stimulus, sPCS areas encode stimulus information more than in conditions without this added demand. These effects are then expanded upon using multi-area neural network models, recapitulating the empirical gradient of memory vs control effects from visual to parietal and frontal cortices. Multiple experiments and analysis frameworks provide support for the authors' conclusions, and control experiments and analysis are provided to help interpret and isolate the frontal cortex effect of interest. While some alternative explanations/theories may explain the roles of frontal cortex in this study and experiments, important additional analyses have been added that help ensure a strong level of support for these results and interpretations.

      Strengths:

      - The authors use an interesting and clever task design across two fMRI experiments that is able to parse out contributions of WM maintenance alone along with categorical, rule-based decisions. Importantly, the second experiment only uses one fixed rule, providing both an internal replication of Experiment 1's effects and extending them to a different situation when rule switching effects are not involved across mini-blocks.

      - The reported analyses using both inverted encoding models (IEM) and decoders (SVM) demonstrate the stimulus reconstruction effects across different methods, which may be sensitive to different aspects of the relationship between patterns of brain activity and the experimental stimuli.

      - Linking the multivariate activity patterns to memory behavior is critical in thinking about the potential differential roles of cortical areas in subserving successful working memory. Figure 3 nicely shows a similar interaction to that of Figure 2 in the role of sPCS in the categorization vs. maintenance tasks. This is an important contribution to the field when we consider how a distributed set of interacting cortical areas supports successful working memory behavior.

      - The cross-decoding analysis in Figure 4 is a clever and interesting way to parse out how stimulus and rule/category information may be intertwined, which would have been one of the foremost potential questions or analyses requested by careful readers.

      - Additional ROI analyses in more anterior regions of the PFC help to contextualize the main effects of interest in the sPCS (and no effect in the inferior frontal areas, which are also retinotopic, adds specificity). And, more explanation for how motor areas or preparation are likely not involved strengthens the takeaways of the study (M1 control analysis).

      Weaknesses:

      - An explicit, quantitative link between the RNN and fMRI data is perhaps a last point that would integrate the RNN conclusion and analyses in line with the human imaging data.

      - As Rev 2 mentions, multiple types of information codes may be present, and the response letter Figure 5 using representational similarity (RSA) gets at this question. It would strengthen the work to, at minimum, include this analysis as an extended or supplemental figure.

      To sum up the results, a possible, brief schematic of each cortical area analyzed and its contribution to information coding in WM and successful subsequent behavior may help readers take away important conclusions of the cortical circuitry involved.

    3. Reviewer #2 (Public review):

      Summary:

      The authors provide evidence that helps resolve long-standing questions about the differential involvement of frontal and posterior cortex in working memory. They show that whereas early visual cortex shows stronger decoding of memory content in a memorization task vs a more complex categorization task, frontal cortex shows stronger decoding during categorization tasks than memorization tasks. They find that task-optimized RNNs trained to reproduce the memorized orientations show some similarities in neural decoding to people. Together, this paper presents interesting evidence for differential responsibilities of brain areas in working memory.

      Strengths:

      This paper was overall strong. It had a well-designed task, best-practice decoding methods, and careful control analyses. The neural network modeling adds additional insight into the potential computational roles of different regions.

      Weaknesses:

      Few. While more could be perhaps done to understand the RNN-fMRI correspondence, the paper contributes a compelling set of empirical findings and interpretations that can inform future research.

    1. eLife Assessment

      The authors provide solid evidence that the likelihood of looking behaviour is predicted by the expected information gain, hence constituting a valuable formal model and explanation of habituation. Such modelling can represent crucial advances in explanation, over-and-above less specified models that can be fitted post hoc to any empirical pattern, although contrast testing with other accounts is desired. The findings would be of interest to researchers studying cognitive development.

    2. Reviewer #1 (Public review):

      Summary:

      This paper proposes a new model of perceptual habituation and tests it over two experiments with both infants and adults. The model combines a neural network for visual processing with a Bayesian rational model for attention (i.e., looking time) allocation. This Bayesian framework allows the authors to measure elegantly diverse factors that might drive attention, such as expected information gain, current information gain, and surprise. The model is then fitted to infant and adult participants' data over two experiments, which systematically vary the amount of habituation trials (Experiment 1) and the type of dishabituation stimulus (familiarity, pose, number, identity, and animacy). Results show that a model based on (expected) information gain performs better than a model based on surprise. Additionally, while novelty preference is observed when exposure to familiar stimuli is elevated, no familiarity preference is observed when exposure to familiar stimuli is low or intermediate, which is in contrast with past work.
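
The contrast between surprise and information gain can be illustrated with a toy Bayesian learner. This is a minimal sketch and is not the model evaluated in the paper: it uses two made-up hypotheses, made-up likelihoods, and the realized information gain (the KL divergence from prior to posterior after an observation) rather than the expected information gain, which would additionally average over possible next observations.

```python
import math
from typing import List, Tuple


def bayes_update(prior: List[float], likelihood: List[float]) -> Tuple[List[float], float]:
    """Return the posterior over hypotheses and the marginal probability of the observation."""
    evidence = sum(p * l for p, l in zip(prior, likelihood))
    if evidence <= 0:
        raise ValueError("observation has zero probability under the prior")
    posterior = [p * l / evidence for p, l in zip(prior, likelihood)]
    return posterior, evidence


def surprise(evidence: float) -> float:
    """Surprise of an observation: negative log of its marginal probability."""
    return -math.log(evidence)


def information_gain(prior: List[float], posterior: List[float]) -> float:
    """KL divergence KL(posterior || prior), in nats."""
    return sum(q * math.log(q / p) for p, q in zip(prior, posterior) if q > 0)


# Two hypothetical hypotheses about the stimulus; the same stimulus is shown twice.
likelihood_of_stimulus = [0.8, 0.2]
belief0 = [0.5, 0.5]

belief1, evidence1 = bayes_update(belief0, likelihood_of_stimulus)
gain_first = information_gain(belief0, belief1)

belief2, evidence2 = bayes_update(belief1, likelihood_of_stimulus)
gain_second = information_gain(belief1, belief2)
```

In this toy case both the surprise and the information gain are smaller on the second exposure than on the first, which is the qualitative pattern a habituation account requires; distinguishing the two quantities empirically depends on conditions where they diverge.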

      Strengths:

      There are three key strengths of this work:

      (1) It integrates a neural network model with a Bayesian rational learner, thus bridging the gap between two fields that have often been disconnected. This is rarely seen in the cognitive science field, but the advantages are very clear from this paper: It is possible to have computational models that not only process visual information, but also actively explore the environment based on overarching attentional processes.

      (2) By varying parametrically the amount of stimulus exposure and by testing the effects of multiple novel stimulus types, this work allowed the authors to put classical theories of habituation to the test on much finer scales than previous research has done.

      (3) The Bayesian model allows the authors to test what specific aspects are different in infants and adults, showing that infants display greater values for the noise parameter.

      Weaknesses:

      Although a familiarity preference is not found, it is possible that this is related to the nature of the stimuli and the amount of learning that they offer. While infants here are exposed to the same perceptual stimulus repeatedly, infants can also be familiarised to more complex stimuli or scenarios. Classical statistical learning studies for example expose infants to specific pseudo-words during habituation/familiarisation, and then test their preference for familiar vs novel streams of pseudo-words. The amount of learning progress in these probabilistic learning studies is greater than in perceptual studies, and familiarity preferences may thus be more likely to emerge there. For these reasons, I think it is important to frame this as a model of perceptual habituation. This would also fit well with the neural net that was used, which is processing visual stimuli rather than probabilistic structures. If statements in the discussion are limited to perceptual paradigms, they would make the arguments more compelling.

    3. Reviewer #2 (Public review):

      Summary:

      This paper extends a Bayesian perception/action model of habituation behavior (RANCH) to infant looking behavior. The authors test the model predictions against data from several groups of infants and adults tested in habituation paradigms that vary the number of familiarisation stimuli and the nature of the test stimuli. Model sampling was taken as a proxy for looking times. The predictions of the model generally resemble the empirical data collected, though there are some potentially important differences.

      Strengths:

      This study addresses an important question, given the fundamental nature of habituation to learning and memory. Previous explanations of infant habituation have typically not been in the form of formal models, making falsification difficult. This Bayesian model is relatively simple but also incorporates a CNN to which the actual stimulus image can be presented, which enables principled predictions about image similarity to be derived.

      The paper contains data from a relatively large number of adults and infants, allowing parameter differences across age to be probed.

      The data suggests that the noise prior parameter is higher in infants, suggesting one mechanism through which infant and adult habituation is different, though of course, this depends on whether there is sufficient empirical evidence that other explanations can be ruled out, which isn't clear in the manuscript currently.

      Weaknesses:

      There are no formal tests of the predictions of RANCH against other leading hypotheses or models of habituation. This makes it difficult to evaluate the degree to which RANCH provides an alternative account that makes distinct predictions from other accounts. I appreciate that because other theoretical descriptions haven't been instantiated in formal models this might be difficult, but some way of formalising them to enable comparison would be useful.

      The justification for using the RMSEA fitting approach could also be stronger - why is this the best way to compare the predictions of the formal model to the empirical data? Are there others? As always, the main issue with formal models is determining the degree to which they just match surface features of empirical data versus providing mechanistic insights, so some discussion of the level of fit necessary for strong inference would be useful.

      The difference in model predictions for identity vs number relative to the empirical data seems important but isn't given sufficient weight in terms of evaluating whether the model is or is not providing a good explanation of infant behavior. What would falsification look like in this context?

      For the novel image similarity analysis, it is difficult to determine whether any differences are due to differences in the way the CNN encodes images vs in the habituation model itself - there are perhaps too many free parameters to pinpoint the nature of any disparities. Would there be another way to test the model without the CNN introducing additional unknowns?

      Related to that, the model contains lots of parts - the CNN, the EIG approach, and the parameters, all of which may or may not match how the infant's brain operates. EIG is systematically compared to two other algorithms, with KL working similarly - does this then imply we can't tell the difference between an explanation based on those two mechanisms? Are there situations in which they would make distinct predictions where they could be pulled apart? Also in this section, there doesn't appear to be any formal testing of the fits, so it is hard to determine whether this is a meaningful difference. However, other parts of the model don't seem to be systematically varied, so it isn't always clear what the precise question addressed in the manuscript is (e.g. is it about the algorithm controlling learning? or just that this model in general when fitted in a certain way resembles the empirical data?)

    1. eLife Assessment

      This manuscript describes the generation of a fused dorsal-ventral organoid system to model interactions between the cortex and striatum to study the onset and progression of Huntington's disease (HD) and other neurodegenerative disorders. While this approach is valuable, further methodological and analytical work is needed to fully support the interpretations and claims of the authors. Incomplete evidence suggests choroid plexus (ChP) abnormalities form a significant component of HD pathogenesis.

    2. Reviewer #1 (Public review):

      In the manuscript "Identification of neurodevelopmental organization of the cell populations of Juvenile Huntington's disease using dorso-ventral HD organoids and HD mouse embryos," the authors establish a fused dorso-ventral system that mimics cortex-striatum interactions within a single organoid and use this system to investigate neurodevelopmental impairments caused by HD. Specifically, they describe certain phenotypes in 60-day HD organoids and the brains of humanized mouse embryos, utilizing both wet-lab and single-cell sequencing techniques. The authors also develop dorsal/ventral and ventral/dorsal mosaic control/HD organoids, showing a capacity to rescue some HD phenotypes.

      The manuscript could be a valuable contribution to the field; however, it has important drawbacks, the most significant being a lack of clarity regarding the replicates used for each genotype in the sequencing analyses. The lack of information on replicates raises the possibility that only a single replicate was analyzed for each organoid and brain sample. This approach may lead to concerns regarding the reproducibility of the findings, and it may be necessary for the authors to generate additional data to strengthen their conclusions. In addition, the analysis of the HD samples was conducted by pooling distinct cell populations from different brain regions (CTX, HIP, ChP for the dorsal brain, and STR, HYP, TH for the ventral brain). It is unclear why scRNA-seq was used on pooled brain regions, which could obscure region-specific insights.

      Another issue pertains to their proposed outcome: "Finally, we found that TTR protein, a choroid plexus marker, is elevated in the adult HD mouse serum, indicating that TTR may be a promising marker for detecting HD". This statement appears to lack statistical support, which makes this set of data potentially misleading and inconclusive.

      The authors are encouraged to provide evidence of biological replicates, remove outcomes that lack statistical support, and address a series of points as detailed elsewhere.

    3. Reviewer #2 (Public review):

      The article titled "Identification of neurodevelopmental organization of the cell populations of juvenile Huntington's disease using dorso-ventral HD organoids and HD mouse embryos" analyses an in vitro human brain organoid model containing dorsal and ventral telencephalon structures derived from human iPSCs from Huntington's disease patients or control subjects.

      The authors describe differences in the pattern of expression of genes related to proliferation and neuronal maturation, with a slower pattern of differentiation present in HD cells. Moreover, the authors describe a higher capacity of HD cells, compared with control cells, to generate choroid plexus identity following differentiation with the dorsal telencephalon prime protocol. Whereas the claims related to choroid plexus identity are intriguing, most of the claims made throughout the manuscript are not supported by quantitative data or by consistent results across the different conditions analysed, and many experiments seem to be missing to reach final conclusions.

      In addition, the quality of the organoids used for experiments does not seem to have been assessed or satisfactorily presented in the figures of this paper. Many important details related to the experimental execution are missing in the current version of this manuscript.

    1. eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca2+ channel inactivation than pool depletion. Overall, this is a solid study although the results warrant consideration of alternative interpretations.

    2. Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory.

      Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

    3. Reviewer #2 (Public review):

      Summary:

      Shin et al. aim, in a very extensive piece of work, to identify a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex on the time scale of seconds. This mechanism is related to a powerful new model that is well suited to testing whether the pool of SVs ready for fusion is dynamically scaled to match supply and demand. The methods applied are state-of-the-art and address quantitative aspects with high signal-to-noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that regulation of release probability, the alternative contender, is unlikely to be involved in the observed short-term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin-7 function and Ca-dependent phospholipase activity both seem critical for the short-term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles, and the experimental execution is superb. The impact of the work is high, as it applies recent models of short-term plasticity behavior to in vivo circuits, further providing insight into how synapses provide dynamic control to enable working-memory-related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      While this work is carefully executed and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Fig 3: I am confused about the interpretation of the mean-variance analysis outcome. Since the data points follow the curve during induction of short-term plasticity, doesn't this suggest that release probability, and not the pool size, increases? Relatedly, measuring the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial, as the experimental paradigm biases the experiment toward a given output strength, and therefore a change in release probability cannot be excluded.

      Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function at the fly synapse leads to an increase in release probability. Thus the observed changes in short-term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as a lack of pool refilling is unlikely to explain the strong drop in synaptic output.

    4. Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca2+ channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca2+ channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca2+ channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca2+-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).
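
      The last step of this argument can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, that Ca2+ influx recovers from inactivation along a single exponential and that release scales with a power of the influx (a cooperativity of 4 is used here as an assumed value). The function names, the inactivated fraction, and the time constant are hypothetical placeholders, not values taken from the manuscript.

```python
import math

def relative_ca_influx(t_ms, inactivated_fraction, tau_ms):
    """Ca2+ influx relative to rest, recovering exponentially from inactivation."""
    return 1.0 - inactivated_fraction * math.exp(-t_ms / tau_ms)

def relative_release(t_ms, inactivated_fraction, tau_ms, cooperativity=4.0):
    """Release relative to rest, assuming release ~ (Ca2+ influx) ** cooperativity."""
    return relative_ca_influx(t_ms, inactivated_fraction, tau_ms) ** cooperativity

def inflection_time_ms(inactivated_fraction, tau_ms, cooperativity=4.0):
    """Time at which recovery switches from accelerating to decelerating.

    Setting the second derivative of (1 - a*exp(-t/tau))**n to zero gives
    t = tau * ln(n * a). Returns None when n * a <= 1, in which case the
    recovery decelerates from the start and has no sigmoid shape.
    """
    x = cooperativity * inactivated_fraction
    if x <= 1.0:
        return None
    return tau_ms * math.log(x)

# Hypothetical example: half of the channels inactivated, 20 ms recovery time constant.
for t in (0, 10, 20, 40, 80):
    print(t, round(relative_release(t, 0.5, 20.0), 3))
print("inflection (ms):", inflection_time_ms(0.5, 20.0))
```

      Under these assumptions an exponential recovery of influx alone is enough to produce an initially accelerating (sigmoid) recovery of release, provided the product of cooperativity and inactivated fraction exceeds 1.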

      The Ca2+-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca2+ to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca2+ (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca2+-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca2+ below 0.25 mM at a broad range of synapses and neuromuscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90% of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within tens of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).
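
      The 90% figure follows from simple bookkeeping, sketched below under two stated assumptions: enhancement arises only from LS-to-TS conversion, and the total number of docked vesicles and the fusion probability of TS vesicles stay fixed. The function name is hypothetical.

```python
def min_ls_fraction(fold_enhancement):
    """Smallest baseline LS fraction compatible with a given fold enhancement.

    If release is proportional to the TS fraction and that fraction cannot
    exceed 1, an E-fold enhancement needs a baseline TS fraction <= 1/E,
    so the baseline LS fraction must be at least 1 - 1/E.
    """
    if fold_enhancement < 1.0:
        raise ValueError("fold_enhancement must be >= 1")
    return 1.0 - 1.0 / fold_enhancement

# 10-fold enhancement requires at least 90% of docked vesicles to start in LS.
print(min_ls_fraction(10.0))
```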

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

    5. Author Response:

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was a possible involvement of Ca2+ channel inactivation in the strong paired pulse depression (PPD). We have already measured the total (free plus buffered) calcium increments induced by each of the first four APs in a 40 Hz train at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca2+ increments did not differ from one another, arguing against a contribution of Ca2+ channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue concerned the definition of ‘vesicular probability’. Previously, vesicular probability (pv) has been used with reference to the releasable vesicle pool, which includes not only tightly docked vesicles but also reluctant vesicles. In the present study, by contrast, pv denotes the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Our point-by-point replies to the other Reviewers’ comments are given below.

      Reviewer #2 (Public review):

      Summary:

      Shin et al. aim, in a very extensive piece of work, to identify a mechanism that contributes to dynamic regulation of synaptic output in the rat cortex on the time scale of seconds. This mechanism is related to a powerful new model that is well suited to testing whether the pool of SVs ready for fusion is dynamically scaled to match supply and demand. The methods applied are state-of-the-art and address quantitative aspects with high signal-to-noise. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that regulation of release probability, the alternative contender, is unlikely to be involved in the observed short-term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin-7 function and Ca-dependent phospholipase activity both seem critical for the short-term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles, and the experimental execution is superb. The impact of the work is high, as it applies recent models of short-term plasticity behavior to in vivo circuits, further providing insight into how synapses provide dynamic control to enable working-memory-related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      While this work is carefully executed and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release provability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * pv, where n = RRP size and pv = vesicular release probability. The value of pv critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: a loosely or a tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that estimated by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRPhyper) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured pv based on AP-evoked EPSCs, using strong paired pulse depression (PPD) and the associated failure rates, pv in the present study denotes the vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles at baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (pocc) by TS vesicles is subject to dynamic regulation through forward and backward rate constants (denoted by k1 and b1, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and pocc, of which N is a fixed parameter whereas pocc equals k1/(k1+b1) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca2+ (Hosoi et al., 2008), pocc is not a fixed parameter. Therefore, release probability should be re-defined as pocc x pv. In this regard, the increase in release probability is a major player in STF. Our study asserts that the 2.3-fold STF can be attributed to an increase in pocc rather than pv, because pv is close to unity (Fig. S8). Moreover, strong PPD was observed not only at baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in pv of reluctant vesicles.
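
      The factorization above can be made concrete with a minimal sketch. Only the relations pocc = k1/(k1+b1) and release probability = pocc x pv come from the text; the function names and the numerical rate constants are hypothetical placeholders chosen for illustration.

```python
def occupancy(k1, b1):
    """Steady-state occupancy of release sites by TS vesicles: k1 / (k1 + b1)."""
    return k1 / (k1 + b1)

def release_probability(p_occ, p_v):
    """Release probability per site, re-defined as p_occ * p_v."""
    return p_occ * p_v

def max_baseline_occupancy(fold_facilitation):
    """Largest baseline p_occ that still leaves room for a given fold facilitation.

    With p_v fixed (close to 1), facilitation can only come from p_occ rising
    toward 1, so the baseline p_occ must not exceed 1 / fold_facilitation.
    """
    return 1.0 / fold_facilitation

# Placeholder rate constants (per second), not values from the manuscript.
k1, b1 = 10.0, 20.0
p_occ = occupancy(k1, b1)
print("baseline p_occ:", round(p_occ, 3))
print("release probability with p_v = 1:", round(release_probability(p_occ, 1.0), 3))
print("max baseline p_occ for 2.3-fold STF:", round(max_baseline_occupancy(2.3), 3))
```

      Under this reading, a 2.3-fold facilitation with pv near unity requires a baseline pocc of no more than about 0.43.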

      If the Reviewer meant vesicular release or fusion probability (pv) by ‘release provability’, pv (of TS vesicles) is not a major player in STF, because the baseline pv is already higher than 0.8 even under the most parsimonious estimate (Fig. 2). Moreover, considering the very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that pv is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca2+-dependent step increase in pv of the RRP (Dittman et al., 2000) or of reluctant vesicles (Turecek et al., 2016). Given that pv of TS vesicles is close to one, an increase in pv of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in the fusion probability of LS vesicles (denoted as pv,LS) should be considered in two ways, depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that pv,LS is near zero in the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in pv,LS of vesicles that reside in a distinct pool. Because an increase in pv,LS during facilitation recruits new release sites (an increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in an upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with the results of our V-M analysis (Fig. 3), which show that EPSCs during STF fell on the same parabola regardless of stimulation frequency. Therefore, it is unlikely that an increase in the fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.
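
      The variance-mean argument can be sketched with a simple binomial model, in which N release sites each release with probability p and quantal size q. This is an illustrative assumption (uniform p, no quantal variability), and the parameter values and function names are hypothetical. The sketch shows that changing p moves points along a single parabola, whereas adding release sites lifts a point above the parabola defined by the original N.

```python
def binomial_mean_variance(n_sites, p_release, q):
    """Mean and variance of the EPSC amplitude under a simple binomial model."""
    mean = n_sites * p_release * q
    variance = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, variance

def parabola(mean, n_sites, q):
    """Variance predicted for a given mean when N and q are fixed: q*M - M**2/N."""
    return q * mean - mean ** 2 / n_sites

N, Q = 5, 10.0  # placeholder number of sites and quantal size (pA)

# Changing p (here p_occ * p_v) with N fixed: every point lies on the same parabola.
for p in (0.3, 0.5, 0.7):
    m, v = binomial_mean_variance(N, p, Q)
    print(p, m, v, parabola(m, N, Q))

# Recruiting new sites (N rises from 5 to 8): the point lies above the original parabola.
m2, v2 = binomial_mean_variance(8, 0.3, Q)
print(m2, v2, parabola(m2, N, Q))
```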

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in the fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in pv,LS that occurs in parallel with STF. Strong PPD, indicative of high pv, was consistently found not only at baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in pv,LS. One may argue that STF could be mediated by a drastic step increase of pv,LS from zero to one, but this is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we will incorporate these perspectives into the discussion and further clarify the reasoning behind our conclusions.

      <References>

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Fig 3: I am confused about the interpretation of the mean-variance analysis outcome. Since the data points follow the curve during induction of short-term plasticity, doesn't this suggest that release probability, and not the pool size, increases? Relatedly, measuring the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial, as the experimental paradigm biases the experiment toward a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition, release probability can be factored into pv and pocc, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the variance-mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= pocc x pv). Our novel finding is that the increase in release probability should be attributed to an increase in pocc, not in pv.

      Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with calyx of Held synapses, OAG, a DAG analogue, increased the fast-releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, the maximal STF (ratio of facilitated to baseline EPSCs) is expected to become lower as long as the number of release sites (N) is limited. As mentioned above, the baseline pv is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline pocc seems to be increased by OAG.

      <Reference>

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      The literature on Syt7 function is still quite controversial. There is an observation in the literature that loss of Syt7 function in the fly synapse leads to an increase in release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as a lack of pool refilling is unlikely to explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial pv, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial pv. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial pv has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial pv, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial pv already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial pv. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on pv.

      <References>

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy. We will add an explanation to the Discussion on the difference in the meaning of ‘vesicular release probability’ between the present study and previous studies [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)]. In summary, pv in the present study refers to the vesicular release probability of TS vesicles, while previous studies used it for the vesicular release probability of vesicles in the RRP, which includes LS and TS vesicles. Accordingly, pocc in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at pv close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high pv, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that pv is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in pocc (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high pv too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low pv or for a gradual increase in pv of reluctant vesicles during short-term facilitation.  

      <Following statement will be added to _Discussion_ in the revised manuscript>

      Previous studies suggested that an increase in pv is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked by a 20 Hz/60-pulse train (denoted as ‘RRPhyper’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high pv based on the observation of strong PPD and failure rates upon paired stimulations at an ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, pv in the present study indicates the fusion probability of TS vesicles. For the same reason, pocc denotes the occupancy of release sites by TS vesicles. Note that our study does not provide a direct clue as to whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles is kept constant, an increase in pocc (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) as long as it was measured based on RRPhyper.

      Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca2+ channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca2+ channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca2+ channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca2+-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca2+-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca2+ to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca2+ (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca2+ channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca2+ channel inactivation. This interpretation seems to be based on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k1 and b1, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca2+ channel inactivation. Nevertheless, we acknowledge the possibility that Ca2+ channel inactivation may contribute to PPD, and therefore, we have further investigated this possibility. Specifically, we measured action potential (AP)-evoked Ca2+ transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca2+ channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca2+ channel inactivation is unlikely to contribute to the pronounced PPD.

      Author response image 1 below shows how we measured the total Ca2+ increments at axonal boutons. First, we estimated the endogenous Ca2+-binding ratio from analyses of single AP-induced Ca2+ transients at different concentrations of Ca2+ indicator dye (panels A to E). Then, using the Ca2+ buffer properties, we converted free [Ca2+] amplitudes to total calcium increments for the first four AP-evoked Ca2+ transients in a 40 Hz train (panels G-I). We will incorporate these results into the revised version of the reviewed preprint to provide evidence against Ca2+ channel inactivation.

      Author response image 1.

      On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca2+-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in pfusion of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (pr) within a single active zone rather than a heterogeneity in pfusion of individual docked vesicles. Therefore both pocc and pv of TS vesicles would contribute to the pr distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). Secondly, strong PPD was observed not only at baseline but also during the early and late phases of facilitation, indicating that vesicles with very high pv contribute to the EPSC throughout train stimulations (Fig. 2, 3, and 7). These findings argue against the possibilities of recruitment of new release sites harboring low pv vesicles and of a gradual increase in the fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we will incorporate the perspectives into Discussion and further clarify the reasoning behind our conclusions.

      In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca2+ below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the reviewer suggested, low external Ca2+ concentration can lower release probability (pr). Given that both pv and pocc are regulated by [Ca2+]i, low external [Ca2+] may affect not only pv but also pocc, both of which would contribute to low pr. Under such conditions, it would be plausible that the baseline pr becomes much lower than 0.1 due to low pv and pocc (for instance, if pv decreases from 1 to 0.5, and pocc from 0.3 to 0.1, then pr = 0.05), and then pr (= pv x pocc) has room for an increase by a factor of ten (to 0.5, for example) by short-term facilitation as cytosolic [Ca2+] accumulates during a train.
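
      The arithmetic of this example can be written out explicitly. The short Python sketch below uses only the illustrative numbers quoted in the reply above (pv of 1 and 0.5, pocc of 0.3 and 0.1, a facilitated pr of 0.5); they are examples, not measured values. The upper bound on fold facilitation, 1 / pr at baseline, follows simply from pr being a probability that cannot exceed one.

```python
# Worked arithmetic for the example in the reply; all numbers are the
# illustrative values quoted there, not measurements.

def release_probability(p_v, p_occ):
    """pr = pv x pocc."""
    return p_v * p_occ


# Baseline pr in normal and in low external Ca2+ (example values)
pr_normal = release_probability(p_v=1.0, p_occ=0.3)
pr_low_ca = release_probability(p_v=0.5, p_occ=0.1)

# Facilitated pr assumed in the example
pr_facilitated = 0.5
fold_increase = pr_facilitated / pr_low_ca

# Since pr cannot exceed 1, the largest possible fold increase is 1 / baseline pr
max_fold_normal = 1.0 / pr_normal
max_fold_low_ca = 1.0 / pr_low_ca

print(f"baseline pr, normal Ca2+: {pr_normal:.2f} (ceiling {max_fold_normal:.1f}-fold)")
print(f"baseline pr, low Ca2+:    {pr_low_ca:.2f} (ceiling {max_fold_low_ca:.1f}-fold)")
print(f"fold increase to pr = {pr_facilitated}: {fold_increase:.1f}")
```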

      If pv is close to one, pr depends on pocc, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP of a train. Thus, post-train recovery from facilitation would depend on restoration of the equilibrium between TS and LS vesicles to the baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which is comprised of a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate-limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca2+ binding to Syt7 are slow, it is plausible that Syt7 may contribute to short-term facilitation (STF) by Ca2+-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, the number of LS vesicles would slowly accumulate in a Syt7- and Ca2+-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      <Reference>

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      Slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Nor does higher baseline occupancy always imply faster recovery of PPR. In fact, PPR recovery was slower in Syt7 KD synapses than in WT ones (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and PPR recovery rate are calculated as k1 / (k1 + b1) and (k1 + b1), respectively. The baseline occupancy depends on the ratio k1/b1, while the PPR recovery rate depends on the absolute values of k1 and b1. Based on pocc and the PPR recovery time constant of WT and KD synapses, we expect a higher k1/b1 but a lower value of (k1 + b1) in Syt7 KD synapses compared to WT ones.
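
      The two relations of the simple refilling model can be inverted to show how a higher occupancy and a slower recovery can coexist. The Python sketch below is a minimal illustration: the recovery rates (23/s for WT, 18.5/s for KD) are the values quoted above, whereas the baseline occupancies (0.3 for WT, 0.5 for KD) are hypothetical placeholders chosen only so that the KD value is the higher one.

```python
# Inverting the simple refilling model:
#   p_occ = k1 / (k1 + b1)   and   recovery rate = k1 + b1
# Recovery rates are quoted in the reply; the p_occ values are hypothetical.

def rate_constants(p_occ, recovery_rate):
    """Return (k1, b1) in 1/s from baseline occupancy and PPR recovery rate."""
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


conditions = {
    "WT": {"p_occ": 0.3, "recovery_rate": 23.0},   # p_occ is a placeholder
    "KD": {"p_occ": 0.5, "recovery_rate": 18.5},   # p_occ is a placeholder
}

results = {}
for name, params in conditions.items():
    k1, b1 = rate_constants(params["p_occ"], params["recovery_rate"])
    results[name] = {"k1": k1, "b1": b1, "ratio": k1 / b1, "sum": k1 + b1}
    print(f"{name}: k1={k1:.2f}/s  b1={b1:.2f}/s  k1/b1={k1 / b1:.2f}  k1+b1={k1 + b1:.1f}/s")
```

      With these placeholder occupancies, the KD condition has the larger k1/b1 but the smaller k1 + b1, which is the combination the reply describes.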

      The lower number of release sites (N) in Syt7-KD synapses was not anticipated. As you suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. But our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al. 2020). This was not the case in our results (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to its specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      The reason why we used 4-AP in the presence of TTX was to increase the length constant of axon fibers and to facilitate the conduction of local depolarization in the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is distant enough from axon terminals for optical stimulation-induced local depolarization not to evoke synaptic transmission. This methodology has been employed in previous studies including the work of Little and Carter (2013).

      <Reference>

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate what aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline pocc and short-term facilitation (STF) are enhanced by the baseline DAG level and train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of the 40 Hz train EPSC. The knock-down experiments also provided us with information on the critical role of Syt7 in replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

    1. eLife Assessment

      This fundamental study reveals that aging in yeast leads to chromosome mis-segregation due to asymmetric partitioning of chromosomes, driven by disruption of the nuclear pore complex and pre-mRNA leakage. The findings are convincingly supported by carefully-designed experimental data with a combination of genetic, molecular biology and cell biology approaches.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      Weaknesses:

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work. In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging.

    3. Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron-containing mRNAs that encode proteins involved in resolving sister chromatid separation during mitosis are unspliced in this age-associated defect and thus lead to the non-disjunction problem.

      Strengths: The discovery of age-associated chromosome non-disjunction is an interesting discovery, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      Weaknesses:

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-disjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis could give the cell time to resolve the syntelic attachment of the chromatids. To this point, I note that the intronless version of GLC7, which effects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254).

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed through this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is improbable to distinguish between these differences.

    4. Reviewer #3 (Public review):

      Summary:

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in a mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing less ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      Weaknesses:

      In some cases, controls for experiments were not presented or were depicted in other figures. High variability was seen in chromosome loss data, leading to large error bars. The text could have been more polished.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors explore a novel mechanism linking aging to chromosome mis-segregation and aneuploidy in yeast cells. They reveal that, in old yeast mother cells, chromosome loss occurs through asymmetric partitioning of chromosomes to daughter cells, a process coupled with the inheritance of an old Spindle Pole Body. Remarkably, the authors identify that remodeling of the nuclear pore complex (NPC), specifically the displacement of its nuclear basket, triggers these asymmetric segregation events. This disruption also leads to the leakage of unspliced pre-mRNAs into the cytoplasm, highlighting a breakdown in RNA quality control. Through genetic manipulation, the study demonstrates that removing introns from key chromosome segregation genes is sufficient to prevent chromosome loss in aged cells. Moreover, promoting pre-mRNA leakage in young cells mimics the chromosome mis-segregation observed in old cells, providing further evidence for the critical role of nuclear envelope integrity and RNA processing in aging-related genome instability.

      Strengths:

      The findings presented are not only intriguing but also well-supported by robust experimental data, highlighting a previously unrecognized connection between nuclear envelope integrity, RNA processing, and genome stability in aging cells, deepening our understanding of the molecular basis of chromosome loss in aging.

      We thank the reviewer for this very positive assessment of our work.

      Weaknesses:

      Further analysis of yeast aging data from microfluidic experiments will provide important information about the dynamic features and prevalence of the key aging phenotypes, e.g. pre-mRNA leakage and chromosome loss, reported in this work.

      We thank the reviewer for raising this point, which we will address in the revised version of the manuscript. In short, chromosome loss is an abrupt, late event in the lifespan of the cells. Its prevalence is more complex to assess and will require measuring the correlated loss rates of several chromosomes concomitantly. The prevalence of the pre-mRNA leakage phenotype is easier to assess and we will provide data about this in the revised manuscript as well. Our data show that the prevalence is quite high (well above 50%), even if not every cell is affected.

      In addition, a discussion would be needed to clarify the relationship between "chromosome loss" in this study and "genomic missegregation" reported previously in yeast aging.

      The genomic missegregation mentioned by the reviewer is a process distinct from the chromosome loss that we report. Genomic missegregation is characterized by the entry of both SPBs and all the chromosomes into the daughter cell compartment (PMID: 31714209). We do observe these events in our movies as well. In contrast, the chromosome loss phenotype takes place under proper elongation of the spindle and proper segregation of the two SPBs between mother and bud, as shown in Figure 2 of the manuscript. In our movies, chromosome loss is at least threefold more frequent (for a single chromosome) than full genome missegregation. Furthermore, whereas chromosome loss is alleviated by the removal of the introns of MCM21, NBL1 and GLC7, genomic missegregation is not.

      Nevertheless, we thank the reviewer for bringing up the possible confusion between the two phenotypes.  We will explain and illustrate the difference between the two processes in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors make the interesting discovery of increased chromosome non-disjunction in aging yeast mother cells. The phenotype is quite striking and well supported with solid experimental evidence. This is quite significant to a haploid cell (as used here) - loss of an essential chromosome leads to death soon thereafter. The authors then work to tie this phenotype to other age-associated phenotypes that have been previously characterized: accumulation of extrachromosomal rDNA circles that then correlate with compromised nuclear pore export functions, which correlates with "leaky" pores that permit unspliced mRNA messages to be inappropriately exported to the cytoplasm. They then infer that three intron-containing mRNAs that encode proteins involved in resolving sister chromatid separation during mitosis are unspliced in this age-associated defect and thus lead to the non-disjunction problem.

      Strengths:

      The discovery of age-associated chromosome non-disjunction is interesting, and it is demonstrated in a convincing fashion with "classic" microscopy-based single cell fluorescent chromosome assays that are appropriate and seem robust. The correlation of this phenotype with other age-associated phenotypes - specifically extrachromosomal rDNA circles and nuclear pore dysfunction - is supported by in vivo genetic manipulations that have been well-characterized in the past.

      In addition, the application of the single cell mRNA splicing defect reporter showed very convincingly that general mRNA splicing is compromised in aged cells. Such a pleiotropic event certainly has big implications.

      We thank the reviewer for this assessment of our work. To avoid confusion, we would like to stress, however, that our data do not show that splicing per se is defective in old cells. We only show that unspliced mRNAs tend to leak out of the nucleus of old cells.

      Weaknesses:

      The biggest weakness is "connecting all the dots" of causality and linking the splicing defect to chromosome disjunction. I commend the authors for making a valiant effort in this regard, but there are many caveats to this interpretation. While the "triple intron" removal suppressed the non-disjunction defect in aged cells, this could simply be a kinetic fix, where a slowdown in the relevant aspects of mitosis could give the cell time to resolve the syntelic attachment of the chromatids.

      The possibility that intron removal leads to a kinetic fix is an interesting idea that we will address in the revised manuscript. So far, we have not observed that removing these introns slows down mitosis, but we will test the idea by doing precise measurements.

      To this point, I note that the intron-less version of GLC7, which affects the most dramatic suppression of the three genes, is reported by one of the authors to have a slow growth rate (Parenteau et al, 2008 - https://doi.org/10.1091/mbc.e07-12-1254)

      The reviewer is right: removing the intron of GLC7 reduces the expression levels of the gene product (PMID: 16816425) to about 50% of the original value and causes a slow growth phenotype. However, the cells revert fairly rapidly through duplication of the GLC7 gene. As a consequence, neither the GLC7-∆i nor the 3x∆i mutant strains show noticeable growth phenotypes by spot assays. We will document these findings and provide a measurement of the growth rate of the mutant strain in the revised manuscript.

      In addition, the lifespan curve containing the 3∆i in Figure 5E has a very unusual shape, suggesting a growth problem/"sickness" in this strain.

      To be accurate, the strain plotted in Figure 5E is not the 3x∆i triple mutant strain but the 3x∆i mlp1∆ quadruple mutant strain. The 3x∆i triple mutant strain is plotted in Figure 4D and its shape is similar to that of the wild type cells. The strain in Figure 5E is indeed sick, due to the removal of the nuclear basket. However, the 3x∆i mutations partially rescue the replicative lifespan shortening due to the mlp1∆ mutation (see text). Illustrating the fact that the 3x∆i mutant strain is not particularly sick, it shows a prolonged lifespan and a fairly standard aging curve.

      Lastly, the Herculean effort to perform FISH of the introns in the cytoplasm is quite literally at the statistical limit of this assay. The data were not as robust as the other assays employed throughout this study. The data show either "no" signal for the young cells or a signal of 0, 1, or 2 FISH foci in the aged cells. In a Poisson distribution, which this follows, it is difficult to distinguish between these differences.

      This is correct; this experiment was not the easiest of the manuscript. However, despite the limitations of the assay, the data presented in Figure 6B are quite clear. 300 cells aged by MEP were analysed, divided into three cohorts of 100 each, and the distribution of foci (nuclear vs cytoplasmic) in these aged cells was compared to the distribution in three cohorts of young cells. For all 3 aged cohorts, over 70% of the visible foci were cytoplasmic, while in the young cells, this figure was around 3%. A t-test was conducted to compare these frequencies between young and old cells (Figure 6B). The difference is highly significant. The reviewer refers to Supplementary Figure 4, where we were simply asking i) whether the signal is lost in cells lacking the intron of GLC7 (the response is unambiguously yes) and ii) what the general number of dots per cell is in young and old wild type cells (without distinguishing between nuclear and cytoplasmic). The information to be taken from this last quantification is indeed that there is no clearly distinguishable difference between these two populations of cells. In other words, the reason why there are more dots in the cytoplasm of the old cells in Figure 6B is not that the old cells have many more dots in general. We hope that these clarifications help in understanding the data better. We will make sure that this is clearer in the revised manuscript.
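
      The cohort comparison described above can be sketched as follows. This is a minimal illustration in Python, not the analysis code behind Figure 6B: the per-cohort fractions are hypothetical placeholders chosen only to match the approximate figures quoted in the text (over 70% cytoplasmic foci in aged cohorts, around 3% in young cohorts), and the use of Welch's unequal-variance form of the t-test is an assumption, since the response does not state which variant was applied.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each group mean.
    se2_a = statistics.variance(a) / len(a)
    se2_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# HYPOTHETICAL placeholder values: fraction of visible foci that are
# cytoplasmic in each cohort of 100 cells (three cohorts per group).
aged_fraction_cytoplasmic = [0.72, 0.75, 0.78]
young_fraction_cytoplasmic = [0.02, 0.03, 0.04]

t_stat, dof = welch_t(aged_fraction_cytoplasmic, young_fraction_cytoplasmic)
print(f"t = {t_stat:.1f}, df = {dof:.2f}")
```

      With only three cohorts per group the test has very few degrees of freedom, so a p-value would be obtained by comparing the statistic against a t distribution with the computed df, for instance using a statistics library.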

      Reviewer #3 (Public review):

      Summary:

      Mirkovic et al explore the cause underlying development of aneuploidy during aging. This paper provides a compelling insight into the basis of chromosome missegregation in aged cells, tying this phenomenon to the established Nuclear Pore Complex architecture remodeling that occurs with aging across a large span of diverse organisms. The authors first establish that aged mother cells exhibit aberrant error correction during mitosis. As extrachromosomal rDNA circles (ERCs) are known to increase with age and lead to NPC dysfunction that can result in leakage of unspliced pre-mRNAs, Mirkovic et al search for intron-containing genes in yeast that may be underlying chromosome missegregation, identifying three genes in the aurora B-dependent error correction pathway: MCM21, NBL1, and GLC7. Interestingly, intron-less mutants in these genes suppress chromosome loss in aged cells, with a significant impact observed when all three introns were deleted (3x∆i). The 3x∆i mutant also suppresses the increased chromosome loss resulting from nuclear basket destabilization in an mlp1∆ mutant. The authors then directly test if aged cells do exhibit aberrant mRNA export, using RNA FISH to identify that old cells indeed leak intron-containing pre-mRNA into the cytoplasm, as well as a reporter assay to demonstrate translation of leaked pre-mRNA, and that this is suppressed in cells producing fewer ERCs. Mutants causing increased pre-mRNA leakage are sufficient to induce chromosome missegregation, which is suppressed by the 3x∆i.

      Strengths:

      The finding that deleting the introns of 3 genes in the Aurora B pathway can suppress age-related chromosome missegregation is highly compelling. Additionally, the rationale behind the various experiments in this paper is well-reasoned and clearly explained.

      We thank the reviewer for their very positive assessment of our work.

      Weaknesses:

      In some cases, controls for experiments were not presented or were depicted in other figures.

      We are sorry about this confusion. We will improve our presentation of the controls and make sure that they are shown again each time they are relevant (we had wanted to limit the cases of replotting the same controls several times). We will also add those that are missing (such as those mentioned by reviewer 2, see above).

      High variability was seen in chromosome loss data, leading to large error bars.

      We thank the reviewer for this comment. The variance in those two figures (3A and 5D) comes from the suboptimal plotting of this data. This will be corrected in the revised version of the manuscript. 

      The text could have been more polished.

      Thank you for this comment. We will go through the manuscript again in detail.

    1. eLife Assessment

      Using highly sophisticated switching linear dynamical systems (SLDS) analyses applied to functional MRI data, this study provides important insights into network dynamics underlying threat processing. After identifying distinct neural network states associated with varying levels of threat proximity, the paper provides compelling evidence of intrinsically and extrinsically driven contributions to these within-state dynamics and between-state transitions. Although the findings could be made more biologically meaningful, this work will be of interest to a wider functional neuroimaging and systems neuroscience community.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript uses state-of-the-art analysis technology to document the spatio-temporal dynamics of brain activity during the processing of threats. The authors offer convincing evidence that complex spatio-temporal aspects of brain dynamics are essential to describe brain operations during threat processing.

      Strengths:

      Rigorous complex analyses well suited to the data.

      Weaknesses:

      Lack of a simple take-home message about discovery of a new brain operation.

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Misra and Pessoa uses switching linear dynamical systems (SLDS) to investigate the neural network dynamics underlying threat processing at varying levels of proximity. Using an existing dataset from a threat-of-shock paradigm in which threat proximity is manipulated in a continuous fashion, the authors first show that they can identify states that each have their own linear dynamical system and are consistently associated with distinct phases of the threat-of-shock task (e.g., "peri-shock", "not near", etc). They then show how activity maps associated with these states are in agreement with existing literature on neural mechanisms of threat processing, and how activity in underlying brain regions alters around state transitions. The central novelty of the paper lies in its analyses of how intrinsic and extrinsic factors contribute to within-state trajectories and between-state transitions. A final set of analyses shows how the findings generalize to another (related) threat paradigm.

      Strengths:

      The analyses for this study are conducted at a very high level of mathematical and theoretical sophistication. The paper is very well written and effectively communicates complex concepts from dynamical systems. I am enthusiastic about this paper, but I think the authors have not yet exploited the full potential of their analyses in making this work meaningful toward increasing our neuroscientific understanding of threat processing, as explained below.

      Weaknesses:

      (1) I appreciate the sophistication of the analyses applied and/or developed by the authors. These methods have many potential use cases for investigating the network dynamics underlying various cognitive and affective processes. However, I am somewhat disappointed by the level of inferences made by the authors based on these analyses at the level of systems neuroscience. As an illustration, consider the following quotations from the abstract: "The results revealed that threat processing benefits from being viewed in terms of dynamic multivariate patterns whose trajectories are a combination of intrinsic and extrinsic factors that jointly determine how the brain temporally evolves during dynamic threat" and "We propose that viewing threat processing through the lens of dynamical systems offers important avenues to uncover properties of the dynamics of threat that are not unveiled with standard experimental designs and analyses". I can agree with the claim that we may be able to better describe the intrinsic and extrinsic dynamics of threat processing using this method, but what contribution does this make toward understanding these processes?

      (2) How sure can we be that it is possible to separate extrinsically and intrinsically driven dynamics?

    1. eLife Assessment

      In this innovative study, Carpenet C et al explore the use of nanobody-based PET imaging to track proliferative cells after in vivo transplantation in mice, in a fully immunocompetent setting. The development of a unique set of PET tracers and mouse strains to track genetically-unmodified transplanted cells in vivo is an important novel asset that could potentially facilitate cell tracking. The evidence provided is compelling as the new method proposed might facilitate overcoming certain limitations of alternative approaches, such as full sized immunoglobulins and small molecules, while the specific claims would gain further support by additional experimentation and methodological details.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of nanobody-based PET imaging is important and holds great potential for real-world applications since nanobodies have many advantages over full sized immunoglobulins and small molecules.

      Strengths:

      The submitted manuscript contains quite a bit of interesting data from a collaborative team of well-respected researchers. The authors are to be congratulated for presenting results that may not have turned out the way they had hoped, and doing so in a transparent fashion.

      Weaknesses:

      However, the manuscript could be considered to be a collection of exploratory findings rather than a complete and mature scientific exposition. Most of the sample sizes were 3 per group, which is fine for exploratory work, but insufficient to draw strong statistically robust conclusions for definitive results.

    3. Reviewer #2 (Public review):

      Summary:

      This is a strong and well-described study showing, for the first time, the use of a specific PET tracer, together with publicly available resources, to track proliferating transplanted cells in vivo in a fully immunocompetent murine environment.

      In this study the authors described a previously developed set of VHH-based PET tracers to track transplants (cancer cells, embryos) in a murine immunocompetent environment.

      Strengths:

      Unique set of PET tracers and mouse strains to track transplanted cells in vivo without genetic modification of the transplanted cells. This is a unique asset, and a first-in-kind.

      Weaknesses:

      - Some methodological aspects and controls are missing

      - No clinical relevance?

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      The topic of nanobody-based PET imaging is important and holds great potential for real-world applications since nanobodies have many advantages over full sized immunoglobulins and small molecules.

      Strengths:

      The submitted manuscript contains quite a bit of interesting data from a collaborative team of well-respected researchers. The authors are to be congratulated for presenting results that may not have turned out the way they had hoped, and doing so in a transparent fashion.

      Weaknesses:

      However, the manuscript could be considered to be a collection of exploratory findings rather than a complete and mature scientific exposition. Most of the sample sizes were 3 per group, which is fine for exploratory work, but insufficient to draw strong statistically robust conclusions for definitive results.

      We thank reviewer #1 for the review of our work. We appreciate reviewer #1's comment on our intent to publish our results in the most transparent fashion, which was indeed our aim. We would point out that, due to the technical challenges and cost of generating all the different nanobody-radiometal tracer conjugates, we included 3 repeats per group, which is the minimum required to perform statistical comparisons. We plan to add additional controls to the manuscript that were not initially included to limit the length of the manuscript. These additional controls will lend more weight to our conclusions.

      Reviewer #2 (Public review):

      Summary:

      This is a strong and well-described study showing, for the first time, the use of a specific PET tracer, together with publicly available resources, to track proliferating transplanted cells in vivo in a fully immunocompetent murine environment.

      In this study the authors described a previously developed set of VHH-based PET tracers to track transplants (cancer cells, embryos) in a murine immunocompetent environment.

      Strengths:

      Unique set of PET tracers and mouse strains to track transplanted cells in vivo without genetic modification of the transplanted cells. This is a unique asset, and a first-in-kind.

      Weaknesses:

      - Some methodological aspects and controls are missing

      - No clinical relevance?

      We thank reviewer #2 for their review of our work. We support reviewer #2's view on the strength of being able to track transplanted cells in vivo without the need for any sort of manipulation of the transferred cells. We plan to add additional controls to the manuscript that were not initially included to limit the length of the manuscript. These additional controls will lend more weight to our conclusions. We emphasize that although no clear clinical applications immediately derive from our studies, this work still offers better-suited tools for pre-clinical studies that require the ability to track transplanted cells in vivo. We will resubmit a revised version shortly.

    1. eLife Assessment

      This systematic review presents valuable insights into CCR5 antagonist drugs for neuroprotection and stroke management. The strength of the evidence is convincing, and the review methods and reporting adhere to the expected standards. A sensitivity analysis based on the risk of bias assessment of the included studies would be beneficial, and a more focused/detailed acknowledgment of key limitations of the review would add value to the quality of the reporting and interpretations of the findings.

    2. Reviewer #1 (Public review):

      Summary:

      The paper is well-organized, with clearly defined sections. The systematic review methodology is thorough, with clear eligibility criteria, search strategy, and data collection methods. The risk of bias assessment is also detailed and useful for evaluating the strength of evidence. The involvement of a patient panel is noticeable and positive, ensuring the research addresses real-world concerns and aligning scientific inquiry with patient perspectives. The statistical approach used for analyzing seems appropriate.

      The authors are encouraged to take into account the following points:

      As the authors have acknowledged, there is a high risk of bias across all included studies, particularly in randomization, selective outcome reporting, and incomplete data, which could be highlighted more explicitly in the paper's discussion section, particularly the potential implications for the generalizability of the results. The authors can also suggest mitigation strategies for future studies (e.g., better randomization, blinding, reporting standards, etc.). None of the studies include female animals, and the use of young adult animals (instead of aged models) limits the applicability of the findings to the human stroke population, where stroke incidence is higher in older adults and perhaps the gender issue must be included to reflect the translational aspects. The authors can add to the paper's discussion section that perhaps future preclinical studies should include both sexes and aged animals to align better with the clinical population and improve the translation of findings. Another point is the comorbidity. Comorbidities such as diabetes and hypertension are prevalent in stroke patients. How can these be considered in preclinical designs? The authors should emphasize the importance of future research incorporating such comorbid models to enhance clinical relevance.

      None of the studies had independent replication of their findings, which is a key limitation, especially for a field with high translational expectations. This should be highlighted as a critical next step for validating the efficacy of CCR5 antagonists.

      The studies assessed limited cognitive outcomes (only one reported a cognitive outcome). Given the importance of cognitive recovery post-stroke, this is a gap to highlight in the discussion. Future studies should include more diverse and comprehensive behavioral assessments, including cognitive and emotional domains, to fully evaluate the therapeutic potential.

      The timing of CCR5 antagonist administration across studies varies widely (from pre-stroke to several days post-stroke), complicating the interpretation and comparison of results. The authors are encouraged to add that future preclinical studies could focus on narrowing the therapeutic window to more clinically relevant time points.

      The paper identifies some alignment with clinical trials, but there are several gaps, too, particularly in the types of behavioral tests used in preclinical studies versus those in clinical trials. If this systematic review and meta-analysis aim to formulate a set of recommendations for future studies, it is important that the authors also propose specific preclinical behavioral tasks that could better align with clinical measures used in trials, like functional assessments related to human stroke outcomes.

      The discussion needs some revisions. It could benefit from an expanded explanation of CCR5's mechanistic role in neuroplasticity and stroke recovery. For instance, linking CCR5 antagonism more closely with molecular pathways related to synaptic repair and remyelination would enhance the quality of the discussion and understanding of the drugs' potential.

      While the tool is used to assess the risk of bias, it might be helpful to integrate a broader framework for evaluating the quality of included studies. This could include sample size justifications, statistical power analysis, or the use of pre-registration in animal studies. These elements can also introduce bias or minimize those if in place.

      Please also highlight confounding factors that might have influenced the results in the included studies, such as variation in stroke models, dosing regimens, or behavioral assessment methods.

      There is some discussion of the meta-analysis' limitations due to the few studies, but this point could be more thoroughly addressed. Please consider including a more critical discussion of the limitations of pooling data from heterogeneous study designs, stroke models, and outcome measures. What can this lead to? Is it reliable to do so, or does it lack scientific rigor? The authors are encouraged to formulate a balanced discussion adding positive and negative aspects.

      The conclusion should more explicitly acknowledge that while CCR5 antagonists show potential, the findings are still preliminary due to the limitations in the preclinical studies (high bias risk, lack of diverse animal models). Overall, the conclusion can end with a call for rigorous, well-controlled, and replicated studies with improved alignment to clinical populations and trials to show that the conclusion remains inconclusive, considering what has been analyzed here.

    3. Reviewer #2 (Public review):

      Summary:

      This is an interesting, timely, and high-quality study on the potential neuroprotective capabilities of C-C chemokine receptor type 5 (CCR5) antagonists in ischemic stroke. The focus is on preclinical investigations.

      Strengths:

      The results are timely and interesting. An outstanding feature is that stroke patient representatives have directly participated in the work. Although this is often called for, it is hardly realized in research practice, so the work goes beyond established standards.

      The included studies were assessed regarding the therapeutic impact and their adherence to current quality assurance guidelines such as STAIR and SRRR, another important feature of this work. While overall results were promising, there were some shortcomings regarding guideline adherence.

      The paper is very well written and concise yet provides much highly useful information. It also has very good illustrations and extremely detailed and transparent supplements.

      Weaknesses:

      Although the paper is of very high quality, a couple of items may require the authors' attention to further increase the impact of this exciting work. Specifically:

      Major aspects:

      (1) I hope I did not miss that (apologies if I did), but when exactly was the search conducted? Is it possible to screen the recent literature (maybe up to 12/2024) to see whether any additional studies were published?

      (2) Please clearly define the difference between "study" and "experiment," as this is not entirely clear. Is an "experiment" a distinct investigation within a particular publication (=study) that can describe more than one such "experiment"? Thanks for clarifying.

      (3) Is there an opportunity to conduct a correlation analysis between the quality of a study (for instance, after transforming the ROB assessment into a kind of score) and reported effect sizes for particular experiments or studies? This might be highly interesting.
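
      The analysis suggested in point (3) could be sketched as below. This is only an illustration in Python: the scoring rule (counting the number of low-risk ROB domains per study), the variable names, and all numbers are hypothetical placeholders rather than values from the review, and Spearman's rank correlation is just one possible choice of statistic.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL placeholder data: one entry per experiment.
rob_score = [2, 3, 3, 5, 6]               # e.g. count of low-risk domains
effect_size = [1.8, 1.5, 1.6, 0.9, 0.7]   # e.g. standardized mean difference

rho = spearman(rob_score, effect_size)
print(f"Spearman rho = {rho:.2f}")
```

      With as few studies as are typically available in such a review, any coefficient obtained this way would be exploratory and should be interpreted with caution.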

    1. eLife Assessment

      This Research Advance describes a valuable image analysis method to identify individual neurons within a population of fluorescently labeled cells in the nematode C. elegans. The findings are solid and the method succeeds in identifying cells with high precision. The method will be of interest to the C. elegans research community.

    1. eLife Assessment

      This important study presents a finding on the role of the Inferior Colliculus in sensory prediction, cognitive decision-making, and reward prediction. The evidence supporting the claims of the authors is convincing. The work will be of broad interest to sensory neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      This work makes a considerable effort to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level while monkeys were passively exposed to sounds or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating a role in temporal perception, 2) IC neuronal firing rates increased as sound sequences progressed, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and can adjust their responses based on reward predictability, and 4) IC neural activity correlates with decision-making. In summary, this study tries to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      Comments on revised version:

      The authors have adequately addressed all my concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play more important roles than simple sensory transmission. The authors show us two major findings based on their experiments. The first is the climbing effect, in which neurons' activity continues to increase over time. The second is the reward effect, which refers to a sudden increase in IC neurons' activity when the reward is given. The climbing effect is a surprising finding, but the reward effect has not been explored clearly here.

      Strengths:

      Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchic structures no less complex than those areas in charge of complex functions. Meanwhile, the IC receives projections from higher areas, such as the auditory cortex, which implies that the IC is involved in complex behaviors. Experiments in behaving monkeys are always time-consuming and demanding, and they will offer a closer approximation of how the human brain works.

      Weaknesses:

      These findings are more about correlation than causality of IC function in behaviors.

      Regarding the 'reward effect', it is still unknown whether it is simply a response to the sound produced by the electromagnetic valve of the reward system. The authors stated that the testing space is sound-proofed and consider this sufficient to support their interpretation. Since the electromagnetic valve was connected to the water tube, and the water tube was attached to the monkey chair or even placed in the monkey's mouth, the click sound may be transmitted to the monkey independently of air conduction. There are simple ways to test this. One is to add a few trials without reward and see what happens; another is to vary the latency between the sound sequence and the reward.

      Only one of the major findings is convincing; this definitely reduces the credibility of the authors' statements.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work made considerable efforts to explore the multifaceted roles of the inferior colliculus (IC) in auditory processing, extending beyond traditional sensory encoding. The authors recorded neuronal activity from the IC at the single-unit level while monkeys were passively exposed to sounds or actively engaged in a behavioral task. They concluded that 1) IC neurons showed sustained firing patterns related to sound duration, indicating their roles in temporal perception, 2) IC neuronal firing rates increased as sound sequences progressed, reflecting modulation by behavioral context rather than reward anticipation, 3) IC neurons encode reward prediction error and are capable of adjusting responses based on reward predictability, 4) IC neural activity correlates with decision-making. In summary, this study tried to provide a new perspective on IC functions by exploring its roles in sensory prediction and reward processing, which are not traditionally associated with this structure.

      Strengths:

      The major strength of this work is that the authors performed electrophysiological recordings from the IC of behaving monkeys. Compared with the auditory cortex and thalamus, the IC in monkeys has not been adequately explored.

      We appreciate the reviewer’s acknowledgment of the efforts and strengths of our study. Indeed, our goal was to provide a comprehensive exploration of the multifaceted roles of the inferior colliculus (IC) in auditory processing and beyond, particularly in sensory prediction and reward processing. The use of electrophysiological recordings in behaving monkeys was central to our approach, as we sought to uncover the underexplored aspects of IC function in these complex cognitive domains. We are pleased that the reviewer recognizes the value of investigating the IC, a structure that has not been adequately explored in primates compared to other auditory regions like the cortex and thalamus. This feedback reinforces our belief that our work contributes significantly to advancing the understanding of the IC's roles in cognitive processing.

      We look forward to addressing any further points the reviewers may have and refining our manuscript accordingly. Thank you for your constructive feedback and for recognizing the strengths of our research approach.

      Weaknesses:

      (1) The authors cited several papers focusing on dopaminergic inputs in the IC to suggest the involvement of this brain region in cognitive functions. However, all of the cited work was done in rodents. Whether the monkey IC shares similar inputs is not clear.

      We appreciate the reviewer's insightful comment on the limitations of extrapolating findings from rodent models to monkeys, particularly concerning dopaminergic inputs to the Inferior Colliculus (IC). While it is true that most studies on dopaminergic inputs to the IC have been conducted in rodents, to our knowledge, no studies have been conducted specifically in primates. To address the reviewer's concern, we have added a statement in both the introduction and discussion sections of our manuscript:

      • Introduction: "However, these studies were conducted in rodents, and the existence and role of dopaminergic inputs in the primate IC remain underexplored." (P.5, Line. 16-17)

      • Discussion: "However, the exact mechanisms and functions of dopamine modulation in the inferior colliculus are still not fully understood, particularly in primates." (P.21, Line. 7-9)

      (2) The authors confused the two terms, novelty and deviation. According to their behavioral paradigm, deviation rather than novelty should be used in the paper because all the stimuli have been presented to the monkeys during training. Therefore, there are actually no novel stimuli but only deviant stimuli. This reflects that the authors have misunderstood the basic concept.

      We appreciate the reviewer's clarification regarding the distinction between "novelty" and "deviation" in the context of our behavioral paradigm. We agree that, given the nature of our experimental design where all stimuli were familiar to the monkeys during training, the term "deviation" more accurately describes the stimuli used in our study rather than "novelty."

      To address this, we have revised the manuscript to replace the term "novelty" with "deviation" wherever applicable. This change has been made to ensure accurate terminology is used throughout the paper, thereby eliminating any potential misunderstanding of the concepts involved in our study.

      We thank the reviewer for pointing out this important distinction, which has improved the clarity and precision of our manuscript.

      (3) Most of the conclusions were made based on correlational analysis or speculation without providing causal evidence.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. Indeed, we acknowledge that the conclusions drawn primarily reflect correlations between neuronal activity and behavioral outcomes, rather than direct causal evidence. This limitation is common in many electrophysiological studies, particularly those conducted in behaving primates, where directly manipulating specific neural circuits to establish causality presents significant challenges, especially in comparison to research in mice.

      This complexity is further compounded when considering the IC’s role as a key lower-level relay station in the auditory pathway. Manipulating IC activity could have a widespread impact on auditory responses in downstream pathways, potentially influencing sensory prediction and decision-making processes.

      Despite this limitation, our study provides novel evidence suggesting that the IC may exhibit multiple facets of cognitive signaling, which could inspire future research aimed at exploring the underlying mechanisms and broader functional implications of these signals.

      To address the reviewer's concerns, we have made the following adjustments to the manuscript:

      (1) Clarified the Scope of Conclusions: We have revised the language in the Results and Discussion sections to explicitly state that our findings represent correlational relationships rather than causal mechanisms. For example, we have referred to the associations observed between IC activity and behavioral outcomes as "correlational" and have refrained from making definitive causal claims without supporting experimental evidence.

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      (2) Proposed Future Directions: In the Discussion section, we have included suggestions for future studies to directly test the causality of the observed relationships.

      “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)

      We believe these revisions provide a more balanced interpretation of our findings while emphasizing the importance of future research to build on our results and establish causal relationships. Thank you for raising this critical point, which has led to a more rigorous and transparent presentation of our study.

      (4) Results are presented in a very "straightforward" manner with too many detailed descriptions of phenomena but a lack of summary and information synthesis. For example, the first section of Results is very long but did not convey clear information.

      We appreciate the reviewer’s feedback regarding the presentation of our results. We understand that the detailed descriptions of phenomena may have made it difficult to discern the key findings and overarching themes in the study. We recognize the importance of balancing detailed reporting with clear summaries and synthesis to effectively communicate our findings.

      To address this concern, we have made the following revisions to the manuscript:

      (1) Condensed and Synthesized Key Findings: We have streamlined the presentation of the Results section by condensing overly detailed descriptions and focusing on the most critical aspects of the data. Key findings are now summarized at the end of each subsection to ensure that the main points are clearly conveyed.

      “The accumulation of the climbing effect alongside repetitive sound presentations suggests a potential linkage to reward prediction or sensory prediction, reflecting an increased probability of receiving a reward and the strengthening of sound prediction as the sound sequence progresses.” (P.10, Line. 17-20)

      “The distinct response in the control condition, where the reward was unpredictable, contrasted sharply with the predictable reward scenario in the deviant condition, underscoring the ability of auditory IC neurons to encode reward prediction errors.” (P.13, Line. 21-22; P.14, Line. 1-2)

      (2) Improved Flow and Clarity: We have revised the structure and organization of the Results section to improve the flow of information. By rearranging certain paragraphs and refining the language, we aim to present the results in a more cohesive and coherent manner.

      “Deviant Response dynamics in duration deviation detection” (P.6, Line. 12)

      “Standard Response dynamics in duration deviation detection” (P.9, Line. 4)

      We believe these changes will make the Results section more accessible and informative, allowing readers to more easily grasp the significance of our findings. Thank you for your valuable suggestion, which has significantly improved the clarity and impact of our manuscript.

      (5) The logic between different sections of Results is not clear.

      We appreciate the reviewer’s observation regarding the lack of clear logical connections between different sections of the Results. We acknowledge that a coherent flow is essential for effectively communicating the progression of findings and their implications.

      To address this concern, we have made the following revisions:

      (1) Enhanced Transitions Between Sections: We have introduced clearer transitional statements between sections of the Results. These transitions explicitly state how each new section builds upon or relates to the previous findings, creating a more cohesive narrative.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      (2) Integration of Findings: In several places within the Results, we have added brief synthesis paragraphs that integrate findings across sections. These integrative summaries help to tie together the different aspects of our study, demonstrating how they collectively contribute to our understanding of the Inferior Colliculus’s (IC) role in sensory prediction, decision-making, and reward processing.

      “These results demonstrate that reward anticipation does not drive the climbing effect, thereby reinforcing the idea that sensory prediction is the primary factor influencing the accumulation of the climbing effect in the IC.” (P.12, Line. 4-7)

      “The distinct response in the control condition, where the reward was unpredictable, contrasted sharply with the predictable reward scenario in the deviant condition, underscoring the ability of auditory IC neurons to encode reward prediction errors.” (P.13, Line. 21-22; P.14, Line. 1-2)

      (3) Clarified Rationale: At the beginning of each major section, we have clarified the rationale behind why certain experiments were conducted, connecting them more clearly to the overarching goals of the study. This should help the reader understand the purpose of each set of results in the context of the broader research objectives.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      We believe these changes improve the overall coherence and readability of the Results section, allowing readers to better follow the logical progression of our study. We are grateful for this constructive feedback and believe it has significantly enhanced the manuscript.

      (6) In the Discussion, there is excessive repetition of results, and further comparison with and discussion of potentially related work are very insufficient. For example, Metzger, R.R., et al. (J Neurosci, 2006) have shown similar firing patterns of IC neurons and correlated their findings with reward.

      We appreciate the reviewer's insightful critique regarding the excessive repetition in the Discussion and the lack of sufficient comparison with related work. We acknowledge that a well-balanced Discussion should not only interpret findings but also place them in the context of existing literature to highlight the novelty and significance of the study.

      To address these concerns, we have made the following revisions:

      (1) Reduction of Repetition: We have carefully revised the Discussion to minimize redundant repetition of the Results. Instead of restating the findings, we now focus more on their implications, limitations, and how they advance the current understanding of the Inferior Colliculus (IC) and its broader cognitive roles.

      “We demonstrated that the climbing effect is dynamically modulated (Figure 2D-G), and this modulation is driven primarily by sensory prediction rather than reward anticipation, as controlling for reward effects showed minimal impact on the response profile (Figure 3D, E). This modulation by preceding sensory experiences indicates that the IC is more than merely a relay station, suggesting a more intricate role in auditory processing influenced by both ascending and descending neural pathways.” (P.17, Line. 1-5)

      (2) Incorporation of Related Work: We have expanded the Discussion to include a more comprehensive comparison with existing literature, specifically highlighting studies that have reported similar findings. For example, we now discuss the work by Metzger et al. (2006), which demonstrated similar firing patterns of IC neurons and correlated these with reward-related processes. This comparison helps contextualize our results and emphasizes the novel contributions our study makes to the field.

      “Metzger and colleagues reported a gradual increase in neural activity—termed late-trial ramping—in the IC during an auditory saccade task. Similar to our results, they observed no climbing effect in the absence of a behavioral task. Both studies support the idea that the climbing effect depends on both behavioral engagement and reward. While both pieces of research emphasize the IC's complex role in integrating auditory processing with cognitive functions related to reward and behavior, our findings provide further insight by distinguishing between the effects of sensory prediction and reward anticipation on IC neuronal activity.” (P.16, Line. 16-24)

      We believe these revisions have significantly improved the quality of the Discussion by reducing unnecessary repetition and providing a more thorough engagement with the relevant literature. We are grateful for the reviewer's valuable feedback, which has helped us refine and strengthen the manuscript.

      Reviewer #2 (Public review):

      Summary:

      The inferior colliculus (IC) has been explored for its possible functions in behavioral tasks and has been suggested to play more important roles than simple sensory transmission. The authors revealed the climbing effect of neurons in the IC during decision-making tasks, and tried to explore the reward effect in this condition.

      Strengths:

      Complex cognitive behaviors can be regarded as simple ideals of generating output based on information input, which depends on all kinds of input from sensory systems. The auditory system has hierarchic structures no less complex than those areas in charge of complex functions. Meanwhile, the IC receives projections from higher areas, such as the auditory cortex, which implies that the IC is involved in complex behaviors. Experiments in behaving monkeys are always time-consuming and demanding, and they will offer a closer approximation of how the human brain works.

      We greatly appreciate the reviewer's positive summary of our work and recognition of the effort involved in conducting experiments on behaving monkeys. We agree with the reviewer that the inferior colliculus (IC) plays a significant role beyond mere sensory transmission, particularly in integrating sensory inputs with higher cognitive functions. Our study aims to shed light on these complex functions by revealing the climbing effect of IC neurons during decision-making tasks and exploring how reward influences this dynamic.

      We are encouraged that the reviewer acknowledges the importance of investigating the IC's role within the broader framework of complex cognitive behaviors and appreciates the hierarchical nature of the auditory system. The reviewer's comments reinforce the value of our research in contributing to a more nuanced understanding of how the IC might contribute to sensory-cognitive integration.

      We thank the reviewer for highlighting the significance of using behavioral monkey models to approximate human brain function. We are hopeful that our findings will serve as a stepping stone for further research exploring the multifaceted roles of the IC in cognition and behavior.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      These findings are more about correlation than causality of IC function in behaviors. And I have a few major concerns.

      We appreciate the reviewer’s concern regarding the reliance on correlational analyses in our study. We fully acknowledge the importance of distinguishing between correlation and causality. As outlined in our response to Question 3 from Reviewer #1, we recognize the limitations of relying on correlational data and the inherent challenges in establishing direct causal links, particularly in electrophysiological studies involving behaving primates, and given the lower-level role of the IC in the auditory pathway.

      We have taken steps to clarify this distinction throughout our manuscript. Specifically, we have revised the Results and Discussion sections to ensure that the findings are presented as correlational, not causal, and we have proposed future studies utilizing more direct manipulation techniques to assess causality. We hope these revisions adequately address your concerns.

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)

      Comparing neurons' spike activities in different tests, a 'climbing effect' was found in the oddball paradigm. The effect is clearly related to the training and learning process, but more exploration is still required to rule out a few explanations. First, repeated white noise bursts with a fixed inter-stimulus interval of 0.6 seconds were presented, so monkeys might remember the sounds by rhythm, which is a sort of learned auditory response. It would be interesting to know the monkeys' responses and neurons' activities if the inter-stimulus interval were variable. Second, the task only asked monkeys to press one button, and the reward ratio (the ratio of correct response trials) was around 78% (based on the number from Line 302). Thus, in the sessions with reward, monkeys had a high expectation of reward; does this expectation cause the climbing effect?

      We thank the reviewer for raising these insightful points regarding the 'climbing effect' observed in the oddball paradigm and its potential relationship with training, learning processes, and reward expectation. Below, we address each of the reviewer's specific concerns:

      (1) Inter-Stimulus Interval (ISI) and Rhythmic Auditory Response:

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds might lead to a rhythmic auditory response, where monkeys could anticipate the sounds. We appreciate this perspective and recognize its relevance. However, we believe that rhythm is unlikely to be a significant contributor to the 'climbing effect' for two key reasons:

      a) The 'climbing effect' begins as early as the second sound in the block (as shown in Fig. 2D and Fig. 3B), before any rhythm or pattern could be fully established, since rhythm generally requires at least three repetitions to form.

      b) In our reward experiment (Figs. 4-5), the sounds were also presented at regular ISIs, which could have facilitated rhythmic learning, yet the observed climbing effect was comparatively small in those conditions.

      Unfortunately, we did not explore variable ISIs in this current study, so we cannot directly address this concern with the available data.

      (2) Reward Expectation and Climbing Effect:

      The reviewer raises a valid concern regarding whether the 'climbing effect' might be influenced by the monkeys' high reward expectation, especially given the high reward ratio (~78%) in the sessions. While it is plausible that reward expectation could contribute to the observed increase in neuronal firing rates, we believe the results from our reward experiment (Fig. 4) suggest otherwise.

      In this experiment, even though reward expectation was likely formed due to the consistent pairing of sounds with rewards (100% reward delivery), we did not observe a significant climbing effect in the auditory response. Additionally, the presence of reward prediction error (Fig. 4D) further supports the idea that while the monkeys may indeed form reward expectations, these expectations do not directly drive the climbing effect in the IC.

      To make this distinction clearer, we have added sentences in the revised manuscript explicitly discussing the relationship between reward expectation and the climbing effect.

      “Within the oddball paradigm, both sensory and reward predictions intensify alongside the recurrence of standard sounds, suggesting that the strength of these predictions could significantly influence neuronal responses. Our experimentation with rewards has effectively dismissed the role of reward prediction (Figures 3 and 4), highlighting the potential significance of sensory prediction in molding the climbing effect.” (P.17, Line. 14-19)

      We believe these revisions provide a clearer understanding of the factors contributing to the climbing effect and effectively address the reviewer's concerns. We sincerely thank the reviewer for these valuable suggestions, which have allowed us to improve the clarity and depth of our manuscript.

      "Reward effect" on IC neurons' responses were shown in Fig. 4. Is this auditory response caused by physical reward action or not? In reward sessions, IC neurons have obvious response related to the onset of water reward. The electromagnetic valve is often used in water-rewarding system and will give out a loud click sound every time when the reward is triggered. IC neurons' responses may be simply caused by the click sound if the electromagnetic valve is used. It is important to find a way to rule out this simple possibility.

      We appreciate the reviewer’s concern regarding the potential confounding factor introduced by the electromagnetic valve’s click sound during water reward delivery, which could be misinterpreted as an auditory response rather than a response to the reward itself. Anticipating this possibility, we took measures to eliminate it by placing the electromagnetic valve outside the soundproof room where the neuronal recordings were performed.

      To address your concern more explicitly, we have added sentences in the Methods section of the revised manuscript detailing this setup, ensuring that readers are aware of the steps we took to eliminate this potential confound. By doing so, we believe that the observed reward-related neural activity in the IC is attributable to the reward processing itself rather than an auditory response to the valve click. We appreciate you bringing this important aspect to our attention, and we hope our clarification strengthens the interpretation of our findings.

      “The reward was controlled electronically by a valve located outside the sound-proof room to prevent any noise interference from the valve.” (P.24, Line. 6-7)

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate the multifaceted roles of the Inferior Colliculus (IC) in auditory and cognitive processes in monkeys. Through extracellular recordings during a sound duration-based novelty detection task, the authors observed a "climbing effect" in neuronal firing rates, suggesting an enhanced response during sensory prediction. Observations of reward prediction errors within the IC further highlight its complex integration in both auditory and reward processing. Additionally, the study indicated IC neuronal activities could be involved in decision-making processes.

      Strengths:

      This study has the potential to significantly impact the field by challenging the traditional view of the IC as merely an auditory relay station and proposing a more integrative role in cognitive processing. The results provide valuable insights into the complex roles of the IC, particularly in sensory and cognitive integration, and could inspire further research into the cognitive functions of the IC.

      We appreciate the reviewer’s positive summary of our work and recognition of its potential impact on the field. We are pleased that the reviewer acknowledges the significance of our findings in challenging the traditional view of the Inferior Colliculus (IC) as merely an auditory relay station and in proposing its integrative role in cognitive processing.

      Our study indeed aims to provide new insights into the multifaceted roles of the IC, particularly in the context of sensory and cognitive integration. We believe that this research could pave the way for future studies that further explore the cognitive functions of the IC and its involvement in complex behavioral processes.

      We are encouraged by the reviewer’s positive assessment and are committed to continuing to refine our work in response to the constructive feedback provided. We hope that our findings will contribute to advancing the understanding of the IC’s role in the broader context of neuroscience.

      We will now proceed to address the specific concerns and suggestions provided by the reviewer in the following sections.

      Weaknesses:

      Major Comments:

      (1) Structural Clarity and Logic Flow:

      The manuscript investigates three intriguing functions of IC neurons: sensory prediction, reward prediction, and cognitive decision-making, each of which is a compelling topic. However, the logical flow of the manuscript is not clearly presented and needs to be well recognized. For instance, Figure 3 should be merged into Figure 2 to present population responses to the order of sounds, thereby focusing on sensory prediction. Given the current arrangement of results and figures, the title could be more aptly phrased as "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making."

      We appreciate the reviewer’s detailed feedback on the structural clarity and logical flow of the manuscript. We understand the importance of presenting our findings in a clear and cohesive manner, especially when addressing multiple complex topics such as sensory prediction, reward prediction, and cognitive decision-making.

      To address the reviewer's concerns, we have made the following revisions:

      (1) Reorganization of Figures and Results:

      We agree with the suggestion to merge Figure 3 into Figure 2. By doing so, we can present the population responses to the order of sounds more effectively, thereby streamlining the focus on sensory prediction. This will allow readers to more easily follow the progression of the results related to this key function of the IC.

      We have reorganized the Results section to ensure a smoother transition between the different aspects of IC function that we are investigating. The new structure will better guide the reader through the narrative, aligning with the themes of sensory prediction, reward prediction, and cognitive decision-making.

      “Deviant Response dynamics in duration deviation detection” (P.6, Line. 12)

      “Standard Response dynamics in duration deviation detection” (P.9, Line. 4)

      (2) Revised Title:

      In line with the reviewer's suggestion, we have revised the title to "Beyond Auditory Relay: Dissecting the Inferior Colliculus's Role in Sensory Prediction, Reward Prediction, and Cognitive Decision-Making." We believe this title more accurately reflects the scope and focus of our study, as it highlights the three core functions of the IC that we are investigating.

      (3) Improved Logic Flow:

      We have added introductory statements at the beginning of each section within the Results to clarify the rationale behind the experiments and the logical connections between them. This should help to improve the overall flow of the manuscript and make the progression of our findings more intuitive for readers.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      We believe these changes significantly enhance the clarity and logical structure of the manuscript, making it easier for readers to understand the sequence and importance of our findings. Thank you for your valuable suggestion, which has led to a more coherent and focused presentation of our work.

      (2) Clarification of Data Analysis:

      Key information regarding data analysis is dispersed throughout the results section, which can lead to confusion. Providing a more detailed and cohesive explanation of the experimental design would significantly enhance the interpretation of the findings. For instance, including a detailed timeline and reward information for the behavioral paradigms shown in Figures 1C and D would offer crucial context for the study. More importantly, clearly presenting the analysis temporal windows and providing comprehensive statistical analysis details would greatly improve reader comprehension.

      We appreciate the reviewer’s insightful comment regarding the need for clearer and more cohesive explanations of the data analysis and experimental design. We recognize that a well-structured presentation of this information is essential for the reader to fully understand and interpret our findings. To address this, we have made the following revisions:

      (1) Detailed Explanation of Experimental Design:

      We have included a more detailed explanation of the experimental design, particularly for the behavioral paradigms shown in Figures 1C and 1D. This includes a comprehensive timeline of the experiments, along with explicit information about the reward structure and timing. By providing this context upfront, we aim to give readers a clearer understanding of the conditions under which the neuronal recordings were obtained.

      (2) Cohesive Presentation of Data Analysis:

      Key information regarding data analysis, which was previously dispersed throughout the Results section, has been consolidated and moved to a dedicated subsection within the Methods. This subsection now provides a step-by-step description of the analysis process, including the temporal windows used for examining neuronal activity, as well as the specific statistical methods employed.

      We have also ensured that the temporal windows used for different analyses (e.g., onset window, late window, etc.) are clearly defined and consistently referenced throughout the manuscript. This will help readers track the use of these windows across different figures and analyses.

      (3) Enhanced Statistical Analysis Details:

      We have expanded the description of the statistical analyses performed in the study, including the rationale behind the choice of tests, the criteria for significance, and any corrections for multiple comparisons. This relevant information is highlighted in the Results section or figure legends to facilitate understanding.

      We believe these changes will significantly improve the clarity and comprehensibility of the manuscript, allowing readers to better follow the experimental design, data analysis, and the conclusions drawn from our findings. Thank you for this valuable feedback, which has helped us to enhance the rigor and transparency of our presentation.

      (3) Reward Prediction Analysis:

      The conclusion regarding the IC's role in reward prediction is underdeveloped. While the manuscript presents evidence that IC neurons can encode reward prediction, this is only demonstrated with two example neurons in Figure 6. A more comprehensive analysis of the relationship between IC neuronal activity and reward prediction is necessary. Providing population-level data would significantly strengthen the findings concerning the IC's complex functionalities. Additionally, the discussion of reward prediction in lines 437-445, which describes IC neuron responses in control experiments, does not sufficiently demonstrate that IC neurons can encode reward expectations. It would be valuable to include the responses of IC neurons during trials with incorrect key presses or no key presses to better illustrate this point.

      We deeply appreciate the detailed feedback provided regarding the conclusions on the inferior colliculus (IC)'s role in reward prediction within our manuscript. We acknowledge the importance of a robust and comprehensive presentation of our findings, particularly when discussing complex neural functionalities.

      In response to the reviewers' concerns, we have made the following revisions to strengthen our manuscript:

      (1) Inclusion of Population-Level Data for IC Neurons:

      In the revised manuscript, we have included population-level results for IC neurons in a supplementary figure. Initially, we focused on two example neurons that did not exhibit motor-related responses to key presses to isolate reward-related signals. However, most IC neurons exhibit motor responses during key presses (as indicated in Fig.6), which can complicate distinguishing between reward-related activity and motor responses. This complexity is why we initially presented neurons without motor responses. To clarify this point, we have added sentences in the Results section to explain the rationale behind our selection of neurons and to address the potential overlap between motor and reward responses in the IC.

      “This phenomenon was further supported by examining the responses in the duration deviation detection task. Since most IC neurons exhibit motor responses during key presses (Supplementary Figure 6), which can complicate distinguishing between reward-related activity and motor responses, we specifically selected two neurons without motor responses during key presses (Figure 5).” (P.13, Line. 10-15)

      (2) Addition of Data on Key Press Errors and No-Response Trials:

      In response to the reviewer’s suggestion, we present peri-stimulus time histograms (PSTHs) for two example neurons during error trials below, including incorrect key presses and no-response trials. Given that the monkeys performed the task with high accuracy, the number of error trials is relatively small, especially for the control condition (as shown in the top row of the figure below). While we remain cautious in drawing definitive conclusions from these limited trials, we observed no clear reward signals during the corresponding window (typically centered around 150 ms after the end of the sound). It is important to note that the experiment was initially designed to explore decision-making signals in the IC, rather than focusing specifically on reward processing. However, the data in Fig. 6 demonstrated intriguing signals of reward prediction error, which is why we believe it is important to present them.

      When combined with the results from our reward experiment (Fig. 5), we believe these findings provide compelling evidence of reward prediction errors being processed by IC neurons.

      Author response image 1.

      (A)  PSTH of the neuron from Figure 5A during a key press trial under control condition. The number in the parentheses in the legend represents the number of trials for control condition. (B) PSTHs of the neuron from Figure 5A during non-key press trials under experimental conditions. The numbers in the parentheses in the legend represent the number of trials for experimental conditions. (C-D) Equivalent PSTHs as in A-B but from the neuron in Figure 5B.

      We are grateful for the reviewer's insightful suggestions, which have allowed us to improve the depth and rigor of our analysis. We believe these revisions significantly enhance our manuscript's conclusions regarding the complex functionalities of IC.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      One of the major issues of this work is that its writing fails to convey the focus and significance of the work. Sentences are too long and multiple pieces of information are often integrated in one sentence, causing great confusion.

      We appreciate the reviewer's feedback regarding the clarity and structure of the manuscript. We agree that scientific writing should be clear and concise to effectively communicate the significance of the work. In response to this comment, we have undertaken the following revisions to improve the readability and focus of the manuscript:

      (1) Simplified Sentence Structure:

      We have revisited the manuscript and revised sentences that were overly complex or contained multiple pieces of information. Long sentences have been broken into shorter, more digestible statements to improve clarity and readability. Each sentence now conveys a single, focused idea.

      (2) Improved Flow and Focus:

      We have restructured certain paragraphs to ensure that the narrative flows logically and highlights the key findings. This restructuring includes placing the most significant results in prominent positions within paragraphs and ensuring that each section begins with a clear statement of purpose.

      “Building upon the findings from the deviant responses, we next explored whether the climbing effect also manifested in responses to preceding standard stimuli, thereby examining the influence of sensory prediction and repetition on IC neuronal activity.” (P.9, Line. 5-7)

      “To determine whether the observed climbing effect was driven by reward anticipation, we designed an experiment controlling for reward effects, thereby clarifying the underlying factors influencing IC neuronal activity.” (P.10, Line. 22; P.11, Line. 1-2)

      “Recognizing that some IC neurons responded to reward delivery, we investigated whether these responses reflected reward prediction errors, thereby further elucidating the IC's role in reward processing.” (P.12, Line. 9-11)

      “Finally, to determine whether the IC plays a role in decision-making processes related to auditory perception, we analyzed the correlation between neuronal activity and behavioral choices in the duration deviation detection task.” (P.14, Line. 4-6)

      (3) Refined Significance of the Work:

      In response to the reviewer's concern that the manuscript fails to clearly convey the significance of the work, we have revised the Introduction and Discussion sections to better emphasize the focus and impact of our findings. We now explicitly highlight the novel contributions of this research to the understanding of the multifaceted role of the IC in sensory prediction, decision-making, and reward processing.

      “In this research, we embarked on a deviation detection task centered around sound duration with trained monkeys, performing extracellular recordings in the IC. Our observations unveiled a 'climbing effect'—a progressive increase in firing rate after sound onset, not attributable to reward but seemingly linked to sensory experience such as sensory prediction. Moreover, we identified signals of reward prediction error and decision-making. These findings propose that the IC's role in auditory processing extends into the realm of complex perceptual and cognitive tasks, challenging previous assumptions about its functionality.” (P.6, Line. 1-8)

      “Overall, our results strongly suggest that the inferior colliculus is actively engaged in sensory experience, reward prediction and decision making, shedding light on its intricate functions in these processes.” (P.16, Line. 10-12)

      We believe these revisions address the reviewer's concern and will make the manuscript more accessible to readers. Thank you for the valuable suggestion, which has led to a more precise and effective presentation of our work.

      Reviewer #2 (Recommendations for the authors):

      (1) In the oddball paradigm, an inter-stimulus interval of 0.6 seconds was used. Varying the inter-stimulus interval should show whether this effect reflects rhythm learning. It would be better to choose random inter-stimulus and inter-trial intervals across the whole experiment, in case the monkeys try to remember the rhythm.

      The reviewer suggests that the fixed inter-stimulus interval (ISI) of 0.6 seconds may lead to a rhythmic auditory response, allowing monkeys to anticipate sounds. This is a valuable suggestion, and we appreciate this perspective. However, we believe that rhythm is unlikely to play a significant role in driving the 'climbing effect.' The 'climbing effect' starts as early as the second sound in the block (as shown in Fig. 2D and Fig. 3B), which is before any rhythm or pattern could be fully established. Typically, rhythm learning requires at least three repetitions to form a predictable sequence.

      Unfortunately, we did not vary the inter-stimuli-interval in the current study, so we cannot directly test this hypothesis with the current dataset. However, we agree with the reviewer that using random ISIs would be an effective way to rule out any potential contribution of rhythm learning to the climbing effect directly.

      (2) Regarding the "reward effect" on IC neurons' responses, we should rule out the possibility of a simple auditory response to the switching of the electromagnetic valve.

      We appreciate the reviewer’s concern about the potential confounding factor of the electromagnetic valve's click sound during water reward delivery, which could be interpreted as an auditory response rather than a true reward-related response. Anticipating this issue, we took measures to eliminate this possibility by placing the electromagnetic valve outside the soundproof room where neuronal recordings were conducted. This setup ensured that any potential auditory noise from the valve was minimized and unlikely to influence the IC neuronal activity.

      To address this concern more explicitly, we have added a description in the Methods section detailing this setup. This revision clarifies the steps we took to rule out this potential confound, strengthening the validity of our claim that the observed IC activity is genuinely related to reward processing and not a simple auditory response to the valve's operation.

      We thank the reviewer for bringing attention to this critical aspect of our experimental design, and we hope this clarification enhances the interpretation of our findings.

      “The reward was controlled electronically by a valve located outside the sound-proof room to prevent any noise interference from the valve.” (P.24, Line. 6-7)

      (3) Since monkeys are smart, simple Go/NoGo design is not a good strategy. The task with more buttons to press, such as 2-AFC or 4-AFC task, may prevent artificial effect of unwanted behaviors and offer us more reliable and useful data.

      We appreciate the reviewer’s suggestion to implement a more complex behavioral task, such as a 2-Alternative Forced Choice (2-AFC) or 4-AFC design, to reduce the possibility of unwanted behaviors and to gather more reliable data. We agree that such paradigms could offer additional insights and help control the monkeys’ decision-making processes by reducing potential confounding factors related to the simplicity of Go/NoGo responses.

      In our current study, we chose the Go/NoGo task because it aligns with our primary experimental goal: investigating the relationship between IC activity and sensory prediction, decision-making, and reward processing in a simplified manner. This task allowed us to focus on reward prediction and sensory responses without introducing additional complexity that could increase the cognitive load on the monkeys and affect their performance. It is worth noting that training monkeys to perform auditory tasks is generally more challenging compared to visual tasks, though they are indeed capable of complex learning.

      Moreover, this novelty detection task was initially designed as an oddball paradigm to explore predictive coding along the auditory pathway. Our lab has concentrated on this topic for several years, with the majority of current research focusing on non-behavioral subjects such as rodents. Implementing a more advanced paradigm like 2-AFC would have increased training time and required an approach that diverged from our core objective.

      That said, we agree that future studies would benefit from using more sophisticated tasks, such as 2-AFC or 4-AFC paradigms, as they could offer a more refined understanding of decision-making processes while enhancing the quality of data by minimizing unwanted behaviors. We believe that incorporating more advanced behavioral paradigms in future work will further enhance the rigor and reliability of our findings.

      (4) Line 52, "challenges...", sounds a little bit too much. The authors tried to sell the idea that the IC is more than a simple sensory relay point. I agree with that, and I know that experiments on monkeys do not easily yield comprehensive data. But to support the authors' bolder opinions, more analysis needs to be done.

      We appreciate the reviewer’s feedback on the tone of the statement in Line 52, where we describe the findings as “challenging” conventional views of the IC as a simple sensory relay point. We agree that, while our data provide intriguing insights into the multifunctionality of the IC, especially in sensory prediction, decision-making, and reward processing, the original wording was too strong.

      To address this, we have toned down the language in the revised manuscript to better reflect the current state of our findings. Rather than presenting the results as a direct challenge to existing knowledge, we now describe them as contributing to a growing body of evidence that suggests the IC plays a more integrative role in auditory processing and cognitive functions.

      “This research highlights a more complex role for the IC than traditionally understood, showcasing its integral role in cognitive and sensory processing and emphasizing its importance in integrated brain functions.” (Abstract, P.3, Line.12-15)

      “This modulation by preceding sensory experiences indicates that the IC is more than merely a relay station, suggesting a more intricate role in auditory processing influenced by both ascending and descending neural pathways.” (P.17, Line. 3-5)

      (5) Line 143, "peak response", it is better not to refer this transient response as "peak response". How about "transient response" or "transient peak response"?

      Thank you for your suggestion regarding the terminology used in Line 143. We agree with the reviewer that referring to this as simply a "peak response" could be misleading. To improve clarity and precision, we have revised the term to "transient peak response" as recommended.

      We believe this adjustment better captures the nature of the neuronal activity observed and avoids confusion. The manuscript has been updated accordingly, and we appreciate the reviewer’s valuable input.

      (6) Is it possible to manipulate the IC and check the effect on the behavioral task?

      We appreciate the reviewer’s suggestion to manipulate the IC area and observe its effect on behavior during the task. Indeed, this would provide valuable causal evidence regarding the role of the IC in sensory prediction, decision-making, and reward processing, which would complement the correlational findings we have presented.

      However, in this particular study, we focused on electrophysiological recordings to observe naturally occurring neuronal activity in behaving monkeys. While it is certainly feasible to manipulate IC activity, such as through pharmacological inactivation, optogenetics, or electrical stimulation, these techniques pose technical challenges in primates. Moreover, manipulating the IC, given its role as a lower-level relay station in the auditory pathway, could potentially disrupt auditory processing more broadly, complicating the interpretation of behavioral outcomes.

      That said, we agree that introducing such manipulations in future studies would significantly enhance our understanding of the causal role of the IC in cognitive and sensory functions. We have now emphasized this as a key future research direction in the revised manuscript’s discussion section. Thank you for this insightful suggestion.

      “Further research is required to explore the underlying neuronal mechanisms and functional significance of this dynamic change comprehensively.” (P.18, Line. 11-12)

      Reviewer #3 (Recommendations for the authors):

      Minor Comments:

      (1) Figure Labeling:

      The figures require more precise labeling, particularly concerning the analysis time windows, to facilitate reader understanding of the results.

      We thank the reviewer for highlighting the importance of precise figure labeling, particularly regarding the analysis time windows. We understand that clear labeling is critical for conveying our findings effectively.

      In response to your suggestion, we have revised the figures to include more precise and detailed labels, especially for the analysis time windows. These changes will help guide readers through the experimental design and clarify the interpretation of the results. We hope these improvements enhance the overall clarity and accessibility of the figures.

      (2) Discrepancies in Figures and Text:

      There are discrepancies in the manuscript that could confuse readers. For example, on line 154, what was referred to as Supplementary Figure 1 seemed to actually be Supplementary Figure 2. Similar issues were noted on lines 480 and 606.

      We appreciate the reviewer bringing this issue to our attention. We apologize for the discrepancies between the figures referenced in the text and their actual labels in the manuscript, as this could indeed confuse readers.

      We have carefully reviewed the entire manuscript and corrected all discrepancies between the figures and their corresponding references in the text, including the issues noted on lines 154, 480, and 606. We have ensured that the figure and supplementary figure references are now consistent and accurate throughout the manuscript.

      (3) Inconsistent Formatting in Figure legends:

      Ensuring a more professional and uniform presentation throughout the manuscript would be appreciated. There was inconsistent use of uppercase and lowercase letters in legends.

      We appreciate the reviewer’s attention to detail regarding the formatting of figure legends. Ensuring a professional and consistent presentation is crucial for enhancing the readability and overall quality of the manuscript.

      We have carefully reviewed all figure legends and made the necessary corrections to ensure consistent use of uppercase and lowercase letters, as well as uniform formatting throughout the manuscript. This includes ensuring that all abbreviations and terminology are used consistently across the text and legends.

    1. eLife Assessment

      This valuable study describes how trains of mossy fiber stimulation control cerebellar unipolar brush cell discharges. The dissection of the contributions of relevant glutamate receptors to these transformations is convincing. Overall, the study broadens our understanding of temporal processing in the cerebellar cortex.

    2. Reviewer #1 (Public review):

      In this manuscript, the authors recorded cerebellar unipolar brush cells (UBCs) in acute brain slices. They confirmed that mossy fiber (MF) inputs generate a continuum of UBC responses. Using systematic and physiological trains of MF electrical stimulation, they demonstrated that MF inputs either increased or decreased UBC firing rates (UBC ON vs. OFF) or induced complex, long-lasting modulation of their discharges. The MF influence on UBC firing was directly associated with a specific combination of metabotropic glutamate receptors, mGluR2/3 (inhibitory) and mGluR1 (excitatory). Ultimately, the amount and ratio of these two receptors controlled the time course of the effect, yielding specific temporal transformations such as phase shifts. The experiments are well-executed and properly analyzed.

      Strengths:

      (1) A wide range of MF stimulation based on activity patterns observed in vivo was explored, including burst duration and frequency dependency, which could serve as a valuable foundation for explicit modeling of temporal transformations in the granule cell layer.

      (2) The pharmacological blockade of mGluR2/3, mGluR1, AMPA, and NMDA receptors helped identify the specific roles of these glutamate receptors.

      (3) The experiments convincingly demonstrate the key role of mGluR1 receptors in temporal information processing by UBCs.

      Weaknesses:

      (1) This study is a follow-up of previous work (Guo et al., Nat. Commun., 2021).

      (2) The MF activity used to mimic natural stimulation was previously collected from primates, whereas the recordings were conducted in mice.

      Comments on revisions:

      The authors included a discussion about inhibition, but I still disagree with their claim that it was not possible to study the MF-UBC connection with inhibition unblocked. This group has already conducted experiments on Golgi cell inhibition in slices.

    3. Reviewer #2 (Public review):

      This study addresses the question of how UBCs transform synaptic input patterns into spiking output patterns and how different glutamate receptors contribute to their transformations. The first figure utilizes recorded patterns of mossy fiber firing during eye movements in the flocculus of rhesus monkeys obtained from another laboratory. In the first figure, these patterns are used to stimulate mossy fibers in the mouse cerebellum during extracellular recordings of UBCs in acute mouse brain slices. The remaining experiments stimulate mossy fiber inputs at different rates or burst durations, which is described as 'mossy-fiber like', although they are considerably simpler than those recorded in vivo. As expected from previous work, AMPA mediates the fast responses, and mGluR1 and mGluR2/3 mediate the majority of longer-duration and delayed responses. The manuscript is well organized and the discussion contextualizes the results effectively.

      Comments on revisions:

      The authors have adequately addressed my concerns.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors recorded cerebellar unipolar brush cells (UBCs) in acute brain slices. They confirmed that mossy fiber (MF) inputs generate a continuum of UBC responses. Using systematic and physiological trains of MF electrical stimulation, they demonstrated that MF inputs either increased or decreased UBC firing rates (UBC ON vs. OFF) or induced complex, long-lasting modulation of their discharges. The MF influence on UBC firing was directly associated with a specific combination of metabotropic glutamate receptors, mGluR2/3 (inhibitory) and mGluR1 (excitatory). Ultimately, the amount and ratio of these two receptors controlled the time course of the effect, yielding specific temporal transformations such as phase shifts.

      Overall, the topic is compelling, as it broadens our understanding of temporal processing in the cerebellar cortex. The experiments are well-executed and properly analyzed.

      Strengths:

      (1) A wide range of MF stimulation patterns was explored, including burst duration and frequency dependency, which could serve as a valuable foundation for explicit modeling of temporal transformations in the granule cell layer.

      (2) The pharmacological blockade of mGluR2/3, mGluR1, AMPA, and NMDA receptors helped identify the specific roles of these glutamate receptors.

      (3) The experiments convincingly demonstrate the key role of mGluR1 receptors in temporal information processing by UBCs.

      Weaknesses:

      (1) This study is largely descriptive and represents only a modest incremental advance from the previous work (Guo et al., Nat. Commun., 2021). 

      We feel that the present study is a major advance. It builds on (Guo et al., Nat. Commun., 2021), in which we examined the effects of bursts of 20 stimuli at 100 spk/s. In that study we found that differential expression of mGluR1 and mGluR2 led to a continuum of temporal responses in UBCs, but that AMPARs made a minimal contribution for such bursts. It was not known how UBCs transform realistic mossy fiber input patterns. Here we provide a comprehensive evaluation of a wide range of input patterns, including bursts of 1-20 stimuli and sustained stimulation at 1 spk/s to 60 spk/s. This more thorough assessment of UBC transformations, combined with a pharmacological assessment of the contributions of different glutamate receptor subtypes, provided many new insights:

      • We found that UBC transformations are comprised of two different components: a slow temporally filtered component controlled by an interplay of mGluR1 and mGluR2, and a second component mediated by AMPARs that can convey spike timing information. NMDARs do not make a major contribution to UBC firing. The finding that UBCs simultaneously convey two types of signals, a slow filtered response and responses to single stimuli, has important implications for the computational potential of UBCs and fundamentally changes the way we think about UBCs.  

      • We found that with regard to the slow filtered component mediated by mGluR1 and mGluR2, we could extend the concept of a continuum of responses evoked by 20 stimuli at 100 spk/s (Guo et al., Nat. Commun., 2021) to a wide range of stimuli. It was not a given that this would be the case.   

      • The contribution of AMPARs was surprising. Even though snRNAseq data did not reveal a gradient of AMPAR expression across the population of UBCs (Guo et al., Nat. Commun., 2021), we found that there was a gradient of AMPA-mediated responses, and that the AMPA component was also most prominent in cells with a large mGluR1 component. Our finding that AMPAR accessory proteins exhibit a gradient across the population, which could account for the gradient of AMPAR responses, will prompt additional studies to test their involvement.

      (2) The MF activity used to mimic natural stimulation was previously collected in primates, while the recordings were conducted in mice.

      Our first task was to determine the firing properties of mossy fibers under physiological conditions in UBC rich cerebellar regions. Previous studies have estimated this in anesthetized mice using whole cell granule cell recordings (Arenz et al., 2008; Witter & De Zeeuw 2015). However, for assessing firing patterns during awake behavior, we felt that the most comprehensive data set available in a UBC rich cerebellar region was for mossy fibers involved in smooth pursuit in monkeys (David J. Herzfeld and Stephen G. Lisberger). This revealed the general features of mossy fiber firing that helped us design stimulus patterns to thoroughly probe the properties of MF to UBC transformations. The firing patterns are designed to investigate the transformations for a wide range of activity patterns and have important general implications for UBC transformations that are likely applicable to UBCs in different species that are activated in different ways.   

      (3) Inhibition was blocked throughout the study, reducing its physiological relevance.

      The reviewer correctly brings up the very important issue of inhibition in shaping UBC responses. It is well established that UBCs are inhibited by Golgi cells (Rousseau et al., 2012), and we recently showed that some UBCs are also inhibited by PCs (Guo et al., eLife, 2021). This will undoubtedly influence the firing of UBCs in vivo. We considered examining this issue, but felt that brain slice experiments are not well suited to this. In contrast to MF inputs that can be activated with a realistic activity pattern, it is exceedingly difficult to know how Golgi cells and Purkinje cells are activated under physiological conditions. Each UBC is activated by a single mossy fiber, but inhibition is provided by Golgi cells that are activated by many mossy fibers and granule cells, and PCs that are controlled by many granule cells and many other PCs. In addition, we found that many Golgi cells do not survive very well in slices, and the axons of many PCs are severed in brain slices. Although limitations of the slice preparation prevent us from determining the role of inhibition in shaping UBC responses, we have added a section to the discussion in which we address the important issue of inhibition and UBC responses.

      Reviewer #2 (Public review):

      This study addresses the question of how UBCs transform synaptic input patterns into spiking output patterns and how different glutamate receptors contribute to their transformations. The first figure utilizes recorded patterns of mossy fiber firing during eye movements in the flocculus of rhesus monkeys obtained from another laboratory. In the first figure, these patterns are used to stimulate mossy fibers in the mouse cerebellum during extracellular recordings of UBCs in acute mouse brain slices. The remaining experiments stimulate mossy fiber inputs at different rates or burst durations, which is described as 'mossy-fiber like', although they are considerably simpler than those recorded in vivo. As expected from previous work, AMPA mediates the fast responses, and mGluR1 and mGluR2/3 mediate the majority of longer-duration and delayed responses. The manuscript is well organized and the discussion contextualizes the results effectively.

      The authors use extracellular recordings because the washout of intracellular molecules necessary for metabotropic signaling may occur during whole-cell recordings. These cell-attached recordings do not allow one to confirm that electrical stimulation produces a postsynaptic current on every stimulus. Moreover, it is not clear that the synaptic input is monosynaptic, as UBCs synapse on one another. This leaves open the possibility that delays in firing could be due to disynaptic stimulation. Additionally, the result that AMPA-mediated responses were surprisingly small in many UBCs, despite apparent mRNA expression, suggests the possibility that spillover from other nearby synapses activated the higher affinity extrasynaptic mGluRs and that the main mossy fiber input to the UBC was not being stimulated. For these reasons, some whole-cell recordings (or perforated patch) would show that when stimulation is confirmed to be monosynaptic and reliable it can produce the same range of spiking responses seen extracellularly and that AMPA receptor-mediated currents are indeed small or absent in some UBCs.

      We appreciate the reviewer’s concerns regarding the reliability of mossy fiber activation, the possibility of glutamate spillover from other synapses, and the possibility of disynaptic activation involving stimulation of MF→UBC→UBC connections. We examined these issues in a previous study (Guo et al., Nat. Commun., 2021). We did on-cell recordings and followed that up with whole cell voltage clamp recordings from the same cell (Guo et al., Nat. Commun., 2021, Fig. 5), and there was good agreement with the amplitude and timing of spiking and the time course and amplitudes of the synaptic currents. We also compared responses evoked by focal glutamate uncaging over the brush and MF stimulation (Guo et al., Nat. Commun., 2021, Fig. 4). We found that the time courses and amplitudes of the responses were remarkably similar. This strongly suggests that the responses we observe do not reflect disynaptic activation (MF→UBC→UBC connections). We also showed that the responses were all-or-none: at low intensities no response was evoked, as the intensity of extracellular stimulation was increased a large response was suddenly evoked at a threshold intensity and further increases in intensity did not increase the amplitude of the response (Guo et al., Nat. Commun., 2021, Extended data Fig. 1). We can be well above threshold and still excite the same response, and as a result we do not see stereotyped indications of an inability to stimulate during prolonged high frequency activation. We recognize the importance of these issues, so we have added a section dealing explicitly with these issues (pp. 15-16).

      A discussion of whether the tested glutamate receptors affected the spontaneous firing rates of these cells would be informative as standing currents have been reported in UBCs. It is unclear whether the firing rate was normalized for each stimulation, each drug application, or each cell. It would also be informative to report whether UBCs characterized as responding with Fast, Mid-range, Slow, and OFF responses have different spontaneous firing rates or spontaneous firing patterns (regular vs irregular).

      The spontaneous firing of UBCs is indeed an interesting issue that is deserving of further investigation. It is not currently known how spontaneous firing at rest is regulated in UBCs; however, in previous work we have shown that there is great diversity in the rates across the population of UBCs in the dorsal cochlear nucleus (Huson & Regehr, JNeurosci, 2023, Fig. 4). Unfortunately, during the kind of sustained high-frequency stimulation protocols used in this study, spontaneous firing rates tend to increase. This is likely an effect of residual receptor activation. As such, our current dataset is not suitable for performing in-depth analysis of the effects of the different glutamate receptors on spontaneous firing rates. As this study aims to explore UBC responses to MF inputs, we feel that specific experiments to address the issue of spontaneous firing rates are outside of the scope.

      As the reviewer points out, there are indeed different ways the firing rates can be normalized for display in the heatmaps, and different normalizations have been used in different figures. We have made sure that the method for normalization is clearly indicated in the figure legends for each of the heatmaps on display, specifying the protocol and drug application used for normalization.

      Figure 1 shows examples of how Fast, Mid-range, Slow, and OFF UBCs respond to in vivo MF firing patterns, but lacks a summary of how the input is transformed across a population of UBCs. In panel d, it looks as if the phase of firing becomes more delayed across the examples from Fast to OFF UBCs. Quantifying this input/output relationship more thoroughly would strengthen these results.

      The UBC responses to in vivo MF firing patterns are intriguing and we agree that there appears to be increasing delays for slower UBCs visible in Figure 1. However, we feel that the true in vivo MF firing patterns are too complex and irregular for rigorous interpretation. Therefore, we only tested simplified burst and smooth pursuit-like input patterns on the full population of UBCs. Here we indeed do see increasingly delayed responses as UBCs get slower (Fig. 4).

      Inhibition was pharmacologically blocked in these studies. Golgi cells and other inhibitory interneurons likely contribute to how UBCs transform input signals. Speculation on how GABAergic and glycinergic synaptic inhibition may contribute would provide additional context to help readers understand how a circuit with intact inhibition may behave.

      As indicated in our response to reviewer 1, we have added a section discussing the very important issue of inhibition and UBC responses in vivo.   

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Including recordings without inhibition blocked would strengthen the study and provide a more comprehensive view of the transformations made by UBCs at the input stage of the cerebellar cortex.

      See response to public comments.   

      (2) The authors claim that a continuum of temporal responses was observed in UBCs, but they also distinguish between fast, mid-range, slow, and OFF UBCs. While some UBCs fire spontaneously, others are activated by MF inputs. A more thorough classification effort would clarify the various response profiles observed under specific MF stimulation regimes. Have the authors considered using machine learning algorithms to aid in classification? 

      We fundamentally feel that these response properties do not conform to rigid categories. In our previous work we have shown that the UBC population constitutes a continuum in terms of gene expression, and in terms of spontaneous and evoked firing patterns. While it may still be useful, in order to answer some questions empirically, to apply advanced algorithms that enforce separate groups for comparison, in this work we aimed to present the full range of UBC responses without introducing any additional biases that such methods would produce.

      (3) A robust classification could assist in quantifying the temporal shifts observed during smooth pursuit-like MF stimulation, a critical outcome of the study.

      As stated above, we prefer to present an unbiased overview of the continuous nature of the UBC population, as we believe that this is fundamentally the most accurate representation. While it is true that this prevents us from providing a quantification of the different temporal shifts, we believe that the range of shifts across the population is sufficiently large and continuously varying to be convincing (see Figure 4d).

      (4) In Figure 5, contrary to what is described on page 10, Cells 10 and 11 (OFF UBCs) appear to behave differently, as mGluR1 does not seem to affect their firing rates. A specific case should be made for OFF UBCs. 

      Indeed, cells 10 and 11 do not show clear increases in firing and are not strongly affected by blocking of mGluR1. However, as discussed above and explored in our previous work, we feel that the range of UBC increases in firing is best described as a continuum, including the extreme where increases in firing are no longer clearly observable. As the aim in this work is to describe this continuum of responses for physiologically relevant inputs, we do not feel there is a benefit to creating a specific case for OFF UBCs here. It should be pointed out that the number of “pure” OFF UBCs completely lacking an mGluR1 component is very small.  

      (5) A summary diagram should be added at the end of the manuscript to highlight the key temporal features observed in this study. 

      This is a great suggestion and we have prepared such a summary diagram (Figure 6).

      Reviewer #2 (Recommendations for the authors):

      (1) Page 3- "Assed" should be "assessed"

      (2) Page 19- "by integrating" is repeated twice

      (3) It was not noted whether the data would be made available. It could be useful for those interested in implementing UBCs in models of the cerebellar cortex.

      We agree that this data set is invaluable to those interested in implementing UBCs in models of the cerebellar cortex.  We will make the dataset available as described in the text.

    1. eLife Assessment

      This important work advances our understanding of the contribution of tissue-resident immune cells to trained immunity phenotypes. The evidence supporting the claims is convincing, with results that will be of interest to immunologists and scientists studying the host-pathogen interface.

    2. Reviewer #1 (Public review):

      Summary:

      In the submitted manuscript, Solomon et al carefully detail shifts in tissue-specific myeloid populations associated with trained immunity using intraperitoneal BCG injection as a model for induction. They define the kinetics of shifts in myeloid populations within the spleen and the transcriptional response associated with IP BCG exposure. In lineage tracing experiments, they demonstrate that tissue-resident macrophages, red-pulp macrophages (RPM) that are rapidly depleted after BCG exposure, are replenished from recruited monocytes and expansion of tissue-resident cells; they use transcriptional profiling to characterize those cells. In contrast to previous descriptions of BCG-driven immune training, they do not find BCG in the bone marrow in their model, suggesting that there is not direct training of myeloid precursor populations in the bone marrow. They then link the observed trained immunity phenotype (restriction of heterologous infection with ST) with early activation of STAT1 through IFN-γ.

      Strengths:

      The work includes careful detailing of shifts and origins of myeloid populations within tissue associated with trained immunity and is a meaningful advance for the field.

      Caveats:

      Given that the authors demonstrate that BCG persists in the spleen, it is possible that some level of BCG persistence in the spleen is a necessary contributor (together with signaling through STAT1) to the observed tissue-specific T1 phenotype.

      Whether ongoing signaling through the axes are required for ongoing protection is not specifically addressed in this work. There is recent work by other groups that partially addresses these caveats, and it would be helpful context to reference those papers.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Solomon and colleagues demonstrate that trained immunity induced by BCG can reprogram myeloid cells within localised tissue, which can sustain prolonged protective effects. The authors further demonstrate an activation of STAT1-dependent pathways.

      Strengths:

      The main strength of this paper is the in-depth investigation of cell populations affected by BCG training, and how their transcriptome changes at different time points post-training. Through use of flow cytometry and sequencing methods, the authors identify a new cell population derived from classical monocytes. They also show that long-term trained immunity protection in the spleen is dependent on resident cells. Through sequencing, drug and recombinant inhibition of IFNg pathways, the authors reveal STAT1-dependent responses are required for changes in the myeloid population upon training, and recruitment of trained monocytes.

      Weaknesses:

      A significant amount of work has already been performed for this study. No significant weaknesses were found.

      Comments on revisions:

      I thank the authors for carefully considering all reviewer comments. I have no further recommendations for the authors.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) “…Given that the focus in the paper is on tissue-specific immune training, it would be helpful to know whether the ongoing presence of BCG at low levels in the profiled tissue contributes to the trained immunity phenotypes observed.”….“To address point 1, the authors could treat with anti-BCG antibiotics at 2 or 4 weeks post-BCG exposure and profile the impact on trained immunity phenotypes.”

      We thank the reviewer for this important comment. The experiment suggested by the reviewer is to treat with antibiotics to remove BCG from the tissue from the first week post challenge for the duration of four weeks. In previous work, Kaufmann et al (PMID: 29328912) showed that after a month of antibiotics, BCG levels are reduced, but residual BCG still remains. According to their results, while antibiotic treatment reduces the training phenotype of LKS<sup>+</sup> HSC expansion in the bone marrow, protection against TB was maintained during ex-vivo challenge of BMDMs.

      In our experiments, we are concerned that antibiotic treatment will only change the dynamics of BCG clearance, but residual BCG will remain and will limit our interpretation. Furthermore, examining the transcriptional changes we observed at early time points after BCG may not be relevant at 1 month post antibiotics.

      As an alternative approach, we refer to our results with an antibody to block early IFNg signaling (1-5 days; Figure S4 K-M). Here, although BCG levels are comparable between treatment and control groups, we were unable to detect any TI-related transcriptional signatures upon early aIFNg treatment. This indicates that residual BCG is not sufficient for the TI phenotype in the spleen. We now emphasize this point in the revised version of the manuscript (see lines 335-339).

      (2) “Related to the point about BCG above, it would be helpful to understand whether this is a specifically time-limited requirement when trained immunity is first induced, or whether ongoing signaling through this axis is required for maintenance of the observed trained immunity phenotypes.”… “To address point 2, authors could treat with the inhibitor at 2 weeks and/or 4 weeks post-BCG and profiling later transcriptional and/or salmonella growth phenotypes.”

      We thank the reviewer for this comment, but respectfully suggest that this experiment might not be feasible. As IFNg signaling is directly required for control of Salmonella infection, we are concerned that late IFNg inhibition will also directly affect the response to Salmonella challenge and control. Thus, in our experiments, to ensure that treatment only affects the response to BCG challenge, we were careful to limit aIFNg treatment to the early time points and allowed a long resting period before Salmonella challenge.

      Furthermore, inhibition of IFNg at late time points was already tested in both Lee et al. and Tran et al. (PMID: 38036767, 38302603). The authors show that late blockage of IFNg signalling (days 14-21) is sufficient to prevent protection during a viral challenge. This would indeed imply that ongoing signalling is necessary in this context to generate protection, specifically also late signalling events. Furthermore, Lee et al. also observed a biphasic activation pattern of cytokines and recruited cells, suggesting that rather than continuous activation, sequential cell activation and signalling may be occurring.

      Respectfully, in our experiments we focus on the early time points based on our observations of early recruitment of CM-T cells (Figure S2. C-D). This was the main finding of this paper. We agree with the reviewer that future experiments are required to compare the differences in cell populations that are involved in the early vs. late trained phenotype dynamics.

      Minor points:

      Experimental conditions for the shown data are not consistently clear from the figure legends- would add more detail about the biological conditions.

      OK – done

      Figure 3E missing units on the legend

      OK – done

      Figure 4C middle panel missing y-axis label

      OK – done

      Line 40- remove "both"

      OK- done

      Line 156- Language could be clearer about what was described previously in contrast to the results shown in this work

      We have modified the text accordingly in the revised manuscript

      Reviewer #2:

      “A significant amount of work has already been performed for this study. The work is rich with data and description.”

      We thank the reviewer for acknowledging the importance of our work.

      Minor comments for the authors to consider:

      “BCG is widely recognised to induce trained immunity. In this study, Salmonella is used as secondary infection event. Why? What is role of Salmonella in this study? Does this study contribute to our understanding of the Salmonella infection process? What does this tell us about Salmonella/vaccines? Is there any evidence that BCG protects against Salmonella infection?”

      We thank the reviewer for this important comment. We now added to the introduction and the discussion the relevance of our study to the potential of BCG and trained immunity as an alternative heterologous vaccine approach to traditional vaccines that require strain-specific vaccine for each pathogen (lines 49-55 of the revised manuscript).

      “Figure 1E. RPM cannot be detected by scRNAseq?”

      The reviewer is correct. We excluded RPMs from the scRNA-seq analysis. As we discuss in the manuscript (lines 94-96), and in our previous publication (PMID: 34788598), RPM activation involves rapid cell death. As we performed scRNA-seq two weeks after BCG challenge, we profiled only CD11b+ cells, which exclude RPMs, because we were worried that our transcriptional data would otherwise represent transcriptional signatures of dying cells, making interpretation of the data difficult.

      “Figures 1H and I. The CM-T macrophages are not represented? Are they contemplated within the CM population? Would be useful to see the contribution of CM-T to the total CM DEGs/pathways.”

      The reviewer is correct. CM-T cells are evident only after BCG challenge. Because of this, our analysis of DEGs induced in monocytes by BCG requires analysis of all monocytes together. Thus, we were careful throughout the manuscript to refer to CM when analyzing bulk RNA-seq data.

      “Lines 104-117. Can the authors summarise or move the text in this paragraph to discussion? Although it provides important context, it cuts the line of thought and reduces comprehension of this section.”

      OK – we moved this section to the discussion in the revised manuscript.

      “Line 127. Is it Fig 1I or 1F that the authors are referring to?”

      The reviewer is correct, and we changed the text in the revised manuscript accordingly.

      “Figure 1J. x-axis labels CM cells but both text and figure legend refer to this panel as CM-T. If this is the case, please show data for CM and CM-T separately.”

      Please see our earlier point above that limits these analyses. As such we have also edited the text and figure legend to reflect this.

      “Lines 136-139. Please indicate that this can be found in Fig 1J.”

      OK – indicated in the revised manuscript

      “Line 152. Please add that STm infection occurred at 14 and 60 days post training.”

      OK – added

      “Lines 162-163. This is repeated from lines 89-90, maybe the reduction of RPMs can be only highlighted in this section so that the previous section can be just focused on the new CM-T population?”

      The reviewer is correct - we removed the mention of RPMs here, and mention them only later in the revised manuscript.

      “Line 163. The recruitment is CM or CM-T cells? Since they express CXCL9 (line 165 and Fig1J) could this be used as a marker for the CM-T population at this time point?”

      The reviewer is correct, and we thank them for this important comment. We now indicate that CXCL9+ is a marker for the CM-T population here and throughout the revised manuscript (lines 153-155 of the revised manuscript).

      “Line 173. The loss of CXCL9 at 60 dpi means that CM-T population disappears/reduces or returns to CM only? If the population is reduced, could it be related to the reduced STm infection control at 60 days?”

      OK– done. Referred to these cells as CM-Ts and suggested a correlation with protection loss in the text (lines 160-162 of the revised manuscript).

      “Figure 2D. Can the authors show if there is variation in the myeloid populations after PBS injection at different time points? Are the percentages shown only at 3 dpi? It is curious that at 30 dpi the transcriptome has a significant change for certain genes.”

      There are indeed variations across the PBS time-point samples, which we demonstrate in Figure S2B. The percentages shown in the main figure for PBS reflect the mean of all time points; this is now stated with greater clarity in the revised manuscript (lines 151-152). We also noted an increase in the cell cycling genes at D30 for the control mice as well, and while still significant in BCG, we limited interpretation accordingly.

      “Line 208. The authors can highlight that the expression of STAT1 follows the same pattern as IFNg. Maybe even present the graphs side by side?”

      The reviewer is correct, and we have implemented their suggestion as such in the updated text (lines 192-195) and figure (Fig. 2H).

      “Line 213. Authors mention a replenishment of the RPM population - what time point are you referring to? At 60 dpi the population seems to be halved compared to 14 dpi. Later (line 230), authors refer to the replenishment as a repopulation by other cell types - is repopulation more correct than replenishment?”

      The reviewer is correct, and we thank the reviewer for this important comment. We now changed replenishment to repopulation (lines 95, 201), which is more accurate given the continued decreased percentage at later time points.

      Lines 214-222. It is not clear what is the conclusion from these experiments: is the recruitment of progenitors from the BM or by local signals?

      The reviewer is correct, we agree that the wording in the initial manuscript was imprecise. This experiment specifically tests whether trained bone marrow progenitors can sustain the observed TI signatures in a naive environment. By transplanting trained bone marrow into naive hosts, we demonstrate that progenitor programming alone is sufficient to maintain long-term SCA-1 expression in NCMs, without requiring ongoing local tissue signals. We now better clarify this text in the revised manuscript (lines 202-212).

      “Line 333-334. Where is the data that shows that upon Fedratinib RPMs have enhanced survival?”

      OK – We now indicate the figure in the revised manuscript.

    1. eLife Assessment

      This work provides a valuable contribution to our understanding of the neurobiological mechanisms underlying spatial memory and learning, suggesting that dopamine plays a pivotal role in linking reward context and novelty to memory consolidation processes. The evidence presented to support the main conclusions is solid, although reviewers felt that the strength of evidence could have been further strengthened by more rigorous histological verification of the experimental conditions and the complexity of the experimental manipulations, increased sample sizes, and a more consistent approach to experimental dosing and timing, which will be crucial for confirming the reproducibility and reliability of the observed effects.

    2. Reviewer #1 (Public review):

      This manuscript by Kleinman & Foster investigates the dependence of hippocampal replay on VTA activity. They recorded neural activity from the dorsal CA1 region of the hippocampus while chemogenetically silencing VTA dopamine neurons as rats completed laps on a linear track with reward delivery at each end. Reward amount changed across task epochs within a session on one end of the track. The authors report that VTA activity is necessary for an increase in sharp-wave rate to remain localized to the feeder that undergoes a change in reward magnitude, an effect that was especially pronounced in a novel environment. They follow up on this result with a second experiment in which reward magnitude varies unpredictably at one end of the linear track and report that changes in sharp-wave rate at the variable location reflect both the amount of reward rats just received there, in addition to a smaller modulation that is reminiscent of reward prediction error coding, in which the previous reward rats received at the variable location affects the magnitude of the subsequent change in sharp-wave rate that occurs on the present visit.

      This work is technically innovative, combining neural recordings with chemogenetic inactivation. The question of how VTA activity affects replay in the hippocampus is interesting and important given that much of the work implicating hippocampal replay in memory consolidation and planning comes from reward-motivated behavioral tasks.

      Comments on revisions:

      Overall, I think the authors have done everything they could to address reviewer concerns, short of collecting more data. The more consistent statistical approach makes the paper easier to read and follow. It's helpful to have more details/rationale for the variability in CNO dose and timing. I think some of the results are still not fully convincing, especially the reward volatility experiment (which the authors also note requires additional validation). Given the small number of rats, the small effect sizes, and the complexity of the experimental manipulations, I still have concerns about whether these effects would hold with larger group sizes.

    3. Reviewer #2 (Public review):

      (1) Summary

      Kleinman and Foster's study investigates the role of dopamine signaling in the ventral tegmental area (VTA) on hippocampal replay and sharp-wave ripples (SWR) in rats exposed to changes in reward magnitude and environmental novelty. The authors utilize chemogenetic silencing techniques to modulate dopamine neuron activity in the VTA while conducting simultaneous electrophysiological recordings from the hippocampal CA1 region. Their findings suggest that VTA dopamine signaling is critical for modulating hippocampal replay in response to changes in reward context and novelty, with specific disruptions observed in replay dynamics when VTA is inhibited, particularly in novel environments.

      (2) Strengths

      The research addresses a significant gap in our understanding of the neurobiological underpinnings of memory and spatial learning, highlighting the importance of dopamine-mediated processes. The methodological approach is robust, combining chemogenetic silencing with precise electrophysiological measurements, which allows for a detailed examination of the neural circuits involved. The study provides important insights into how hippocampal replay and SWR are influenced by reward prediction errors, as well as the role of dopamine in these processes. Specifically, the authors note that VTA silencing unexpectedly did not prevent increases in ripple activities where reward was increased, but induced significant aberrant increases in environments where reward levels were unchanged, highlighting a novel dependency of hippocampal replay on dopamine and a VTA-independent reward prediction error signal in familiar environments. These findings are critical for understanding the consolidation of episodic memory and the neural basis of learning.

      (3) Weaknesses

      Despite the strengths in methodology and conceptual framework, the study has several weaknesses that could affect the interpretation of the results. There is a need for more rigorous histological validation to confirm the extent and specificity of viral expression (from all animals ideally), which is crucial for ensuring the accuracy of the findings. Variability in the dosing and timing of chemogenetic interventions could also lead to inconsistencies in the data, suggesting a need for more standardized experimental protocols.

      Comments on revisions:

      I commend the authors for their work in addressing my and the other reviewers' comments. I think these changes have improved the paper, and no further changes are absolutely necessary.

    4. Reviewer #3 (Public review):

      Summary:

      The authors of this work are trying to understand the role dopaminergic terminals coming from VTA have on hippocampal mechanisms of memory consolidation, with emphasis on the replay of hippocampal patterns of activity during periods of consummatory behavior in reward locations. Previous work suggested that replay of relevant spatial trajectories supports reward localization and influences behavior.

      The authors then tried to separate two conditions that were known to cause an increase in replay activity - spatial novelty encoding and variation of reward magnitude - and evaluate how these changed when VTA dopamine neurons were inactivated by a chemogenetic tool. They found that the rate of reverse replay (trajectory going away from the goal location) is increased with reward only in novel, but not in familiar environments. Overall this suggests that the VTA dopamine signal is critical during learning of novel locations, but not during explorations of already familiar environments.

      Strengths:

      The study combines inactivation of VTA projections during goal-oriented behavior with in vivo analysis of patterns of hippocampal activity during both novelty and reward variability. This work adds to the body of evidence that reverse replay constitutes an important mechanism in learning spatial goal locations. Furthermore, this work also points to the role of VTA in reward prediction error with consequences for spatial navigation and consolidation of spatial memories.

      The authors addressed very carefully all the points raised during the revision and I am very pleased with the revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Chemogenetics validation

      Little validation is provided for the chemogenetic manipulations. The authors report that animals were excluded due to lack of expression but do not quantify/document the extent of expression in the animals that were included in the study.

      We thank the reviewer for raising this oversight. We have added additional examples of virus expression in sections from included and excluded animals in Figure 1 – Supplement 1. We also added additional comments on the extent of expression we observed in lines 92-95: “Post-experiment histology confirmed overlapping virus expression and TH-positive neurons in putative VTA near the injection site (-5.6 mm AP from bregma), as well as approximately 0.5 mm anterior and posterior (-5 to -6 mm AP).”

      There's no independent verification that VTA was actually inhibited by the chemogenetic manipulation besides the experimental effects of interest.

      While we did include animals expressing control virus to control for any effect of CNO administration itself, the reviewer is correct that we did not independently verify VTA neurons were inhibited. We have noted this limitation of the current study on lines 513-522 in the Discussion: “We did not directly measure the suppression of VTA neurons after CNO injection. Previous work in other brain areas found hM4Di activation suppressed firing rates to around 60% of baseline (Mahler et al., 2014; Chang et al., 2015), in addition to diminishing synaptic transmission even when spikes occurred (Stachniak et al., 2014). Combined with the incomplete expression of hM4Di in TH-positive neurons in our animals, we expect VTA activity was significantly but not completely suppressed. Because our results depend only on any degree of blunting differences in dopamine release at different reward locations, rather than the total absence of dopamine signaling, measuring the magnitude of suppression was not essential for our conclusions.”

      The authors report a range of CNO doses. What determined the dose that each rat received? Was it constant for an individual rat? If not, how was the dose determined? The authors may wish to examine whether any of their CNO effects were dependent on dose.

      The reviewer is completely correct that we did not provide sufficient information regarding the dosage of CNO used in each animal and each session. We have included more details in the Methods lines 676-694, detailing both the doses and the rationale.

      The authors tested the same animal multiple times per day with relatively little time between recording sessions. Can they be certain that the effect of CNO wore off between sessions? Might successive CNO injections in the same day have impacted neural activity in the VTA differently? Could the chemogenetic manipulation have grown stronger with each successive injection (or maybe weaker due to something like receptor desensitization)? The authors could test statistically whether the effects of CNO that they report do not depend on the number of CNO injections a rat received over a short period of time.

      We thank the reviewer for bringing up the question of whether the order of sessions had an influence on the efficacy of CNO in inactivating VTA activity. To address this, we split our dataset in Experiment 1 into two based on what number session of the particular day it was: 1st sessions of the day vs. all subsequent sessions (2nd+ session of the day). Then, we examined the difference in sharp-wave ripple rate between the reward ends in Epoch 2, as in Figure 2D of the manuscript. Though the resulting number of sessions in each split of the dataset is too low to draw strong statistical conclusions, particularly for novel sessions, we see little evidence there is any systematic change in the effect of VTA inactivation as a function of session number in the day. We include this in the revised manuscript as Figure 2 – Supplement 3 and in the Results lines 255-258.

      Motivational considerations

      In a similar vein, running multiple sessions per day raises the possibility that rats' motivation was not constant across all data collection time points. The authors could test whether any measures of motivation (laps completed, running speed) changed across the sessions conducted within the same day.

      We thank the reviewer for this suggestion. We examined behavioral measures of motivation across sessions conducted within the same day. First, we calculated how many total laps each animal completed each session as a function of the session number of the day. In individual animals, this ranged from -2.8 to 4.1 laps per additional session number (mean 2.01), with an average total laps per session of 43.2 laps. Second, we calculated the median running velocity per session, across both running directions and all epochs, and checked how it varied across session number of the day. Per additional session in the day, this ranged from -3.6 to 8.6 cm/s difference across animals (mean 2.7 cm/s), with an average running velocity of 34.1 cm/s in total. Taken together, while we found little behavioral evidence of strong motivational changes across sessions, our animals may have been slightly more motivated in later sessions in the day, which also corresponded to later in the light cycle and closer to the dark cycle. We mention this information in Results lines 255-258, related to Figure 2 – Supplement 3.
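
      As an illustration only, a "laps per additional session number" figure of this kind can be obtained as a least-squares slope of laps against session number within each animal. The response does not state how the per-animal values were computed, so the slope method, the function name `slope_per_session`, and the lap counts below are all assumptions made for this sketch.

```python
def slope_per_session(session_numbers, values):
    """Ordinary least-squares slope of `values` against `session_numbers`.

    Hypothetical helper: interprets "laps per additional session number"
    as a per-animal regression slope (an assumption, not the authors' code).
    """
    n = len(session_numbers)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(session_numbers) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in session_numbers)
    if sxx == 0:
        raise ValueError("session numbers must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(session_numbers, values))
    return sxy / sxx


# Made-up lap counts for one animal across the 1st, 2nd and 3rd session of a day.
example_slope = slope_per_session([1, 2, 3], [40, 42, 44])
```

      A positive slope would indicate more laps in later sessions of the day; the same helper could be applied to median running velocity.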

      This is a particularly tricky issue, because my read of the methods is that saline sessions were only conducted as the first session of any recording day, which means there's a session order/time of day and potential motivational confound in comparing saline to CNO sessions.

      We have clarified the ordering of CNO and saline sessions in the Methods lines 697-702. Briefly, we avoided running CNO sessions before saline sessions in the same day, but either could be the first session of a day. That is, saline -> saline, saline -> CNO, and CNO -> CNO were all valid orderings. On days with more than two sessions, any number of repeated saline and CNO sessions was permitted, provided that as soon as a CNO session occurred, any subsequent sessions were also CNO.
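
      The ordering rule described above can be written as a short check. This is a minimal sketch, not code from the study; the function name `is_valid_day_order` and the string labels "saline" and "CNO" are chosen here for illustration.

```python
def is_valid_day_order(sessions):
    """Return True if no saline session follows a CNO session within one day.

    `sessions` is the ordered list of drug labels for a single recording day,
    e.g. ["saline", "CNO", "CNO"]. Labels are illustrative assumptions.
    """
    seen_cno = False
    for drug in sessions:
        if drug == "CNO":
            seen_cno = True
        elif drug == "saline":
            if seen_cno:
                # A saline session after any CNO session violates the rule.
                return False
        else:
            raise ValueError(f"unknown drug label: {drug!r}")
    return True


valid_orders = [["saline", "saline"], ["saline", "CNO"], ["CNO", "CNO"]]
invalid_order = ["CNO", "saline"]
```

      Under this rule, a day such as saline -> saline -> CNO -> CNO passes the check, while any day in which saline follows CNO does not.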

      More generally, we shared this reviewer’s concern about potential confounds between drug and motivation. For novel sessions in Experiment 1, each animal had equal numbers of saline and CNO 1st and 2nd sessions of the day. For familiar sessions, animals had similar counts for 1st sessions of the day (experimental rats: 20 saline, 16 CNO; control rats: 17 saline, 15 CNO) but more CNO 2nd sessions of the day (experimental rats: 5 saline, 13 CNO; control rats: 5 saline, 10 CNO). There were occasionally 3rd or 4th sessions in a given day for some rats, and these were also approximately equal (experimental rat 2, 3rd sessions: 2 each of saline and CNO, 4th session: 1 saline; experimental rats 3 and 4, 3rd sessions: 1 each of saline and CNO; control rat 2, 3rd session: 1 saline).

      Statistics, statistical power, and effect sizes

      Throughout the manuscript, the authors employ a mixture of t-tests, ANOVAs, and mixed-effects models. Only the mixed effects models appropriately account for the fact that all of this data involves repeated measurements from the same subject. The t-tests are frequently doubly inappropriate because they both treat repeated measures as independent and are not corrected for multiple comparisons.

      We thank the reviewer for pointing out these issues with our statistical analyses in places. We have made the following improvements:

      Figure 1F-I, S1, reward end visit durations: We now use a linear mixed-effects model to analyze the difference in stopping period durations between epochs. For each session, we calculated the mean stopping duration for each reward end in each epoch, then modeled the difference between epochs as a function of drug and novelty, with animal-specific intercepts. For example, related to Figure 1F and also described in the Results, we modeled the stopping duration difference at the Unchanged reward end, Epoch 2 – Epoch 1, and found experimental rats had a significant intercept (Epoch 2 stops shorter than Epoch 1) and a significant drug × novelty interaction, while control rats had a significant intercept and novelty main effect. The other visit duration analyses shown in Figure 1 – Supplement 1 have similarly been updated.

      Figure 2D-E, ripple rate difference between reward ends in Epoch 2: We now use a linear mixed-effects model to analyze the difference between ripple rates at the Incr. and Unch. reward ends in Epoch 2. For each session, we calculated the mean ripple rate at each end in Epoch 2, then modeled the difference as a function of drug and novelty, with animal-specific intercepts. With the full stopping periods, for experimental rats, there was a significant intercept (ripple rate at Incr. greater than Unch.) and the model with drug included performed significantly better than the one without it (AIC_nodrug – AIC_full = 5.22). Control rats had a significant intercept and effect of novelty (greater difference with novelty), and the model excluding drug terms performed better (AIC_nodrug – AIC_full = -3.54). Results with the trimmed stopping periods were similar. These analyses are described in Results lines 253-266.
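
      The sign convention of the AIC comparison above can be made concrete with a small sketch. It uses the standard definition AIC = 2k − 2 ln L; the log-likelihoods and parameter counts below are invented for illustration and are not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(ll_nodrug, k_nodrug, ll_full, k_full):
    """AIC of the model without drug terms minus AIC of the full model.

    Positive: the full model (with drug terms) is preferred.
    Negative: the simpler model without drug terms is preferred.
    """
    return aic(ll_nodrug, k_nodrug) - aic(ll_full, k_full)


# Hypothetical case 1: the two extra drug terms improve the fit substantially.
drug_helps = delta_aic(ll_nodrug=-100.0, k_nodrug=3, ll_full=-95.0, k_full=5)

# Hypothetical case 2: the extra terms barely improve the fit, so the
# penalty for additional parameters outweighs the gain.
drug_does_not_help = delta_aic(ll_nodrug=-100.0, k_nodrug=3,
                               ll_full=-99.5, k_full=5)
```

      Read this way, the reported difference of 5.22 for experimental rats favors including drug terms, while -3.54 for control rats favors the model without them.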

      Figure 3D-E, ripple rate as a function of reward history: We now use a mixed-effects model that incorporates animal-specific intercepts. The results remained similar and have been updated in the text and legend.

      Figure 4D-K, replay rates as a function of drug, novelty, and directionality: We now use mixed-effects models that incorporate animal-specific intercepts rather than three-way ANOVA. The results remained similar and have been updated in the text and legend.

      The number of animals in these studies is on the lower end for this sort of work, raising questions about whether all of these results are statistically reliable and likely to generalize. This is particularly pronounced in the reward volatility experiment, where the number of rats in the experimental group is halved to just two. The results of this experiment are potentially very exciting, but the sample size makes this feel more like pilot data than a finished product.

      We have added additional emphasis in the text that the experimental group results of CNO inactivation in the volatile reward task should be confirmed with future work, in Discussion lines 529-533. Because these experiments were performed on familiar tracks, we see them as corroborating/complementing the results from Experiment 1. Although the analysis assumes VTA inactivation had no effect, our pooling of all Experiment 2 data to display in Figure 3 – Supplement 2 maximized our ability to analyze the effects of volatile reward deliveries on sharp-wave ripple rates, lending further support to the main results shown in Figure 3.

      The effect sizes of the various manipulations appear to be relatively modest, and I wonder if the authors could help readers by contextualizing the magnitude of these results further. For instance, when VTA inactivation increases mis-localization of SWRs to the unchanged end of the track, roughly how many misplaced sharp-waves are occurring within a session, and what would their consequence be? On this particular behavioral task, it's not clear that the animals are doing worse in any way despite the mislocalization of sharp-waves. And it seems like the absolute number of extra sharp-waves that occur in some of these conditions would be quite small over the course of a session, so it would be helpful if the authors could speculate on how these differences might translate to meaningful changes in processes like consolidation, for instance.

      We thank the reviewer for this helpful suggestion to give some context to the difference in sharp-wave ripple numbers and the functional consequence of these changes. We agree completely that this task is almost certainly too simple for animals to show any performance deficit from these changes. We chose this precisely so we could examine the consequences of VTA inactivation to the sharp-wave ripple response to reward changes per se, without any confound of performance or memory changes that could also conceivably alter sharp-wave ripples. We have added both more context about the magnitude and consequence of these sharp-wave ripple changes as well as comments about the choice of this particular task (Discussion lines 522-529).  

      How directly is reward affecting sharp-wave rate?

      Changes in reward magnitude on the authors' task cause rats to reallocate how much time they spent at each end. Coincident with this behavioral change, the authors identify changes in the sharp-wave rate, and the assumption is that changing reward is altering the sharp-wave rate. But it also seems possible that by inducing longer pauses, increased reward magnitude is affecting the hippocampal network state and creating an occasion for more sharp-waves to occur. It's possible that any manipulation so altering rats' behavior would similarly affect the sharp-wave rate.

      For instance, in the volatility experiment, on trials when no reward is given sharp-wave rate looks like it is effectively zero. But this rate is somewhat hard to interpret. If rats hardly stopped moving on trials when no reward was given, and the hippocampus remained in a strong theta network state for the full duration of the rat's visit to the feeder, the lack of sharp-waves might not reflect something about reward processing so much as the fact that the rat's hippocampus didn't have the occasion to emit a sharp-wave. A better way to compute the sharp-wave rate might be to use not the entire visit duration in the denominator, but rather the total amount of time the hippocampus spends in a non-theta state during each visit. Another approach might be to include visit duration as a covariate with reward magnitude in some of the analyses. Increasing reward magnitude seems to increase visit duration, but these probably aren't perfectly correlated, so the authors might gain some leverage by showing that on the rare long visit to a low-reward end sharp-wave rate remains reliably low. This would help exclude the explanation that sharp-wave rate follows increases in reward magnitude simply because longer pauses allow a greater opportunity for the hippocampus to settle into a non-theta state.

      We thank the reviewer for these important comments. We have better clarified the analysis of sharp-wave ripple rate in the Results (lines 172-173). To speak to the main concern of the reviewer, we do only consider times during “stopping periods” when the rat is actually stationary. That is, ripple rate for each visit is calculated as (# of ripples / total stationary time), rather than the full duration the rat is at the track end. With respect to including visit duration as a covariate, the Poisson model takes the total stationary time of each visit into account, so that it is effectively predicting the number of events (ripples) per unit of time (seconds) given the particular experimental variables (reward condition, drug condition, etc.). We have added additional clarification of this in the Methods (lines 834-836).
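
      The per-visit rate and the role of stationary time as an exposure can be sketched as follows. This is an illustrative simplification, not the model used in the study: it assumes a Poisson model with a log link, an offset of log(stationary time), and a single two-level predictor, in which case the maximum-likelihood coefficient equals the log ratio of the pooled rates. All names and numbers are hypothetical.

```python
import math


def ripple_rate(n_ripples, stationary_s):
    """Ripples per second of stationary time for a single visit."""
    if stationary_s <= 0:
        raise ValueError("stationary time must be positive")
    return n_ripples / stationary_s


def pooled_poisson_rate(counts, exposures_s):
    """Maximum-likelihood Poisson rate when each visit has its own exposure.

    Each count is modeled as Poisson(rate * exposure), so the estimate is
    total events divided by total exposure, not the mean of per-visit rates.
    """
    return sum(counts) / sum(exposures_s)


def log_rate_ratio(counts_a, exposures_a, counts_b, exposures_b):
    """Coefficient of a two-level predictor in a Poisson model with offset."""
    rate_a = pooled_poisson_rate(counts_a, exposures_a)
    rate_b = pooled_poisson_rate(counts_b, exposures_b)
    return math.log(rate_a / rate_b)


# Made-up visits: (ripple counts, stationary seconds) for two reward conditions.
high_counts, high_time = [4, 4], [4.0, 4.0]
low_counts, low_time = [2, 2], [4.0, 4.0]
beta_reward = log_rate_ratio(high_counts, high_time, low_counts, low_time)
```

      Because exposure enters as time, a long visit with few ripples pulls the estimated rate down rather than being treated the same as a short visit with the same count.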

      The authors seem to acknowledge this issue to some extent, as a few analyses have the moments just after the rat's arrival at a feeder and just before departure trimmed out of consideration. But that assumes these sorts of non-theta states are only occurring at the very beginning and very end of visits when in fact rats might be doing all sorts of other things during visits that could affect the hippocampus network state and the propensity to observe sharp-waves.

      We hope that with the clarification provided above, this control analysis helps remove any potential effects of approaching/leaving behavior or differences in movement at the reward end that could alter sharp-wave ripple rates. 

      Minor issues

      The title/abstract should reflect that only male animals were used in this study.

      We have added this important information to the Abstract line 21.

      The title refers to hippocampal replay, but for much of the paper the authors are measuring sharp-wave rate and not replay directly, so I would favor a more nuanced title.

      We thank the reviewer for this suggestion. In the context of our work, we consider sharp-wave ripples as more-easily-detected markers for the occurrence of replay. Previous work from our lab (Ambrose et al., 2016) showed that reward changes had very similar effects on both sharp-wave ripple rate and replay rate. We try to be explicit about viewing ripples as markers of replay content in both the Introduction and Discussion. Nevertheless, we do also demonstrate the title claim directly – by measuring replay and its spatial localization – therefore we feel comfortable with the title as it is.

      Relatedly, the interpretation of the mislocalization of sharp-waves following VTA inactivation suggests that the hippocampus is perhaps representing information inappropriately/incorrectly for consolidation, as the increased rate is observed both for a location that has undergone a change in reward and one that has not. However, the authors are measuring replay rate, not replay content. It's entirely possible that the "mislocalized" replays at the unchanged end are, in fact, replaying information about the changed end of the track. A bit more nuance in the discussion of this effect would be helpful.

      While we do show that replay content, in the form of reverse vs. forward replays, is altered with VTA inactivation, we take the reviewer's point and completely agree. Especially in the context of the linear track, replays at either end could certainly be updating/consolidating information about both ends. We would argue our results suggest VTA is critical to localizing ripples and replay in more complex environments where this is not the case, but this is a hypothesis. We have added clarification and discussion of this point (Discussion lines 522-529).

      However, in response to the reviewer’s comment, we have now also examined non-locally-initiated replays specifically to determine whether the increased ripple rate at the Unch. reward end in novel CNO sessions was likely due to more non-local replay, but found no significant increases in non-local replay at either reward end in either drug condition or novelty condition. We have included this result as Figure 4 – Supplement 3, and note it in the Results lines 487-488.

      The authors use decoding accuracy during movement to determine which sessions should be included for decoding of replay direction. Details on cross-validation are omitted and would be appreciated. Also, the authors assume that sessions failed to meet inclusion criteria because of ensemble size, but this information is not reported anywhere directly. More info on the ensemble size of included/excluded sessions would be helpful.

      We have added additional information about the run decoding procedure and related session inclusion criteria, as well as about recorded ensemble sizes (lines 417-421). Briefly, mean ensemble sizes were significantly smaller for excluded sessions (cell count, mean±sem; included sessions: 26.1±1.1, excluded sessions: 9.5±1.6; two-sample t-test, t(133)=5.3, p<10^-5). The average field size, defined as the number of spatial bins with greater than 1 Hz firing rate, in excluded sessions was also larger (mean±sem, included sessions: 47.7±1.3, excluded sessions: 57.7±5.8; two-sample t-test, t(133)=-2.33, p<0.05), though the difference was less dramatic. Using a mixed effects model to predict position decoding error (as in Figure 4 – Supplement 2A) as a function of drug, novelty, cell count, and mean place field size, in both experimental and control groups cell count and field size were significant predictors: more cells and smaller average field size led to lower error. In a similar model that instead predicted the fraction of running bins with correctly decoded running direction (as in Figure 4 – Supplement 2B), field size was significant in neither group, while cell count remained so: more cells led to more bins with running direction correctly classified. We include these analyses in the legend for the figure. With respect to cross validation of run decoding, both because the contribution of spikes in any single time bin to a neuron’s place field is extremely small and because we used run decoding accuracy simply to filter out sessions with poorer decoding, we did not use cross validation here.

      For most of the paper, the authors detect sharp-waves using ripple power in the LFP, but for the analysis of replay direction, they use a different detection procedure based on the population firing rate of recorded neurons. Was there a reason for this switch? It's somewhat difficult to compare reported sharpwave/replay rates of the analyses given that different approaches were used.

      We have added clarification for this change in detecting candidate events (lines 787-789). Briefly, sharp-wave ripples and spike density events are often but not always overlapping, such that there can be strong ripples with little spiking in the recorded ensemble or weak/absent ripples during vigorous spiking in the recorded ensemble. Because the decoding of replay content relies on spiking, our lab and others often use spike density or population burst events as candidate events. We have confirmed that the main results of Experiment 1 (e.g., Figure 2) remain the same if we use spike density events rather than sharp-wave ripples, but prefer to keep the use of sharp-wave ripples here for better comparison with Experiment 2 and to allow the inclusion of animals and sessions with low cell yield but clear ripples in the LFP.  

      Reviewer #2 (Recommendations For The Authors):

      Include additional histological data to confirm the extent of viral spread and precise tetrode placements. Providing detailed figures that clearly illustrate these aspects would strengthen the validity of the neural recordings and the specificity of the chemogenetic silencing.

      We thank the reviewer for this suggestion and have added additional information regarding virus expression in Figure 1 – Supplement 1. We also added additional comments on the extent of expression we observed in lines 92-95: “Post-experiment histology confirmed overlapping virus expression and TH-positive neurons in putative VTA near the injection site (-5.6 mm AP from bregma), as well as approximately 0.5 mm anterior and posterior (-5 to -6 mm AP).”

      While we do not show histological confirmation of hippocampal recording sites, the presence of sharp-wave ripples with upward deflections, presence of place cells, and recording coordinates and depth typical of dorsal CA1 made us confident in our recording location. We have noted these characteristics of our recordings in lines 128-131 in the Results: “Tetrodes were lowered to the pyramidal cell layer of dCA1, using the presence of sharp-wave ripples with upward deflections in the LFP, recording depth characteristic of dCA1, and spatially-restricted firing of place cells to confirm the recording location.”

      Address the variability in CNO dosing and timing before recordings. It is recommended to standardize the dose and ensure a consistent timing interval between CNO administration and the start of recordings to minimize variability in the effects observed across different subjects. Instead of collecting new data, the authors could report the data for each animal, indicating the dose and interval between the injection and the recording.

      We have further clarified the CNO dosing and timings in lines 676-702.

      In Figure 1F, explicitly state whether the data represent averages across multiple sessions and confirm if these observations are primarily from the initial novel sessions. This clarification will help in accurately interpreting the effects of novelty on the measured neural activities.

      We have changed the analyses shown in Figure 1F-I and Figure 1 – Supplement 1 thanks to the suggestions of Reviewer #1, but also more clearly spell out the analysis. Briefly, we average the durations for each condition within session (e.g., take the mean Unch. duration in Epoch 1), then perform the analysis across sessions. These data come from all sessions in Experiment 1, as described in lines 141-147, meaning there are around 2-3 times as many familiar sessions as novel sessions.

      Reconsider the reporting of marginal p-values (e.g., p=0.055). If the results are borderline significant, either more data should be collected to robustly demonstrate the effects or a statistical discussion should be included to address the implications of these marginal findings.

      We have removed the reporting of marginal p-values.

      Ensure that the axes and scales are consistent across similar figures (specifically mentioned for Figure 2A) to prevent misinterpretation of the data. Consider showing the average across all animals in 2A, similar to 2B and 2C.

      We have adjusted these axes to be consistent across all panels.

      Add a legend to the heatmap in Figure 4A to facilitate understanding of the data presented.

      We have added a heatmap to the figure and legend.

      Provide a detailed examination and discussion of the apparent contradictions observed in control data, particularly where experimental conditions with saline show increased reverse replay in novel environments, which is absent in familiar sessions. See Figures 4E and 4I.

      We thank the reviewer for noting that this feature of our data deserved discussion. We confirmed that the lack of an effect of reward on reverse replay rates in familiar sessions in control rats was due to generally low replay rates in these sessions. Replay rates have been observed to decrease as the familiarity of an environment or behavior increases, and the presence of the reward-related modulation of reverse replay in novel sessions in these animals is consistent with this observation. We now report in the Results lines 458-459 and 485-486 the low replay rates in this group in familiar sessions, and the likelihood that this is preventing any reward-related modulation from being detected.

      Include a more detailed analysis of place cell properties, such as firing rates and field sizes, especially in novel environments where VTA inactivation appears to alter spatial coding. Decoding error is lower during CNO administration - does this mean place fields are smaller/more accurate? This analysis could offer deeper insights into the mechanisms by which dopamine influences hippocampal neural representations and memory processes.

      We thank the reviewer for this helpful suggestion. We have expanded on our analysis of place field properties and decoding accuracy, describing properties of sessions with good enough decoding to be included compared to those that were excluded (lines 417-421). We also directly tested how decoding quality depended on several factors, including drug condition, novelty, number of cells recorded, and the average place field size of recorded cells (see legend for Figure 4 – Supplement 2). We found a small but significant effect of drug in experimental rats, but larger effects of number of recorded cells and average field size, that were also present in control animals.

      Correct the typo on line 722 from "In ANOVA" to "An ANOVA".

      We reworded this section and have corrected this error.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript is clear and exciting. As a main criticism, I would have liked to see the effects on ripple duration not just the rate.

      We thank the reviewer for this interesting idea. We performed a new analysis, similar to our analysis on SWR rate, probing the effect of our experimental manipulations on SWR duration in experimental rats. We have added the results in Figure 2 – Supplement 4, and note them in the main text lines 195-198: “SWR duration was reduced in novel sessions, consistent with replays becoming longer with increased familiarity (Berners-Lee et al., 2021), as well as in Epoch 2, but was otherwise unaffected by reward or drug (Figure 2 – Supplement 4).”

      I have a few other minor comments:

      (1) I find it a little disturbing and counterintuitive that statistical differences are not always depicted in the figure graphs (for example Figures 2A-E). If the authors don't like to use the traditional *, ** or *** they could either just use one symbol to depict significance or simply depict the actual p values.

      We thank the reviewer for this suggestion. We struggled with indicating significance values graphically in an intuitive way for interaction terms in the figures. We now added significance indicators in Figures 1F-I, added the significant model coefficients directly into Figure 2B-C, changed the analysis depicted in Figure 2D-E per Reviewer 1’s suggestions, and added significance indicators where previously missing in Figures 3 and 4.

      (2) Related to the point above: in the page 7 legend D and E, it would be advantageous for clarity of the experimental results to also perform post-hoc analyses as depicted in the graphs, rather than just describe the p-value of the 3way ANOVA;

      We thank the reviewer for this suggestion. Because the figure includes the mean and standard error of each condition, in addition to the significant effects of the mixed-effects model, we prefer the current format as it makes clearer the statistical tests that were performed while still allowing visual appreciation of differences between specific experimental conditions of interest to the reader.

      (3) According to Figure 1H, the duration of the reward visits can go up to 15s (or more). Yet in Figure 2A only the first 10sec were analyzed. While I understand the rationale for using the initial 10 seconds where there is a lot more data, the results in the graphs of Figures 2A to C will not have the same data/rate as Figures 2D-F where I assume the entire duration of the visit is taken into account.

      A figure showing what is happening to the ripple rate during the visits >10sec would help interpret the results of Figure 2.

      We thank the reviewer for these interesting suggestions. We now clarify in Methods lines 758-764 that all these analyses of Experiment 1 use only the first 10 s of each stopping period. However, examining the longer stopping periods is an excellent suggestion, and we re-analyzed the Experiment 1 dataset using up to the first 20 s of each stopping period. The main results (e.g., Figure 2) remain the same:

      (1) Related to Figure 2B-C: For experimental rats, a mixed-effects generalized linear model predicting sharp-wave ripple rate as a function of reward end, block, drug, novelty, and interactions, had the following significant terms: drug (p<10<sup>-5</sup>), novelty (p<10<sup>-10</sup>), reward end × block (p<10<sup>-10</sup>), and reward end × block × drug (p<0.05). The same model in control rats had significant terms: reward end (p<0.05), novelty (p<10<sup>-4</sup>), reward end × block (p<10<sup>-10</sup>).

      (2) Related to Figure 2D-E: For experimental rats, we used a mixed-effects generalized linear model predicting the difference in sharp-wave ripple rate between the Incr. and Unch. reward ends in Epoch 2 as a function of novelty, drug, and their interaction. Model comparison found the full model performed better than a model removing the drug terms (AIC<sub>nodrug</sub> – AIC<sub>full</sub> = 2.94), while a model with only the intercept performed even worse (AIC<sub>intercept</sub> – AIC<sub>full</sub> = 13.76). For control rats, model comparison found the full model was equivalent to a model with only the intercept (AIC<sub>intercept</sub> – AIC<sub>full</sub> = -0.36), with both modestly better than a model with no drug terms (AIC<sub>nodrug</sub> – AIC<sub>full</sub> = 1.38).

      We have added a remark in Methods lines 758-764 that the results remain the same using this longer time window.

    1. eLife Assessment

      This important work uses an innovative approach to understand similarities between haemodynamic and electrophysiological activity of the human brain. The study provides incomplete evidence to indicate that while similar functional brain networks are used in both modalities, there is a tendency for these multi-modal networks to spatially converge at synchronous rather than asynchronous time points. This work will be of interest to neurophysiological and brain imaging researchers.

    2. Reviewer #1 (Public review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous timepoints. However, my confidence in the findings and their interpretation is undermined by an incomplete justification for the expected outcomes for each of the proposed scenarios.

      Main Concern

      Fig 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.:

      - Scenario1. Temporally convergent, same trajectories through connectome state space
      - Scenario2. Temporally divergent, different trajectories through connectome state space

      However, based on my understanding (and apologies if I am mistaken), I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in fig 2C, or the statements in the main text, i.e.:

      - For scenario1, "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"
      - For scenario2, "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in fig2C from? They do not seem to be fully justified via previous literature, theory, or simulations?

      In particular, I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points? It seems to me that you would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenario 1 and 2 (and under what assumptions it is valid).

      In the absence of theory, the authors could use surrogate data for scenario 1 and 2. For example:

      a. For scenario 1, run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).
      b. For scenario 2, run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      The authors have provided CRP plots for option a. It shows a CRP, as expected, consistent with scenario 1. This is a useful sanity check. However, as mentioned above, it does not ensure that all CRPs under this scenario will look like this.

      However, the authors have not shown a CRP as per option b. As such, there is an incomplete justification for the expected outcomes of the scenarios.
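
      The block-shuffling surrogate described in option b can be sketched as follows. This is only an illustration under stated assumptions: the function name `block_shuffle`, the sampling rate, and the toy signal are hypothetical and do not come from the authors' pipeline.

```python
import numpy as np

def block_shuffle(x, block_len, rng):
    """Shuffle a time series in contiguous blocks along axis 0.

    Samples at the end that do not fill a whole block are left in place.
    """
    x = np.asarray(x)
    n_blocks = x.shape[0] // block_len
    head = x[: n_blocks * block_len]
    tail = x[n_blocks * block_len:]
    # Cut the head into blocks, permute the block order, and stitch back.
    blocks = head.reshape(n_blocks, block_len, *x.shape[1:])
    order = rng.permutation(n_blocks)
    shuffled = blocks[order].reshape(head.shape)
    return np.concatenate([shuffled, tail], axis=0)

rng = np.random.default_rng(0)
fs = 10  # hypothetical sampling rate in Hz
signal = np.arange(65 * fs, dtype=float)  # toy stand-in for one EEG-derived time course
surrogate = block_shuffle(signal, block_len=10 * fs, rng=rng)  # ~10 s blocks
```

      Shuffling whole blocks keeps the short-range temporal structure inside each block while breaking the alignment between the original and the surrogate, which is the property the suggested sanity check relies on.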

      Note that another option, which has not been carried out, is to use full simulations, with clearly specified assumptions, for scenario1 and 2. One way of doing this is to use a simplified (state-space) setup where you randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      Using this, you can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities
      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities

      Minor Concern

      Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." It is great that the authors provide both. However, given that FC in EEG is almost totally dominated by spatial leakage (see Hipp paper), the main results/figures for the scalp EEG should be done using spatial leakage corrected EEG data.

    3. Reviewer #2 (Public review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the timepoints were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involved a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      The authors also did a control analysis and verified the effect of temporal window size or different functional connectivity operationalizations. I also applaud their effort to make the analysis code open-sourced.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous time points. However, my confidence in the findings and their interpretation is undermined by an apparent lack of justification for the expected outcomes for each of the proposed scenarios, and in the analysis pipeline itself.

      Main Concerns

      (1) Figure 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.

      Scenario 1: Temporally convergent, same trajectories through connectome state space

      Scenario 2: Temporally divergent, different trajectories through connectome state space

      However, based on my understanding I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in Figure 2C, or the statements in the main text:

      For Scenario 1: "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      For Scenario 2: "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in Figure 2C from? Are they based on previous literature, theory, or simulations?

      I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points? It seems to me that one would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless, a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenarios 1 and 2 (and under what assumptions it is valid).

      In the absence of theory, a couple of ways I can think of to gain insight into this key aspect are:

      (1) Use surrogate data for scenarios 1 and 2:

      a. For scenario 1: Run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check)

      b. For scenario 2: Run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      (2) Do simulations, with clearly specified assumptions, for scenarios 1 and 2. One way of doing this is to use a simplified (state-space) setup and randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      The authors would only need to worry about simulating the network activation time courses, i.e. they would not need to bother with specifying the spatial configuration of each network, instead, they would make the implied assumption that each of these networks has the same spatial configuration in modality 1 and modality 2.

      With that assumption, the CRP calculation should simply correspond to calculating, at each time i in modality 1 and time j in modality 2, the number of networks that are activating in both modality 1 and modality 2, by using their activation time courses. Using this, one can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities
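
      The simplified state-space simulation outlined in suggestion (2) can be sketched in a few lines. The sketch makes assumptions that are not in the review: each network switches on independently at every time point with a fixed probability (no temporal autocorrelation), and all function and variable names are hypothetical. The CRP entry at (i, j) is the number of networks active at time i in modality 1 and at time j in modality 2.

```python
import numpy as np

def simulate_activations(n_time, n_networks, p_on, rng):
    """Binary activation time courses (time x networks), independent draws."""
    return (rng.random((n_time, n_networks)) < p_on).astype(int)

def cross_recurrence(act1, act2):
    """Entry (i, j) counts networks active at time i in act1 and time j in act2."""
    return act1 @ act2.T

def on_off_diagonal_ratio(crp):
    """Mean of the main diagonal divided by the mean of all other entries."""
    on = np.mean(np.diag(crp))
    off_mask = ~np.eye(crp.shape[0], dtype=bool)
    off = np.mean(crp[off_mask])
    return on / off

rng = np.random.default_rng(0)
mod1 = simulate_activations(500, 10, 0.3, rng)
mod2_independent = simulate_activations(500, 10, 0.3, rng)

# Scenario 1: identical activation time courses in both modalities.
ratio_scenario1 = on_off_diagonal_ratio(cross_recurrence(mod1, mod1))
# Scenario 2: activation time courses simulated separately per modality.
ratio_scenario2 = on_off_diagonal_ratio(cross_recurrence(mod1, mod2_independent))
```

      Under these particular assumptions the ratio is well above one in scenario 1 and close to one in scenario 2. Whether this still holds when network states persist or recur over time, as in the reviewer's thought experiment, would need activation time courses with explicit temporal structure.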

      We thank the reviewer for raising this important matter as it directly relates to our study hypothesis. To address this point, we chose to focus on the first of the two alternative suggestions of the reviewer, as it provides evidence based on empirical data. In line with the reviewer’s suggestion 1, recurrence plots have indeed been previously applied to connectome dynamics data from the same modality [Hansen et al., NeuroImage 2015; Fig. 2B]. As shown in the referenced study, where the recurrence plot has been estimated within fMRI connectome dynamics, the on-diagonal entries have noticeably larger correlation values in comparison to off-diagonal entries. As the authors state, this contrast emphasizes the autocorrelation of connectome dynamics in their single modality recurrence plot. Extending these findings to our cross-modal recurrence plots, more synchronicity of connectome dynamics across fMRI and EEG will, in theory, translate into stronger correlation values along the diagonal axis as it represents neighboring timepoints in the data. On the other hand, less cross-modal synchronicity translates to a lack of such correlation prevalence along the diagonal axis.

      Complementing these statements with empirical data, Author response image 1 shows the fMRI-to-iEEG and fMRI-to-fMRI CRPs side by side as suggested by the reviewer. For simplicity, we thresholded each CRP at the top 5% of entries and calculated their corresponding on-/off-diagonal ratios. The on/off-diagonal ratio for fMRI-to-fMRI CRP was 4.32 ± 6.26 across -5 to +5 TR lags (with a maximum of 16.56 at a lag of one TR), while this value was 1.00 ± 0.31 for fMRI-to-iEEG CRP. Thus, it becomes apparent that synchronicity of connectome dynamics directly translates to the on-/off-diagonal ratio in CRP.
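
      One way to read the thresholded on-/off-diagonal ratio at a given lag is sketched below. The exact definition used by the authors is not given in the response, so this is an assumption: the ratio is taken as the fraction of supra-threshold entries on the diagonal at the requested lag divided by the fraction among all remaining entries. The function name and the toy matrices are hypothetical.

```python
import numpy as np

def thresholded_ratio(crp, lag=0, top_fraction=0.05):
    """On-/off-diagonal ratio of supra-threshold CRP entries at a given lag."""
    crp = np.asarray(crp, dtype=float)
    # Keep only the strongest entries (top 5% by default).
    threshold = np.quantile(crp, 1.0 - top_fraction)
    hits = crp >= threshold
    rows, cols = np.indices(crp.shape)
    on_mask = (cols - rows) == lag  # diagonal shifted by `lag` frames
    on_rate = hits[on_mask].mean()
    off_rate = hits[~on_mask].mean()
    return on_rate / off_rate

rng = np.random.default_rng(0)
noise = rng.random((200, 200))            # toy CRP without any synchrony
synchronous = noise + 2.0 * np.eye(200)   # toy CRP with a strong main diagonal
ratio_sync = thresholded_ratio(synchronous, lag=0)
ratio_noise = thresholded_ratio(noise, lag=0)
```

      In this toy case the matrix with a strong diagonal yields a ratio far above one, while the pure-noise matrix stays near one, which mirrors the qualitative contrast reported for the within-modality and cross-modal CRPs.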

      Author response image 1.

      Sample CRP shown for a subject for comparing two cases: fMRI-to-iEEG (left) and fMRI-to-fMRI (right). The comparison shows that in the presence of genuine synchronous connectome dynamics, as expected for the within-modality case (right panel), the on-/off-diagonal ratio is expected to show noticeably higher values. This figure establishes a strong link between our proposed metric of on-/off-diagonal ratio and the extent of synchronicity of connectome dynamics.

      Author response image 2.

      On-/off-diagonal ratio in the fMRI-to-fMRI recurrence plot is considerably higher than in the cross-modal fMRI-to-iEEG case. Horizontal axis shows the lag where the metric was calculated in the CRP. The bars reflect the group average metric while the whiskers show standard deviation. Note that for the within-modality case, the ratio is not defined at lag zero because of identical connectome frames.

      (2) Choices in the analysis pipeline leading up to the computation of FC in fMRI or EEG will affect the quality of information available in the FC. For example, but not only, the choice of parcellation (in the study, the number of parcels is very high given the number of EEG sensors). I think it is important that we see the impact of the chosen pipeline on the time-averaged connectomes, an output that the field has some idea about what is sensible. This would give confidence that the information being used in the main analyses in the paper is based on a sensible footing and relates to what the field is used to thinking about in terms of FC. This should be trivial to compute, as it is just a case of averaging the time-varying FCs being used for the CRP over all time points. Admittedly, this approach is less useful for the intracranial EEG.

      We agree with the reviewer on ensuring that the time-averaged FC aligns with expectations of the field and prior work. For this reason, our supplementary analysis already included an analysis that replicates the well-established (albeit modest) spatial similarity between fMRI static connectome and EEG/iEEG static connectomes:

      “In scalp EEG-fMRI data, cross-modal spatial (2D) Pearson correlation of group-level time-averaged connectomes between fMRI and EEG-FCAmp or fMRI and EEG-FCPhase were calculated across all frequency bands. The average spatial correlation value across frequency bands was r = 0.28 and r = 0.28 for EEG-FCAmp and EEG-FCPhase, respectively. The spatial correlation values across all frequency bands and connectivity measures were significantly higher than the corresponding null distributions generated by phase-permuted group-level fMRI-FC spatial organization (p<0.005; 200 repetitions; FDR-corrected at q<0.05 for the number of frequency bands). …. Of note, the small effect sizes are strongly in line with prior literature (Hipp and Siegel, 2015; Wirsich et al., 2017; Betzel et al., 2019) and may point to possible divergence in the dynamic domain as investigated in the main manuscript.”

      This replication directly confirms the validity of our selected atlas for further investigations into the connectome dynamics. We acknowledge that with 64 EEG channels, one can only estimate a relatively coarse connectome. Among the well-known coarse atlases, we chose the Desikan-Killiany atlas as it is based on anatomical features, eliminating possible biases towards a particular functional data modality. Moreover, this atlas has been commonly used for multimodal functional connectivity studies, facilitating the confirmation of prior findings in the time-averaged domain [Deligianni et al., Front. Neurosci. 2014; Wirsich et al., NeuroImage 2020; Wirsich et al., NeuroImage 2021].

      (3) Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." Given that FC in EEG is dominated by spatial leakage (see Hipp paper), then I cannot see how it can be justified to look at non-spatial leakage correction results at all, let alone put them up front as the main results. All main results/figures for the scalp EEG should be done using spatial leakage-corrected EEG data.

      We agree that relying on leakage-uncorrected scalp EEG alone would be problematic. It is for this reason that the intracranial data constructs the core of our results, emphasizing that the observed multiplex architecture of connectomes is indeed present in the absence of source leakage. Only when this finding is established in the intracranial EEG, do we provide the scalp EEG data as a generalization to whole-cortex coverage connectomes of healthy subjects. Moreover, it is known that existing source-leakage correction algorithms may inadvertently remove some of the genuine zero-lag connectivity. For instance, Finger and colleagues have shown that the similarity of functional connectivity to structural connectivity diminishes after correction for source-leakage (Finger et al., PLOS Comp. Biol. 2016). Therefore, we have deliberately chosen to include our generalization findings before source-leakage correction (main text) as well as after source-leakage correction reflecting a more stringent approach (supplementary analysis). Importantly, our conclusions hold true for both before and after source-leakage correction.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the time points were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involve a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      Weaknesses:

      Despite the impressive work, the paper still lacks some analyses to make it complete.

      Firstly, the effect of the window size is unclear, especially in the case of different frequencies where the number of cycles that fall in a window will vary drastically. A typical oscillation lasts just a few cycles (see Myrov et al., 2024), and brain states are usually short-lived because of meta-stability (see Roberts et al., 2019).

      We now replicate our results with an additional window size. Please see section “Recommendations for the authors”.

      Secondly, the authors didn't examine frequencies lower than 1Hz despite similarities between fMRI and infra-slow oscillations found in prior literature (see Palva et al., 2014; Zhang et al., 2023).

      We address this issue below. Please see section “Recommendations for the authors”.

      On a minor note, the phase-locking value (PLV) is positively biased for EEG data (see Palva et al., 2018) and a different metric for phase coupling could be a more appropriate choice (e.g., iPLV/wPLI, see Vinck et al., 2011).

      While iPLV and wPLI are not positively biased, they may reduce genuine zero-phase connectivity as they were initially designed to address spurious zero-phase connectivity from source leakage in scalp EEG. Indeed, PLV connectivity is shown to be more strongly correlated with structural connectivity than wPLI and other phase coupling methods [Finger et al., PLOS Comp. Biol. 2016], emphasizing that it contains genuine connectivity that may be lacking when zero-phase connectivity is removed. We chose PLV because it is a widely used functional connectivity metric, particularly in intracranial data where source leakage is not a critical concern. Thus, using PLV facilitates cross-study comparisons including to our prior work [e.g. Mostame et al. NeuroImage 2020, Mostame et al. J Neurosci 2021].
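
      The trade-off discussed here can be illustrated with a toy example using the standard definitions PLV = |mean(exp(iΔφ))| and iPLV = |Im(mean(exp(iΔφ)))|, where Δφ is the phase difference between two signals. The simulated phases, the jitter level, and the function names are assumptions made for illustration only; wPLI is not covered.

```python
import numpy as np

def plv(phase_a, phase_b):
    """Phase-locking value: magnitude of the mean phase-difference vector."""
    return np.abs(np.mean(np.exp(1j * (phase_a - phase_b))))

def iplv(phase_a, phase_b):
    """Imaginary PLV: keeps only the imaginary part of the mean vector."""
    return np.abs(np.imag(np.mean(np.exp(1j * (phase_a - phase_b)))))

rng = np.random.default_rng(0)
n_samples = 2000
phase_a = rng.uniform(-np.pi, np.pi, n_samples)
jitter = rng.normal(0.0, 0.3, n_samples)  # hypothetical phase noise (rad)

zero_lag = phase_a + jitter                 # coupled with zero phase lag
quarter_lag = phase_a + np.pi / 2 + jitter  # coupled with a 90-degree lag

plv_zero, iplv_zero = plv(phase_a, zero_lag), iplv(phase_a, zero_lag)
plv_quarter, iplv_quarter = plv(phase_a, quarter_lag), iplv(phase_a, quarter_lag)
```

      In this toy setting both pairs are equally strongly coupled according to PLV, but iPLV is near zero for the zero-lag pair. This is the behavior that protects against source leakage and, as the response notes, can also discard genuine zero-phase coupling.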

      The repository with the code is also unavailable.

      Thank you for bringing this to our attention. We have now made our repository publicly accessible at: https://github.com/connectlab/Mostame2024_Multiplex_iEEG_fMRI.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The window widths used to compute FC as a function of time are an important aspect, so I feel that this should be briefly described up-front in the main Results text.

      Methods. "Finally, to compensate for the time lag between hemodynamic and neural responses of the brain (Logothetis et al., 2001), we shifted the fMRI-FC time course 6 seconds backwards in time." What about the effects of temporal blurring from the HRF? Do we need to care about that?

      We agree that it is important to investigate the effect of temporal blurring by the HRF. The main text already included a replication of findings from CRPs generated using fMRI data and EEG amplitude signals convolved with the canonical HRF. This method serves as an alternative to the 6-second shifting. Both approaches produced similar results.

      Methods. In fMRI connectome computation it is common to look at partial correlation rather than full correlation. Partial correlation focuses more on direct connections. It would be good if the paper acknowledged and justified why it is OK to use full correlation.

      We have now added a brief explanation in this regard in the main text (Methods section) as follows:

      “In fMRI connectome computation, some prior work has used partial correlation instead of full correlation. Partial correlation emphasizes direct connections by calculating correlation between any pair of brain regions after regressing out the timeseries of all other regions. However, we have opted to use full correlation because this permits interpretation of our outcomes in the context of the vast existing literature that uses full correlations in fMRI including the majority of bimodal (EEG-fMRI) connectome studies (e.g. Tagliazucchi et al., 2012; Deligianni et al., 2014; Wirsich et al., 2017b, 2020, 2021; Allen et al., 2018).”
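
      The distinction between full and partial correlation can be made concrete with a small sketch. It computes partial correlation from the inverse covariance (precision) matrix, which is one common route and not necessarily what any cited study used. The three toy "regions" form a chain (a drives b, b drives c), and all names and parameters are hypothetical.

```python
import numpy as np

def full_correlation(ts):
    """Pearson correlation between all pairs of columns (time x regions)."""
    return np.corrcoef(ts, rowvar=False)

def partial_correlation(ts):
    """Pairwise correlation after accounting for all other columns."""
    precision = np.linalg.inv(np.cov(ts, rowvar=False))
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)
    np.fill_diagonal(partial, 1.0)
    return partial

rng = np.random.default_rng(0)
n_samples = 5000
a = rng.standard_normal(n_samples)
b = a + 0.5 * rng.standard_normal(n_samples)  # b depends directly on a
c = b + 0.5 * rng.standard_normal(n_samples)  # c depends on a only through b
ts = np.column_stack([a, b, c])

full = full_correlation(ts)
partial = partial_correlation(ts)
```

      Here the full correlation between a and c is high even though they are linked only through b, while their partial correlation is close to zero. Full correlation therefore mixes direct and indirect connections, which is the point the quoted Methods text acknowledges.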

      The paper should relate the results to findings showing clear links between simultaneously recorded EEG and fMRI beyond FC. E.g. Mantini (PNAS) 2007 and Van De Ville (PNAS) 2010 to name two.

      In line with this important point, we have extended the existing discussion section that compares our outcomes to EEG-fMRI beyond functional connectivity:

      “Prior multi-modal studies of neural dynamics have predominantly aimed at methodologically cross-validating hemodynamic and electrophysiological observations, thus focusing on their convergence. These important foundational studies include e.g., the cross-modal comparison of region-wise (Mukamel et al., 2005; Nir et al., 2007) or ICN-wise (Mantini et al., 2007) activity fluctuations, instantaneous activity maps (Hunyadi et al., 2019; Zhang et al., 2020) or EEG microstates (Van de Ville 2010), infraslow connectome states (Abreu et al., 2020), or connection-wise FC including studies in the iEEG-fMRI and scalp EEG-fMRI data used in the current study (Ridley et al., 2017; and Wirsich et al., 2020, respectively). In contrast to this prior work, the current study investigated the highly time-resolved cross-modal temporal relationship at the level of FC patterns distributed over all available pairwise connections, and found a connectome-level temporal divergence. The discrepancy between temporal divergence in our study and convergence in prior studies implies that infraslow fluctuations of activity in individual regions or of FC in individual region-pairs observable in both modalities (prior studies) are neurally distinct from connectome-wide FC dynamics observable separately in each modality (current study). Indeed, we confirmed the existence of infraslow electrophysiological FC dynamics driving cross-modal temporal associations at the level of individual connections (Fig. S3) …”

      Reviewer #2 (Recommendations For The Authors):

      (1) Check different window sizes and stability of the FC patterns as a function of it.

      We thank the reviewer for the helpful feedback. We agree that the window size could possibly affect the estimation of individual connectome frames, particularly given that neural processes unfold at hundreds of milliseconds rather than seconds. However, we expect that the asynchronous nature of cross-modal convergence observed in our data would remain intact regardless of the specific window length used for FC calculations. To confirm this, we replicated some of our main analyses in the iEEG-fMRI data with a window length of 500ms (as opposed to 3s, equivalent to one TR) as follows:

      First, we showed that changing the window length does not substantially impact the overall architecture of the connectomes (Author response image 3). Particularly, the time-averaged connectome patterns across different frequency bands were all strongly correlated between the two analyses (500ms and 3s window lengths).

      Author response image 3.

      Time-averaged connectome patterns are highly replicable when calculated using 3s or 500ms window lengths. Horizontal axis represents frequency bands, while each dot represents a subject. Vertical axis shows 2D Pearson correlation of the two connectomes. The group average within each frequency band is marked by a horizontal line.
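
      The window-length comparison described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: it assumes non-overlapping windows, uses plain Pearson correlation between channels as a stand-in for whichever band-limited FC measure the study used, and compares the upper triangles of the two time-averaged matrices. The sampling rate, channel count, and all function names are hypothetical.

```python
import numpy as np

def sliding_window_fc(data, win_len):
    """Windowed FC from data of shape (n_samples, n_channels).

    Uses non-overlapping windows of win_len samples and returns an
    array of shape (n_windows, n_channels, n_channels).
    """
    n_samples, n_ch = data.shape
    n_win = n_samples // win_len
    frames = np.empty((n_win, n_ch, n_ch))
    for w in range(n_win):
        segment = data[w * win_len:(w + 1) * win_len]
        frames[w] = np.corrcoef(segment, rowvar=False)
    return frames

def connectome_similarity(fc_a, fc_b):
    """Pearson correlation between the upper triangles of two FC matrices."""
    iu = np.triu_indices_from(fc_a, k=1)
    return np.corrcoef(fc_a[iu], fc_b[iu])[0, 1]

# Synthetic recording: 60 s, 8 channels driven by 3 shared latent sources.
rng = np.random.default_rng(0)
fs = 100  # hypothetical sampling rate in Hz
n_ch = 8
mixing = rng.normal(size=(3, n_ch))
latent = rng.normal(size=(60 * fs, 3))
data = latent @ mixing + 0.5 * rng.normal(size=(60 * fs, n_ch))

fc_long = sliding_window_fc(data, 3 * fs).mean(axis=0)    # 3 s windows
fc_short = sliding_window_fc(data, fs // 2).mean(axis=0)  # 500 ms windows
similarity = connectome_similarity(fc_long, fc_short)
print(f"similarity of time-averaged connectomes: {similarity:.3f}")
```

      With stationary synthetic data the two time-averaged matrices should be nearly identical; real recordings would need the band-specific connectivity measure in place of `np.corrcoef`.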

      Second, we replicated our major findings of CRP and its on-/off-diagonal ratio in the iEEG-fMRI dataset using a window length of 500ms for FC calculations. Indeed, the data does not show a substantial difference in the on-/off-diagonal ratios of the CRP entries between the 3s and 500ms window lengths. Specifically, the ratio was equal to 1.02 ± 0.07 for 500ms window length, emphasizing absence of significant temporal convergence of the connectome dynamics (see Author response image 4). A paired t-test between group-averaged ratios across different lags confirms a lack of significant difference between the two analyses (p= 0.50). This finding further emphasizes the genuine asynchronous nature of connectome dynamics across the neural timescales measured in fMRI and electrophysiology. We have added this analysis to the supplementary data.

      Author response image 4.

      On-/off-diagonal ratio is shown across lags for both analyses: 3s window length (blue) and 500ms window length (red). Each bar shows the mean across subjects, while the whiskers show the corresponding standard deviations.

      (2) Try to decrease the lowest frequency of the analysis below 1Hz or just compute it for multiple log-spaced frequencies from infra-slow delta to high-gamma band.

      Thank you for pointing out this matter. We do not expect considerable signal in the frequency range below the current lower bound of delta (1Hz) because, as in most other EEG recordings, the EEG was not recorded with a DC setup and has a hardware high-pass filter of 0.1Hz. Nonetheless, we investigated the power spectral density of our iEEG-fMRI data and found that there is indeed little signal power left in the available infraslow range [0.5 – 1 Hz] after the preprocessing steps (Author response image 5).

      Author response image 5.

      Power spectral density of all subjects in the fMRI-iEEG dataset shows lack of sufficient power in the infraslow range. Infraslow range signals are almost always filtered out during recording unless the recording setup includes a DC amplifier. The infraslow EEG signal that the literature often reports as correlated with fMRI signals is most commonly extracted from the slow-changing envelope of band-limited signals, such as the envelope of gamma oscillations.

      Accordingly, when the iEEG signals are filtered within the range of [0.5, 1], there is little signal variation observed in the signal timeseries, in contrast to the adjacent delta band signal (Author response image 6). Importantly, the power envelope of the delta band (and of all other canonical bands not shown here) comprises major fluctuations in the infraslow range, as expected. We would like to emphasize that the existing studies addressing infraslow EEG signal dynamics typically consider the infraslow envelope fluctuations of band-limited signals in traditional frequency bands [e.g. Nir et al., Nat Neurosci 2008] rather than direct recordings in the infraslow frequency range. Investigating HRF-convolved EEG signals similarly captures the infraslow characteristics of the timeseries [e.g. Mantini et al. PNAS 2007, Sadaghiani et al., J Neurosci 2010] (and note that HRF-convolved analyses are included as supplementary investigation in the current study). To the best of our knowledge, very few studies have investigated direct infraslow EEG signals using DC EEG, and we are aware of only two DC-EEG studies with concurrent fMRI [Hiltunen et al., J Neurosci 2014, Grooms et al., Brain Connectivity 2017]. The infraslow correlates of fMRI in electrophysiological signals reported in prior work therefore reflect the slow changes in faster activity or connectivity of traditional frequency bands, which is indeed already included in the current study.

      Author response image 6.

      Sample timeseries of the iEEG signal of the nine subjects (nine rows) for a 400 second interval. Blue signals show the bandlimited delta with its envelope shown as darker blue. The red signal represents the infraslow signal component left in the data, which is much lower in power.
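
      The distinction between a directly filtered infraslow component and the infraslow envelope of a band-limited signal can be illustrated with a small synthetic example. The sketch below is only an illustration under stated assumptions: the filter is an idealized FFT mask and the envelope is the magnitude of an FFT-based analytic signal, neither of which is claimed to be the authors' method, and the carrier frequency, modulation frequency, and sampling rate are hypothetical.

```python
import numpy as np

def fft_bandpass(x, fs, lo, hi):
    """Idealized band-pass: zero all FFT bins outside [lo, hi] Hz."""
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spec = np.fft.rfft(x)
    spec[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(spec, n=len(x))

def envelope(x):
    """Amplitude envelope as the magnitude of the analytic signal."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1
    if n % 2 == 0:
        h[n // 2] = 1
        h[1:n // 2] = 2
    else:
        h[1:(n + 1) // 2] = 2
    return np.abs(np.fft.ifft(spec * h))

# Synthetic signal: a 2 Hz (delta) oscillation whose amplitude is
# modulated at 0.05 Hz, plus weak white noise. 200 s at 100 Hz.
rng = np.random.default_rng(0)
fs = 100.0
t = np.arange(20000) / fs
x = (1 + 0.5 * np.sin(2 * np.pi * 0.05 * t)) * np.sin(2 * np.pi * 2.0 * t)
x = x + 0.1 * rng.normal(size=t.size)

delta = fft_bandpass(x, fs, 1.0, 4.0)      # band-limited delta signal
infraslow = fft_bandpass(x, fs, 0.5, 1.0)  # directly filtered slow range
env = envelope(delta)                      # slow envelope of delta

# Dominant frequency of the (mean-removed) envelope.
freqs = np.fft.rfftfreq(len(env), d=1 / fs)
peak_freq = freqs[np.argmax(np.abs(np.fft.rfft(env - env.mean())))]
power_ratio = infraslow.var() / delta.var()
print(f"envelope peak: {peak_freq:.3f} Hz, power ratio: {power_ratio:.2e}")
```

      In this toy case the directly filtered slow band carries almost no power, while the envelope of the delta band fluctuates at the slow modulation frequency.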

    1. eLife Assessment

      This important study offers insights into the function and connectivity patterns of a relatively unknown afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus, suggesting a neural mechanism that suppresses the processing of familiar stimuli in favor of detecting memory guided novelty. The strength of evidence is convincing, with careful anatomical and electrophysiological circuit characterization. The work will be of broad interest to researchers studying the neural circuitry of behavior.

    2. Reviewer #1 (Public review):

      Summary:

      The anatomical connectivity of the claustrum and the role of its output projections has, thus far, not been studied in detail. The aim of this study was to map the outputs of the endopiriform (EN) region of the claustrum complex, and understand their functional role. Here the authors have combined sophisticated intersectional viral tracing techniques, and ex vivo electrophysiology to map the neural circuitry of EN outputs to vCA1, and shown that optogenetic inhibition of the EN→vCA1 projection impairs both social and object recognition memory. Interestingly the authors find that the EN neurons target inhibitory interneurons providing a mechanism for feedforward inhibition of vCA1.

      Strengths:

      The strength of this study was the application of a multilevel analysis approach combining a number of state-of-the-art techniques to dissect the contribution of the EN→vCA1 to memory function.

      In addition the authors conducted behavioural analysis of locomotor activity, anxiety and fear memory, and complemented the analysis of discrimination with more detailed description of the patterns of exploratory behaviour.

    3. Reviewer #2 (Public review):

      Summary:

      Yamawaki et al. conducted a series of neuroanatomical tracing and whole cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      This is an interesting mechanistic study that provides valuable insights into the function and connectivity patterns of afferent input from the endopiriform to the CA1 subfield of the ventral hippocampus. The authors propose that the EN input to the vCA1 interneurons provides a feedforward inhibition mechanism by which memory-based novelty detection could be promoted. The experiments are carefully conducted, and the methodological approaches used are sound. The conclusions of the paper are supported by the data presented.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public review):

      Summary:

      Yamawaki et al. conducted a series of neuroanatomical tracing and whole cell recording experiments to elucidate and characterise a relatively unknown pathway between the endopiriform (EN) and CA1 of the ventral hippocampus (vCA1) and to assess its functional role in social and object recognition using fibre photometry and dual vector chemogenetics. The main findings were that the EN sends robust projections to the vCA1 that collateralise to the prefrontal cortex, lateral entorhinal cortex and piriform cortex, and these EN projection neurons terminate in the stratum lacunosum-moleculare (SLM) layer of distal vCA1, synapsing onto GABAergic neurons that span across the Pyramidal-Stratum Radiatum (SR) and SR-SLM borders. It was also demonstrated that EN input disynaptically inhibits vCA1 pyramidal neurons. vCA1 projecting EN neurons receive afferent input from piriform cortex, and from within EN. Finally, fibre photometry experiments revealed that vCA1 projecting EN neurons are most active when mice explore novel objects or conspecifics, and pathway-specific chemogenetic inhibition led to an impairment in the ability to discriminate between novel vs. familiar objects and conspecifics.

      The authors have addressed most of my concerns, but a few weaknesses remain:

      (1) I expected to see the addition of raw interaction times with objects and conspecifics for each phase of social testing (pre-test, sociability test, social discrimination), as per my comment on including raw data. However, the authors only provided total distance traveled and velocity, and total interaction time in Figure S9, which is less informative.

      We apologize for missing the request. We have added the raw interaction times in Fig. S9G.

      (2) The authors observed increased activity in vCA1-projecting EN neurons tracking with the preferred object during the pre-test (object-object exploration) phase of the social tests, and the summary schematic (Figure 9A) depicts animals as showing a preference for one object over the other (although they are identical) in both the social and object recognition tests. However, in the chemogenetic experiment, the data (Fig S9B) indicate that animals did not show this preference for one object over another, making the expected baseline for this task unclear. This also raises an important question of whether the lack of effect from chemogenetic inhibition of vCA1-projecting EN neurons could be attributed to the absence of this baseline preference.

      We appreciate the comments. In Fig. S9B, although the group median at baseline (pretest) showed no preference for one object, individual subjects displayed a preference for one object (i.e., each data point deviated positively or negatively from 0.5) in the saline condition. Therefore, we do not think that a lack of baseline preference accounts for the absence of the inhibition effect in the pretest.

      Additionally, the finding that vCA1-projecting EN activity is associated with the preferred object exploration appears to counter the authors' argument that novelty engages this circuit (since both objects are novel in this instance). This discrepancy warrants further discussion.

      This is an interesting point. One possibility is that during the pretest, EN activity simply "reports" or "represents" the interaction time without driving exploratory preference. This aligns with our DREADD experiment data, which show that inhibition of EN neurons produced no overall behavioral effect. Innate exploratory behavior has been attributed to various circuits, including the medial preoptic area → PAG circuit (Ryoo et al., 2021, Front. Neuro.) and the Septal → VTA circuit (Mocellin et al., 2024, Neuron). We found no direct projection from these areas to EN (Fig. 6), but such connections could be established di- or polysynaptically. Moreover, these circuits could be driven by common inputs, such as the locus coeruleus or the cholinergic system for arousal, with only specific downstream targets, excluding EN, playing a key role in driving innate exploration and preference.

      We have inserted the following sentence in the Discussion (lines 253-255):

      “The correlation of ENvCA1-proj. activity with novel object preference in the pretest nevertheless suggests that these neurons 'represent' the innate preference without driving it.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Line 209: Please remove the reference to neural activity 'predicting' behavior, as correlation analysis does not imply predictive power.

      We now have changed the phrase to “Although EN<sup>vCA1-proj.</sup> activity was correlated with the behavior…”

      Line 236: It is unclear what is meant by: 'This circuit motif may predict the predominant role of ENvCA1-proj. neurons in social recognition memory'

      We have changed the sentence to the following for clarity:

      “Since social odor information is crucial for discriminating conspecifics in rodents, this circuit motif may predict the predominant role of ENvCA1-proj. neurons in social recognition memory, given that social odor can engage multiple olfactory pathways innervating the piriform cortex.”

      Fig 7 title: insert 'with' after correlates: 'Activity of ENvCA1-proj. neurons correlates social/object discrimination performance'

      Corrected.

      Fig S1 title: 'Projecing' typo.

      Corrected.

      Fig S8: Please rephrase for clarity: 'In pretest, the object was aligned by longer interaction time (preferred object is plotted in right side)'

      We now have rephrased the sentence to:

      “In the pretest plot, the object that the mice interacted with more is placed on the right side.”

      References:

      A septal-ventral tegmental area circuit drives exploratory behavior. Mocellin, Petra et al. Neuron, Volume 112, Issue 6, 1020-1032.e7

      An inhibitory medial preoptic circuit mediates innate exploration. Ryoo, Jia et al. Front. Neurosci., 23 August 2021, Volume 15 - 2021

    1. eLife Assessment

      This study provides important insights into the brain activity and connectivity underlying speech comprehension, revealing three brain states. The authors present compelling evidence by leveraging hidden Markov modeling of fMRI data to link brain state dynamics to comprehension scores, though the functional role of these states remains under-explored. These findings advance our understanding of how brain state transitions in narrative comprehension relate to stimulus-specific features.

    2. Reviewer #1 (Public review):

      Summary:

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech.

      Strengths:

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications.

    3. Reviewer #2 (Public review):

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension.

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions.

      The correlation between narrative features and brain state expression was reliable, but relatively low (~0.03). As discussed in the manuscript, this could be due to measurement noise, as well as narrative features accounting for a small proportion of cognitive processes underlying the brain states.

      A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. Across tasks, however, the spatial regions associated with each state varied. For example, state 2 during narrative comprehension was similar to both states 2 and 3 during rest (Fig. 5A), suggesting that the organization of the three states was task dependent.

      The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015). Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over may offer a more parsimonious account than a multi-process account where the states correspond to distinct processes. This possibility is mentioned briefly in the introduction, but not developed further.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech. 

      Strengths: 

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications. 

      Weaknesses: 

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results. 

      For the Experimental Procedure, we will provide a more detailed description about stimuli, and the comprehension test, and upload the audio files and corresponding transcriptions as the supplementary dataset. 

      For statistics/analyses, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state resembling the baseline state described in Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states were characterized by network activities varying largely from zero. In addition, we have re-generated the null distribution for behavior-brain state correlations using circular shift. The results are largely consistent with the previous findings. We have also adjusted some analyses and added new ones as recommended by the reviewer. We will revise the manuscript to incorporate these changes.

      For the interpretation/rationale: We will add a more detailed interpretation for the association between state occurrence and semantic coherence. Briefly speaking, higher semantic coherence may allow for the brain to better accumulate information over time.

      State #2 seems to be involved in the integration of information at shorter timescales (hundreds of milliseconds) while State #3 seems to be involved in the longer timescales (seconds). 

      We greatly appreciate the reviewer for the insightful comments and constructive suggestions.  

      Reviewer #2 (Public review): 

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension. 

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions. 

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      In the revision, we generated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence.
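
      For readers unfamiliar with the approach, a circular-shift null for the correlation between a state time course and a stimulus feature can be sketched as below. This is a simplified single-series illustration with synthetic data, not the authors' group-level procedure; the function name, the minimum shift, the number of permutations, and the two-sided p-value definition are all assumptions made for the example.

```python
import numpy as np

def circular_shift_pvalue(state_prob, feature, n_perm=1000, min_shift=20, seed=0):
    """Correlation between two time courses with a circular-shift null.

    Shifting state_prob circularly preserves its autocorrelation while
    breaking its temporal alignment with feature.
    """
    rng = np.random.default_rng(seed)
    n = len(state_prob)
    observed = np.corrcoef(state_prob, feature)[0, 1]
    shifts = rng.integers(min_shift, n - min_shift, size=n_perm)
    null = np.array([
        np.corrcoef(np.roll(state_prob, s), feature)[0, 1] for s in shifts
    ])
    # Two-sided p-value with the +1 correction for permutation tests.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, null, p

# Synthetic, temporally smooth series: the "state" tracks the "feature".
rng = np.random.default_rng(1)
kernel = np.ones(10) / 10
feature = np.convolve(rng.normal(size=600), kernel, mode="same")
nuisance = np.convolve(rng.normal(size=600), kernel, mode="same")
state_prob = feature + 0.5 * nuisance

observed, null, p = circular_shift_pvalue(state_prob, feature)
print(f"r = {observed:.2f}, p = {p:.4f}")
```

      Compared with shuffling time points independently, this null is more conservative for slowly varying signals because the shifted series keeps the same smoothness as the original.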

      We note that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will replace Figure 3 and the related supplementary figures with new ones, in which the null distribution is generated via circular shift. Furthermore, we will expand our discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations. 

      We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis, and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of speech comprehension task with the resting and the incomprehensible speech condition, there was some degree of overlap or "confusion."

      In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned as State #3 based on the ranking of similarity. This strategy was also applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015).

      Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes. 

      The temporal window hypothesis provides a more fitting explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation.

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) The "Participants and experimental procedure" section deserves more details. I've checked Liu et al. (2020), and the dataset contained 43 participants aged 20-75 years, whereas this study contained data from 64 young adults and 30 old adult samples. The previous dataset seems to have two stories, whereas this study seems to have three. Please be specific, given that the dataset does not seem the same. Could the authors also include more descriptions of what the auditory stories were? For example, what were the contents, and how were they recorded? 

      The citation is partially incorrect. The dataset of young adults is shared with our work published in (2022). The 64 participants listened to one of three stories told by a female college student in Mandarin, recounting her real-life experience of hiking, a graduate admission interview, and her first time taking a flight, respectively. The sample of older adults is from our work published in (2020), which includes 30 older adults and additionally 13 young adults. The stimuli in this case were two stories told by an older woman in a Chinese dialect, describing her experience in Thailand and riding a warship, respectively. Since we aim to explore whether the main results can be replicated on a different age group, we excluded the 13 young adults from the analysis. 

      All the stories were recorded during fMRI scanning using a noise-canceling microphone (FOMRI-III; Optoacoustics Ltd, Or-Yehuda, Israel) positioned above the speaker’s mouth. The audio recordings were subsequently processed offline with Adobe Audition 3.0 (Adobe Systems Inc., USA) to further eliminate MRI scanner noise.

      In the revised manuscript, we have updated the citation, and provided a more detailed description of the stimuli in the supplementary material. We have also uploaded the audio files along with their corresponding transcriptions to GitHub.

      (2) I am curious about individual differences in comprehension scores. Did participants have less comprehension of the audio-narrated story because the story was a hard-to-comprehend narrative or because the audio quality was low? Could the authors share examples of comprehension tests? 

      We believe two factors contribute to the individual differences in comprehension scores. First, the audio quality is indeed moderately lower than in daily-life story-listening conditions. This is because those stories were recorded and played during fMRI scanning. Although noise-canceling equipment was used, some noise still accompanied the speech, which may have made speech perception and comprehension more difficult than usual.

      Second, the comprehension test measured how much information about the story (including both main themes and details) participants could recall. Specifically, participants were asked to retell the stories in detail immediately after the scanning session. Following this free recall, the experimenters posed a few additional questions drawn from a pre-prepared list, targeting information not mentioned in their recall. If participants experienced lapses of attention or did not store the incoming information into memory promptly, they might fail to recall the relevant content. In several studies, such a task has been called a narrative recall test. However, memory plays a crucial role in real-time speech comprehension, while comprehension affects the depth of processing during memory encoding, thereby influencing subsequent recall performance. To align with prior work (e.g., Stephens et al., 2010) and our previous publications, we chose to referred to this task as narrative comprehension. 

      In the revised manuscript, we have provided a detailed description of the comprehension test (Line 907-933) and shared examples on GitHub.

      (3) Regarding Figure 3, what does it mean for a state occurrence to follow semantic coherence? Is there a theoretical reason why semantic coherence was measured and related to brain state dynamics? A related empirical question is: is it more likely for the brain states to transition from one state to another when nearby time points share low semantic similarity compared to chance? 

      We analyzed semantic coherence and sound envelope as they capture different layers of linguistic and acoustic structure that unfold over varying temporal scales. Changes in the sound envelope typically occur on the order of milliseconds to a few hundred milliseconds, changes in word-level semantic coherence span approximately 0.24 ± 0.15 seconds, and changes in clause-level semantic coherence extend to 3.2 ± 1.7 seconds. Previous theory and empirical studies suggest that the timescales of information accumulation vary hierarchically, progressing from early sensory areas to higher-order areas (Hasson et al., 2015; Lerner et al., 2011). Based on this work, we anticipate that the three brain states, which are respectively associated with the auditory and sensory motor network, the language network and the DMN, would be selectively modulated by these speech properties corresponding to distinct timescales. 

      Accordingly, when a state occurrence aligns with (clause-level) semantic coherence, it suggests that this state is engaged in processing information accumulated at the clause level (i.e., its semantic relationship). Higher coherence facilitates better accumulation, making it more likely for the associated brain state to be activated. 

      We analyzed the relationship between state transition probability and semantic coherence, but did not find significant results. Here, the transition probability was calculated as Gamma(t) – Gamma(t-1), where Gamma refers to the state occurrence probability. The lack of significant findings may be because brain state transitions are driven primarily by more slowly changing factors. Indeed, we found that the average dwell time of the three states ranges from 9.66 to 15.29 s, reflecting much slower temporal dynamics than the relatively rapid shifts in acoustic/semantic properties.

      In the revised version, we have updated the Introduction to clarify the rationale for selecting the three speech properties and for exploring their relationship with brain dynamics (Line 111-118).

      (4) When running the HMM, the authors iterated K of 2 to 10 and K = 4, 10, and 12. However, the input features of the model consist of only 9 functional networks. Given that the HMM is designed to find low-dimensional latent state sequences, the choice of the number of latent states being higher than the number of input features sounds odd to me - to my speculation, it is bound to generate almost the exact same states as 9 networks and/or duplicates of the same state. I suggest limiting the K iterations from 2 to 8. For replication with Yeo et al.'s 7 networks, K iteration should also be limited to K of less than 7, or optionally, Yeo's 7 network scheme could be replaced with a 17-network scheme.

      We understand your concern. However, the determination of the number (K) of hidden states is not directly related to the number of features (in this case, the number of networks), but rather depends on the complexity of the time series and the number of underlying patterns. Given that each state corresponds to a distinct combination of the features, even a small number of features can be used to model a system with complex temporal behaviors and multiple states. For instance, for a system with n features, assuming each is a binary variable (0 or 1), there are maximally 2^n possible underlying states.

      In our study, we recorded brain activity over 300 time points and used the 9 networks as features. At different time points, the brain can exhibit distinct spatial configurations, reflected in the relative activity levels of the nine networks and their interactions. To accurately capture the temporal dynamics of brain activity, it is essential to explore models that allow for more states than the number of features. We note that in other HMM studies, researchers have also explored more states than the number of networks to find the best number of hidden states (e.g., Ahrends et al., 2022; Stevner et al., 2019).

      Furthermore, Ahrends et al. (2022) suggested that “Based on the HCP-dataset, we estimate as a rule of thumb that the ratio of observations to free parameters per state should not be inferior to 200”, where free parameters per state is [K*(K-1) + (K-1) + K*N*(N+1)/2]/K. According to this, there should be at least 10,980 observations when the number of states (K) is 10 (the maximal number in our study) and the number of networks (N) is 9. In our group-level HMM model, there were 64 (valid runs) * 300 (TR) = 19,200 observations for young adults, and 50 (valid runs) * 210 (TR) = 10,500 observations for older adults. Aside from the older adults' data being slightly insufficient (4.37% less than the suggestion), all other hyperparameter combinations in this study meet the recommended number of observations.
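
      The arithmetic behind these numbers can be reproduced with a few lines of Python. This is only an illustrative sketch of the rule of thumb quoted above; the function names are hypothetical and are not taken from the analysis scripts.

```python
def free_params_per_state(k: int, n: int) -> float:
    """Free parameters per state: [K*(K-1) + (K-1) + K*N*(N+1)/2] / K."""
    total = k * (k - 1) + (k - 1) + k * n * (n + 1) / 2
    return total / k


def required_observations(k: int, n: int, ratio: float = 200.0) -> float:
    """Minimum observations under the quoted ratio of 200 per free parameter."""
    return ratio * free_params_per_state(k, n)


K, N = 10, 9                      # largest K explored, nine networks
needed = required_observations(K, N)
young = 64 * 300                  # valid runs * TRs, young adults
older = 50 * 210                  # valid runs * TRs, older adults
shortfall = (needed - older) / needed

print(f"needed={needed:.0f}, young={young}, older={older}, "
      f"older-adult shortfall={shortfall:.2%}")
```

      With K = 10 and N = 9, the young-adult sample exceeds the threshold, while the older-adult sample falls short by a few percent, as stated above.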

      (5) In Figure 2, the authors write that the states' spatial maps were normalized for visualization purposes. Could the authors also show visualization of brain states that are not normalized? The reason why I ask is, for example, in Song, Shim, & Rosenberg (2023), the base state was observed which had activity levels all close to the mean (which is 0 because the BOLD activity was normalized). If the activity patterns of this brain state were to be normalized after state estimation, the base state would have looked drastically different than what is reported. 

      We derived the spatial maps of the states using unnormalized activity patterns, with the BOLD signals Z-score normalized to a mean of zero. Under the speech comprehension task, the three states exhibited relatively large fluctuations in network activity levels. The activity ranges were as follows: [-0.71 to 0.51] for State #1, [-0.26 to 0.30] for State #2, and [-0.82 to 0.40] for State #3. For the resting state, we observed a state resembling the baseline state as described in Song, Shim, & Rosenberg (2023), with activity values ranging from -0.133 to 0.09. 

      In the revision, we have replaced the states' spatial maps with versions showing unnormalized activity patterns. 

      (6) In line 297, the authors speculate that "This may be because there is too much heterogeneity among the older adults". To support this speculation, the authors can calculate the overall ISC of brain state dynamics among older adults and compare it to the ISC estimated from younger adults.  

      We analyzed the overall ISC of brain state dynamics, and found the ISC was indeed significantly lower among the older adults than that among the younger adults. We have revised this statement as follows:

      These factors can diminish the inter-subject correlation of brain state dynamics— indeed, ISCs among older adults were significantly lower than those among younger adults (Figure S5)—and reduce ISC's sensitivity to individual differences in task performance (Line 321-326).

      Other comments: 

      (7) In Figure 4, the authors showed a significant positive correlation between head movement ISC with the best performer and comprehension scores. Does the average head movement of all individuals negatively correlate with comprehension scores, given that the authors argue that "greater task engagement is accompanied by decreased movement"? 

      We examined the relationship between participants' average head movement across the comprehension task and their comprehension scores. There was no significant correlation (r = 0.041, p = 0.74). In the literature (e.g., Ballenghein et al., 2019), the relationship between task engagement and head movement was also assessed at the moment-by-moment level, rather than by using time-averaged data.

      Real-time head movements reflect fluctuations in task engagement and cognitive state. In contrast, mean head movement, as a static measure, fails to capture these changes, and thus is not effective in predicting task performance.

      (8) The authors call the older adults sample the "independent dataset". Technically, however, this dataset cannot be independent because the data were collected at the same time by the same research group. I would advise replacing the word independent with something like second dataset or replication dataset.

      We have replaced the phrase “independent dataset” with “replication dataset”. 

      (9) Pertaining to a paragraph starting in line 586: For non-parametric permutation tests, the authors note that the time courses of brain state expression were "randomly shuffled". How was this random shuffling done: was this circular-shifted randomly, or were the values within the time course literally shuffled? The latter approach, literal shuffling of the values, does not make a fair null distribution because it does not retain temporal regularities (autocorrelation) that are intrinsic to the fMRI signals. Thus, I suggest replacing all non-parametric permutation tests with random circular shifting of the time series (np.roll in Python).

      In the original manuscript, the time course was literally shuffled. In the revised version, we circular-shifted the time course randomly (circshift.m in Matlab) to generate the null distribution. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence (Line 230-235). 
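
      For readers unfamiliar with the procedure, a circular-shift null distribution can be sketched as follows. This is a minimal illustration using NumPy's `np.roll`, which plays the same role as `circshift` in Matlab; the function name, the number of permutations, and the synthetic signals are assumptions made for the example and do not reproduce the actual analysis.

```python
import numpy as np


def circular_shift_null(state_prob, feature, n_perm=1000, seed=0):
    """Correlate a state time course with a stimulus feature and build a
    null distribution by circularly shifting the state time course.

    Circular shifting keeps the autocorrelation of the time course intact,
    unlike shuffling individual time points.
    """
    rng = np.random.default_rng(seed)
    state_prob = np.asarray(state_prob, dtype=float)
    feature = np.asarray(feature, dtype=float)
    n = state_prob.size

    observed = np.corrcoef(state_prob, feature)[0, 1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shift = int(rng.integers(1, n))          # 1 .. n-1, never a zero shift
        shifted = np.roll(state_prob, shift)     # wrap-around shift
        null[i] = np.corrcoef(shifted, feature)[0, 1]
    return observed, null


# Synthetic demonstration only: a smooth signal and a noisy copy of it.
rng = np.random.default_rng(1)
x = np.convolve(rng.standard_normal(300), np.ones(10) / 10, mode="same")
y = x + 0.1 * rng.standard_normal(300)
obs, null = circular_shift_null(x, y, n_perm=200)
print(obs, null.mean())
```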

      (10) The p value calculation should be p = (1+#(chance>=observed))/(1+#iterations) for one-tailed test and p = (1+#(abs(chance)>=abs(observed)))/(1+#iterations) for two-tailed test. Thus, if 5,000 iterations were run and none of the chances were higher than the actual observation, the p-value is p = 1/5001, which is the minimal value it can achieve.

      Have corrected. 
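
      The formula in comment (10) can be written as a short function. This is a sketch; the function name and arguments are illustrative and are not taken from the analysis scripts.

```python
def permutation_p_value(observed, null, two_tailed=False):
    """p = (1 + #(null >= observed)) / (1 + #iterations).

    For the two-tailed test, absolute values are compared instead.
    The "+1" terms count the observed value as one permutation, so the
    p-value can never be exactly zero.
    """
    if two_tailed:
        exceed = sum(abs(v) >= abs(observed) for v in null)
    else:
        exceed = sum(v >= observed for v in null)
    return (1 + exceed) / (1 + len(null))


# With 5,000 iterations and no null value reaching the observation,
# the smallest attainable p-value is 1/5001.
p_min = permutation_p_value(1.0, [0.0] * 5000)
print(p_min)
```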

      (11) State 3 in Figure S2 does not resemble State 3 of the main result. Could the authors explain why they corresponded State 3 of the Yeo-7 scheme to State 3 of the nine-parcellation scheme, perhaps using evidence of spatial overlap?

      The correspondence of states between the two schemes was established using evidence of state expression time course. 

      To assess temporal overlap, we calculated Pearson’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network parcellation scheme in terms of state expression probabilities. The time courses of the 64 participants were concatenated, resulting in 19200 (300*64) time points for each state. The one that the candidate state most closely resembled was set to be its corresponding state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. As demonstrated in the confusion matrix, each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two schemes.
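
      The matching rule described above can be sketched in a few lines. The example below is illustrative only: the function name, array shapes, and synthetic data are assumptions, and it simply assigns each candidate state to the reference state with which its time course correlates most strongly.

```python
import numpy as np


def match_states(candidate, reference):
    """Match candidate states to reference states by Pearson correlation.

    candidate: array (T, Kc) of state expression probabilities (one scheme)
    reference: array (T, Kr) of state expression probabilities (other scheme)
    Returns the (Kc, Kr) correlation matrix, the index of the best-matching
    reference state for each candidate, and whether the match is one-to-one.
    """
    candidate = np.asarray(candidate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    kc = candidate.shape[1]
    # np.corrcoef treats rows as variables; keep the candidate-by-reference block.
    r = np.corrcoef(candidate.T, reference.T)[:kc, kc:]
    best = r.argmax(axis=1)
    one_to_one = len(set(best.tolist())) == kc
    return r, best, one_to_one


# Synthetic demonstration: candidates are permuted, noisy copies of the reference.
rng = np.random.default_rng(0)
ref = rng.random((1000, 3))
cand = ref[:, [2, 0, 1]] + 0.05 * rng.standard_normal((1000, 3))
r, best, one_to_one = match_states(cand, ref)
print(best, one_to_one)
```

      When two candidates share the same best match, the one-to-one flag is false and a tie-breaking rule is needed.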

      We also assessed the spatial overlap between the two schemes. First, a state activity value was assigned to each voxel across the whole brain (including a total of 34,892 voxels covered by both parcellation schemes). This is done for each brain state. Next, we calculated Spearman’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network scheme in terms of whole-brain activities. The pattern of spatial overlap is consistent with the pattern of temporal overlap, such that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively.

      Author response image 1.

      We noted that the networks between the two schemes are not well aligned in their spatial location, especially for the DMN (as shown below). This may lead to the low spatial overlap of State #3, which is dominated by DMN activity. Consequently, establishing state correspondence based on temporal information is more appropriate in this context. We therefore only reported the results of temporal overlap in the manuscript. 

      We have added a paragraph in the main text for “Establishing state correspondence between analyses” (Line 672-699). We have also updated the associated figures (Fig. S2, Fig. S3 and Fig. 5).

      Author response image 2.

      (12) Line 839: gamma parameter, on a step size of? 

      (16) Figure 3. Please add a legend in the "Sound envelope" graph what green and blue lines indicate. The authors write Coh(t) and Coh(t, t+1) at the top and Coh(t) and Coh(t+1) at the bottom. Please be consistent with the labeling. Shouldn't they be Coh(t-1, t) and Coh(t, t+1) to be exact for both? 

      Have corrected. 

      (17) In line 226, is this one-sample t-test compared to zero? If so, please write it inside the parentheses. In line 227, the authors write "slightly weaker"; however, since this is not statistically warranted, I suggest removing the word "slightly weaker" and just noting significance in both States 1 and 2.  

      Have corrected.

      (18) In line 288, please fix "we also whether". 

      Have corrected. 

      (19) In Figure 2C, what do pink lines in the transition matrix indicate? Are they colored just to show authors' interests, or do they indicate statistical significance? Please write it in the figure legend.   

      Yes, the pink lines indicate a meaningful trend, showing that the between-state transition probabilities are significantly higher than those in the permutations.

      We have added this information to the figure legend. 

      Reviewer #2 (Recommendations for the authors):

      (1) It is unclear how the correspondence between states across different conditions and datasets was computed. Given the spatial autocorrelation of brain maps, I recommend reporting the Dice coefficient along with a spin-test permutation to test for statistical significance.  

      The state correspondence between different conditions and between the two datasets was established using evidence of spatial overlap. The spatial overlap between states was quantified by Pearson’s correlation using the activity values (derived from the HMM) of the nine networks. For each candidate state identified in other analyses (for the Rest, MG and older-adult datasets), we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis (for the young-adults dataset), and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.

      For the comparison between the young and older adults’ datasets (as shown below), the largest spatial overlap occurred along the diagonal of the confusion matrix, with high correlation values. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two datasets. As the HMM is modelled at the level of networks, which lack accurate coordinates, we did not apply the spin-test to assess the statistical significance of overlap. Instead, we extracted the state activity patterns from the 1000 permutations (wherein the original BOLD time courses were circularly shifted and an HMM was fitted) for the older-adults dataset. Applying a similar state-correspondence strategy, we generated a null distribution of spatial overlap. The real overlap of the three states was greater than 97.97%, 95.34% and 92.39% of the instances from the permutations, respectively (as shown below).

      Author response image 3.

      For the comparison of the main task with the resting and incomprehensible-speech conditions, there was some degree of confusion: two candidate states showed the highest similarity to State #2. In this case, we labeled the most similar candidate as State #2. The other candidate was then assigned to the predefined state with which it had the second-highest correlation. We used a prime symbol (e.g., State #3') to denote cases where such confusion occurred. These findings support our conclusion that the tripartite organization of brain states is not a task-free, intrinsic property.

      When establishing the correspondence between the Yeo-7 network and the nine-network parcellation schemes, we primarily relied on evidence from temporal overlap measures, as a clear network-level alignment between the two parcellation schemes is lacking. Temporal overlap was quantified by calculating the correlation of state occurrence probabilities between the two schemes. To achieve this, we concatenated the time courses of 64 participants, resulting in a time series consisting of 19,200 time points (300 time points per participant) for each state. Each of the three candidate states from the Yeo-7 network scheme was best matched to State #1, State #2, and State #3 from the main analyses, respectively. To determine the statistical significance of the temporal overlap, we circularly shifted each participant’s time course of state expression obtained from the Yeo-7 network scheme 1000 times. Applying the same strategy to find the matching states, we generated a null distribution of overlap. The real overlap was much higher than the instances from the permutations.

      Author response image 4.

      In the revision, we have provided a detailed description of how the state correspondence is established and reported the statistical significance of this correspondence (Line 671-699). The associated figures have also been updated (Fig. 5, Fig. S2 and Fig. S3).

      (2) Please clarify if circle-shifting was applied to the state expression time course when generating the null distribution for behavior-brain state correlations reported in Figure (3). This seems important to control for the temporal autocorrelation in the time courses.  

      We have updated the results by using circular shifting to generate the null distribution. The results are largely consistent with the previous ones obtained without circular shifting (Line 230-242).

      (3) Figure 3: What does the green shaded area around the sound envelope represent? In the caption, specify whether the red line in the null distributions indicates the mean or median R between brain state expression and narrative features. It would also be beneficial to report this value in the main text. 

      The green shaded area indicates the original amplitude of the speech signal, while the blue line indicates the smoothed, low-frequency contour of amplitude changes over time (i.e., the speech envelope). We have updated the figure and explained this in the figure caption.

      The red line in the null distributions indicates the R between brain state expression and narrative features for the real data. We have reported the mean R of the permutations in the main text.

      (4) The manuscript is missing a data availability statement (https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-datasharing-policies). 

      We have added a statement of data availability in the revision, as follows: 

      “The raw and processed fMRI data are available on OpenNeuro: https://openneuro.org/datasets/ds005623. The experimental stimuli, behavioral data and main scripts used in the analyses are provided on GitHub.”

      (5) There is a typo in line 102 ("perceptual alalyses"). 

      Have corrected. 

      We sincerely thank the two reviewers for their constructive feedback, thorough review, and the time they dedicated to improving our work.

      Reference: 

      Ahrends, C., Stevner, A., Pervaiz, U., Kringelbach, M. L., Vuust, P., Woolrich, M. W., & Vidaurre, D. (2022). Data and model considerations for estimating time-varying functional connectivity in fMRI. Neuroimage, 252, 119026.

      Ballenghein, U., Megalakaki, O., & Baccino, T. (2019). Cognitive engagement in emotional text reading: concurrent recordings of eye movements and head motion. Cognition and Emotion. 

      Fernandino, L., Tong, J.-Q., Conant, L. L., Humphries, C. J., & Binder, J. R. (2022). Decoding the information structure underlying the neural representation of concepts. Proceedings of the National Academy of Sciences, 119(6), e2108091119. https://doi.org/10.1073/pnas.2108091119

      Hasson, U., Chen, J., & Honey, C. J. (2015). Hierarchical process memory: memory as an integral component of information processing. Trends in Cognitive Sciences, 19(6), 304-313. 

      Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8), 2906-2915. https://doi.org/10.1523/JNEUROSCI.3684-10.2011

      Liu, L., Li, H., Ren, Z., Zhou, Q., Zhang, Y., Lu, C., Qiu, J., Chen, H., & Ding, G. (2022). The “two-brain” approach reveals the active role of task-deactivated default mode network in speech comprehension. Cerebral Cortex, 32(21), 4869-4884. 

      Liu, L., Zhang, Y., Zhou, Q., Garrett, D. D., Lu, C., Chen, A., Qiu, J., & Ding, G. (2020). Auditory–Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cerebral Cortex, 30(3), 942-951. https://doi.org/10.1093/cercor/bhz138

    1. eLife Assessment

      This valuable study reports on electrophysiological recording of the spiking activity of single neurons in the entopeduncular nucleus (EPN) in freely-moving mice performing an auditory discrimination task. The data show that the activity of single EPN neurons is modulated by reward and movement kinematics, with the latter further affected by task contexts (e.g. movement toward or away from a reward location). The results provide solid evidence for the conclusions. There is some ambiguity as to whether the data contain the population of EPN neurons characterized in previous studies that obtained different results. Investigations separating confounding factors would be of benefit. Nonetheless, the work is overall of interest to those who study how the basal ganglia, particularly the EPN, contribute to behavior.

    2. Reviewer #1 (Public review):

      The authors in this paper investigate the nature of the activity in the rodent EPN during a simple freely moving cue-reward association task. Given that primate literature suggests movement coding whereas other primate and rodent studies suggest mainly reward outcome coding in the EPNs, it is important to try to tease apart the two views. Through careful analysis of behavior kinematics, position, and the neural activity in the EPNs, the authors reveal an interesting and complex relationship between the EPN and mouse behavior.

      Strengths:

      (1) The authors use a novel freely moving task to study EPN activity, which displays rich movement trajectories and kinematics. Given that previous studies have mostly looked at reward coding during head fixed behavior, this study adds a valuable dataset to the literature.

      (2) The neural analysis is rich and thorough. Both single neuron level and population level (i.e. PCA) analysis are employed to reveal what EPN encodes.

      Discussion:

      EPN is one of the major output nuclei of the basal ganglia. What information is present within EPN is still unclear, and under investigation. The authors have used electrophysiology to determine the nature of information present within EPN that is likely to be valuable to the field. Future studies should try to address whether this information is specific to certain cell types within EPN or whether there is topography within EPN that reflects the kinematic information present within EPN. This will require more careful dissection of EPN activity based on anatomy. Future experiments should also consider tasks that isolate a single limb (i.e. joystick tasks) in order to better understand the kinematic encoding of forelimb movement. This, combined with recording in forelimb encoding region of EPN, should give us insights into the nature of kinematic control of EPN. Overall, this study will be useful to inspire future investigations in the function of EPN.

    3. Reviewer #2 (Public review):

      This paper examined how the activity of neurons in the entopeduncular nucleus (EPN) of mice relates to kinematics, value, and reward. The authors recorded neural activity during an auditory cued two-alternative choice task, allowing them to examine how neuronal firing relates to specific movements like licking or paw movements, as well as how contextual factors like task stage or proximity to a goal influence the coding of kinematic and spatiotemporal features. The data shows that the firing of individual neurons is linked to kinematic features such as lick or step cycles. However, the majority of neurons exhibited activity related to both movement types, suggesting that EPN neuronal activity does not merely reflect muscle-level representations. This contradicts what would be expected from traditional action selection or action specification models of the basal ganglia.

      The authors also show that spatiotemporal variables account for more variability compared to kinematic features alone. Using demixed Principal Component Analysis, they reveal that at the population level, the three principal components explaining the most variance were related to specific temporal or spatial features of the task, such as ramping activity as mice approached reward ports, rather than trial outcome or specific actions. Notably, this activity was present in neurons whose firing was also modulated by kinematic features, demonstrating that individual EPN neurons integrate multiple features. A weakness is that what the spatiotemporal activity reflects is not well specified. The authors suggest some may relate to action value due to greater modulation when approaching a reward port, but acknowledge action value is not well parametrized or separated from variables like reward expectation.

      A key goal was to determine whether activity related to expected value and reward delivery arose from a distinct population of EPN neurons or was also present in neurons modulated by kinematic and spatiotemporal features. In contrast to previous studies (Hong & Hikosaka 2008 and Stephenson-Jones et al., 2016), the current data reveals that individual neurons can exhibit modulation by both reward and kinematic parameters. Two potential differences may explain this discrepancy: First, the previous studies used head-fixed recordings, where it may have been easier to isolate movement versus reward-related responses. Second, those studies observed prominent phasic responses to the delivery or omission of expected rewards - responses that are present but not common in the current paper. This suggests a possibility that the VGlut2+ EPN neurons that project to the LHb were under-sampled or not sampled; antidromic or optogenetic tagging would have been needed to confirm the identity of the populations that were recorded. Alternatively, in the head-fixed recordings, kinematic/spatial coding may have gone undetected due to the forced immobility.

      Overall, this paper offers needed insight into how the basal ganglia output encodes behavior. The EPN recordings from freely moving mice clearly demonstrate that individual neurons integrate reward, kinematic, and spatiotemporal features, challenging traditional models. However, the specific relationship between the spatiotemporal activity and factors like action value remains unclear.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors in this paper investigate the nature of the activity in the rodent EPN during a simple freely moving cue-reward association task. Given that primate literature suggests movement coding whereas other primate and rodent studies suggest mainly reward outcome coding in the EPNs, it is important to try to tease apart the two views. Through careful analysis of behavior kinematics, position, and neural activity in the EPNs, the authors reveal an interesting and complex relationship between the EPN and mouse behavior.

      Strengths:

      (1) The authors use a novel freely moving task to study EPN activity, which displays rich movement trajectories and kinematics. Given that previous studies have mostly looked at reward coding during head-fixed behavior, this study adds a valuable dataset to the literature.

      (2) The neural analysis is rich and thorough. Both single neuron level and population level (i.e. PCA) analysis are employed to reveal what EPN encodes.

      Thank you very much for this appreciation.

      Weaknesses:

      (1) One major weakness in this paper is the way the authors define the EPN neurons. Without a clear method of delineating EPN vs other surrounding regions, it is not convincing enough to call these neurons EPNs solely from looking at the electrode cannula track from Figure 2B. Indeed, EPN is a very small nucleus and previous studies like Stephenson-Jones et al (2016) have used opto-tagging of Vglut2 neurons to precisely label EPN single neurons. Wallace et al (2017) have also shown the existence of SOM and PV-positive neurons in the EPN. By not using transgenic lines and cell-type specific approaches to label these EPN neurons, the authors miss the opportunity to claim that the neurons recorded in this study do indeed come from EPN. The authors should at least consider showing an analysis of neurons slightly above or below EPN and show that these neurons display different waveforms or firing patterns.

      We thank the reviewer for their comment, and we appreciate the opportunity to expand on the inclusion criteria for the studied units after providing an explanation.

      As part of another study, we performed experiments recording in EPN with optrodes and photoidentification in PV-Cre animals. We found optoidentified units both in animals with correct placement (within the EPN) and in those with off-target placement (within the thalamus or medial to the EPN). Thus, despite the use of Cre animals, we relied on histology to ensure correct EPN recording. We believe that optotagging based purely on neural markers such as PV, SOM, VGLUT, VGAT would not provide a better anatomical delineation of the EPN since adjacent structures are rich in those same markers. The thalamic reticular nucleus is just dorsal to the EPN and it has been shown to express both SOM and PV (Martinez-Garcia et al., 2020).

      On the other hand, the lateral hypothalamus (just medial to the EPN) also expresses vGlut2 and SOM. Stephenson-Jones (2016), Extended Data Figure 1, panel g, shows vGluT2 and somatostatin labeling of neurons, with important expression of neurons dorsal, ventral and medial to the EPN. Thus, we believe that viral strategies relying on single neuronal markers still depend on careful histological analysis of recording sites.

      A combination of neural markers or more complex viral strategies might be more suitable to delineate the EPN. As an example, for anatomical tracing Stephenson-Jones et al. 2016 performed a rabies-virus based approach involving retrogradely transported virus making use of projection sites through two injections. Two-step viral approaches were also performed in Wallace, M. et al. 2017. We attempted to perform a two-step viral approach, using an anterogradely transported Cre-expressing virus (AAV1.hSyn.Cre.WPRE.hGH) injected into the striatum and a second Cre-dependent ChR2 into the EPN. However, our preliminary experiments showed that this double viral approach markedly decreased the performance of animals during the task (we attempted re-training 2-3 weeks after viral infections and animals failed to turn to the contralateral side of the injections). We believe that this approach might have had a toxic effect (Zingg et al., 2017).

      To this point, a recent paper (Lazaridis et al., 2019) repeated an optogenetic experiment performed in the Stephenson-Jones et al. study, using a set of different viral approaches and concluded that increasing the activity of GPi-LHb is not aversive, as it had been previously reported. Thus, future studies attempting to increase anatomical specificity are a must, but they will require using viral approaches amenable to the behavioral paradigm.

      We attempted to find properties of waveforms, firing rates, and firing patterns that distinguish EPN units from units recorded above or below the nucleus; however, we did not find a marker that could generate a clear demarcation. We show here a figure with both the units included in this study and the excluded ones to show that there is a clear overlap.

      Author response image 1.

      Finally, we completely agree with the reviewer in that there is still room for improvement. We have further expanded the Methods section to explain better our efforts to include units recorded within the EPN. Further, we have added a paragraph within the Discussion section to point out this limitation (lines 871-876).

      Methods (lines 116-131):

      “Recordings. Movable microwire bundles (16 microwires, 32 micrometers in diameter, held inside a cannula; Innovative Neurophysiology, Durham, NC) were stereotaxically implanted just above the entopeduncular nucleus (-0.8 AP, 1.7 ML, 3.9 DV). Post-surgical care included antibiotic, analgesic and anti-inflammatory pharmacological treatment. After 5 days of recovery, animals were retrained for 1-2 weeks. Unitary activity was recorded for 2-6 days at each dorsoventral electrode position and the session with the best electrophysiological [signal-to-noise ratio (>2), stability across time] and behavioral [performance, number of trials (>220)] quality was selected. Microwire electrodes were advanced in 50 micrometer dorsoventral steps for 500 micrometers in total. After experiment completion, animals were perfused with a 4% paraformaldehyde solution. Brains were extracted, dehydrated with a 30% sucrose solution and sectioned in a cryostat into 30-micron-thick slices. Slices were mounted and photographed using a light microscope. Microwire tracks of the 16-microwire bundle were analyzed (Fig. 2A-B) and only animals with tracks traversing the EPN were selected (6 out of 10). Finally, we located the final position of microwire tips and inferred the dorsoventral recording position of each of the recording sessions. Only units recorded within the EPN were included.”

      Discussion (lines 871-876):

      “A weakness of the current study is the lack of characterization of neuronal subtypes. An area of opportunity for future research could be to perform photo-identification of neuronal subtypes within the EPN which could contribute to the overall description of the information representation. Further, detailed anatomical viral vector strategies could aid to improve anatomical localization of recordings, reduce reliance on histological examination, and solve some current controversies (Lazaridis et al., 2019).” 

      (2) The authors fail to replicate the main finding about EPN neurons which is that they encode outcome in a negative manner. Both Stephenson-Jones et al (2016) and Hong and Hikosaka (2008) show a reward response during the outcome period where firing goes down during reward and up during neutral or aversive outcome. However, Figure 2 G top panel shows that the mean population is higher during correct trials and lower during incorrect trials. This could be interesting given that the authors might try recording from another part of EPN that has not been studied before. However, without convincing evidence that the neurons recorded are from EPN in the first place (point 1), it is hard to interpret these results and reconcile them with previous studies.

      We really thank the reviewer for pointing out that we need to better explain how EPN units encode outcome. We now provide an additional panel in Figure 4, its corresponding text in the results section (lines 544-562) and a new paragraph in the discussion related to this comment.

      We believe that we do indeed recapitulate the findings of both Stephenson-Jones et al (2016) and Hong and Hikosaka (2008). Both studies focus on a specific subpopulation of GPi/EPN neurons that project to the lateral habenula (LHb). Stephenson-Jones et al (2016) posit that GPi-LHb neurons (which they opto-tag as vGluT2) exhibit a decreased firing rate during rewarding outcomes. Hong and Hikosaka (2008) antidromically identified LHb-projecting neurons within the GPi and found reward-positive and reward-negative neurons, which were modulated by increasing or decreasing their firing rate, respectively, with a rewarding outcome (red and green dots on the x-axis of Figure 5A in their paper).

      As the reviewer pointed out, the zScore may be misleading. Therefore, in our study we also decomposed population activity along the reward axis through dPCA. When marginalizing for reward in Figure 3F, we find that the weights of individual units on this axis are centered around zero, with positive and negative values (Figure 3F, right panel). Thus, units can code a rewarding outcome as either an increase or a decrease of activity. We show example units of such modulation in Figure 3-1g and h.

      We had segregated our analysis of spatio-temporal and kinematic coding according to the reward coding of units in Figure 4L-M. Yet, following this comment and in an effort to further clarify this segregation, we introduced panels with the mean zScore of units during outcome evaluation in Figure 4L.

      We amended the main text to better explain these findings (lines 544-562).

      “Previous reports suggest that EPN units that project to the lateral habenula encode reward as a decrease in firing rate. Thus, we wished to ask whether reward encoding units can code kinematic and spatio-temporal variables as well.

      To this end, we first segregated units according to their reward coding properties: reward positive (which increased activity with reward) and reward negative units (which decreased activity with reward). We computed the auROC over the 250 ms after head entry, comparing rewarded and incorrect trials (p<0.001, permutation test). Mean activity of reward insensitive, positive and negative units is shown in Fig. 4L. Next, we performed a dimensionality reduction on the coefficients of the model that best explained both contexts (kinematic + spatio-temporal model on pooled data) using UMAP (McInnes et al., 2018). We observe a continuum rather than discrete clusters (Fig. 4L). Note that individual units are color coded according to their responsivity to reward. We did not find a clear clustering either.”
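
      As an illustration of the reward classification described in the passage above, the following is a minimal sketch and not the analysis code used in the study. It assumes one spike count per trial in the 250 ms window, a two-sided permutation test on the distance of the auROC from 0.5, and hypothetical function names (`auroc`, `permutation_p`, `classify_unit`).

```python
import random


def auroc(pos, neg):
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p(pos, neg, n_perm=2000, seed=0):
    """Two-sided permutation p-value for |auROC - 0.5| (trial labels shuffled)."""
    rng = random.Random(seed)
    observed = abs(auroc(pos, neg) - 0.5)
    pooled = list(pos) + list(neg)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm = abs(auroc(pooled[:len(pos)], pooled[len(pos):]) - 0.5)
        if perm >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def classify_unit(rewarded, incorrect, alpha=0.001):
    """Label a unit from per-trial spike counts in the outcome window."""
    a = auroc(rewarded, incorrect)
    if permutation_p(rewarded, incorrect) >= alpha:
        return "insensitive"
    return "reward positive" if a > 0.5 else "reward negative"
```

      With 2000 permutations the smallest attainable p-value is 1/2001, so the number of permutations must be at least this large for the p<0.001 criterion to be reachable.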

      Paragraph added in the discussion (lines 749-755):

      “In this study, we found that rewarding outcomes can be represented by EPN units through either an increase or a decrease in firing rate (Fig. 3F, 3-1g-h, 4L). While Stephenson-Jones et al., 2016 found that lateral habenula (LHb)-projecting neurons within the EPN of mice primarily encoded rewarding outcomes by a decrease in firing rate, Hong and Hikosaka, 2008 observed that in primates, LHb-projecting units could encode reward through either a decrease or an increase in firing rate. Thus, our results align more closely with the latter study, which also employed an operant conditioning task.”

      (3) The authors say that: 'reward and kinematic coding are not mutually exclusive, challenging the notion of distinct pathways and movement processing'. However, it is not clear whether the data presented in this work supports this statement. First, the authors have not attempted to record from the entire EPN. Thus it is possible that the coding might be more segregated in other parts of EPN. Second, EPNs have previously been shown to display positive firing for negative outcomes and vice versa, something which the authors do not find here. It is possible that those neurons might not encode kinematic and movement variables. Thus, the authors should point out in the main text the possibility that the EPN activity recorded might be missing some parts of the whole EPN.

      We thank the reviewer for the opportunity to expand on this topic. We believe it is certainly possible that other not-recorded regions of the EPN might exhibit greater segregation of reward and kinematics. However, we considered it worthwhile pointing out that from the dataset collected in this study reward-sensitive units encode kinematics in a similar fashion to reward-insensitive ones (Fig. 4L,M). Moreover, we asked specifically whether reward-negative units (that decrease firing rate with rewarding outcomes, as previously reported) could encode kinematics and spatio-temporal variables with different strength than reward-insensitive ones and could not find significant differences (Fig. 4M).

      We did indeed find units that displayed decreased firing rate upon rewarding outcomes, as has been previously reported. We have addressed this fact more thoroughly in point (2). 

      Finally, we agree with the reviewer that the dataset collected in this study is by no means exhaustive of the entire EPN and have thus included a sentence pointing this out in the Discussion section (lines 805-806):

      “Given that we did not record from the entire EPN, it is still possible that another region of the nucleus might exhibit more segregation.”

      (4) The authors use an IR beam system to record licks and make a strong claim about the nature of lick encoding in the EPN. However, the authors should note that IR beam system is not the most accurate way of detecting licks given that any object blocking the path (paw or jaw-dropping) will be detected as lick events. Capacitance based, closed-loop detection, or video capturing is better suited to detect individual licks. Given that the authors are interested in kinematics of licking, this is important. The authors should either point this out in the main text or verify in the system if the IR beam is correctly detecting licks using a combination of those methods.

      We thank the reviewer for the opportunity to clarify the lick event acquisition. We have experience using electrical alternatives to lickometers; however, we believe they were not best suited to this application. Closed-loop lickometers generally use a metallic grid upon which animals stand so that the loop can be closed; however, we wanted to have a transparent floor. We have found capacitance-based lickometers to be useful in head-fixed conditions but have noticed that they are very dependent on animal position and the proximity of other body parts such as limbs. Given the freely moving aspect of the task, this was difficult to control. Finally, both electrical alternatives for lickometers are more prone to noise and may introduce electrical artifacts that might contaminate the spiking signal. This is why we opted to use a slit in combination with an IR beam that would only fit the tongue and that forced enough protrusion such that individual licks could be monitored. Further, the slit could not fit other body parts like the paw or jaw. We have now included a video (Supp. Video 2) showing a closeup of this behavior that better conveys how the jaw and paw do not fit inside the slit. The following text has been added in the corresponding methods section (lines 97-98):

      “The lickometer slit was just wide enough to fit the tongue and deep enough to evoke a clear tongue protrusion.”

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should verify using opto-tagging of either Vglut2, SOM, or PV neurons whether they can see the same firing pattern. If not, the authors should address this weakness in the paper.

      We thank the reviewer for this important point; we have provided a more detailed reply above.

      (2) The way dPCA or PCA is applied to the data is not stated at all in the main text. Are all units from different mice combined? Or applied separately for each mouse? How does that affect the interpretation of the data? At least a brief text should be included in the main text to guide the readers.

      We thank the reviewer for pointing out this important omission. We have included an explanation in the Methods section and in the Main text.

      Methods (lines 182-184):

      “For all population-level analyses, individual units recorded from all sessions and all animals were pooled to construct a pseudo-simultaneous population response from data that were mostly recorded separately.”

      Main text (lines 397-399):

      “For population level analyses throughout the study, we pooled recorded units from all animals to construct a pseudo-simultaneous population.”

      Discussion (lines 729-730):

      “…(from pooled units from all animals to construct a pseudo-simultaneous population, which assumes homogeneity across subjects)”
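
      The pooling described in the passages above can be sketched as follows. This is only an illustration under stated assumptions: each unit contributes binned firing-rate vectors of equal length per trial, responses are averaged per condition before stacking, and the function names and data layout are hypothetical rather than taken from the study.

```python
def trial_average(trials):
    """Average a list of equal-length binned firing-rate vectors (one per trial)."""
    n = len(trials)
    return [sum(t[i] for t in trials) / n for i in range(len(trials[0]))]


def pseudo_population(sessions, conditions):
    """Stack condition-averaged responses of units from all sessions and animals.

    sessions: list of dicts {unit_id: {condition: [trial_vector, ...]}}
    Returns (labels, matrix). Each row of matrix belongs to one unit and
    concatenates that unit's mean response for every condition, in order.
    """
    labels, matrix = [], []
    for s_idx, session in enumerate(sessions):
        for unit_id, by_cond in session.items():
            # Keep only units that were observed under every condition.
            if not all(by_cond.get(c) for c in conditions):
                continue
            row = []
            for c in conditions:
                row.extend(trial_average(by_cond[c]))
            labels.append((s_idx, unit_id))
            matrix.append(row)
    return labels, matrix
```

      Because trials are averaged within each unit before stacking, trial-to-trial correlations between units are discarded, which is one reason such a population is only pseudo-simultaneous.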

      (3) The authors argue that they do not find 'value coding' in this study. However, the authors never manipulate reward size or probability, but only the uncertainty or difficulty of the task. This might be better termed 'difficulty', and it is difficult to say whether this correlates with value in this task. For instance, mice might be very confident about the choice, even for an intermediate frequency sweep, if the mouse had waited long enough to hear the full sweep. In that case, the difficulty would not correlate with value, given that the mouse will think the value of the port it is going to is high. Thus, authors should avoid using the term value.

      We agree with the reviewer. We have modified the text to specify that difficulty was the variable being studied and added the following sentence in the Discussion (lines 747-748):

      “It is still possible that by modifying reward contingencies such as droplet size value coding could be evidenced.”

      (4) How have the authors obtained Figure 7D bottom panel? It is not at all clear what this correlation represents. Are the authors looking at a correlation between instantaneous firing rate and lick rate during a lick bout?

      We thank the reviewer for pointing out that omission. It is indeed the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout. We have included labeling in Figure 7D and pointed this out in the main text [lines 680-681]:

      “Fig.7D, lower panel shows the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout for all units.”
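
      One way to compute such a coefficient is sketched below. The definition of the instantaneous rates is an assumption made for illustration (lick rate as the inverse of each inter-lick interval, firing rate as the spike count in that interval divided by its duration), and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bout_rates(lick_times, spike_times):
    """Instantaneous lick and firing rates for each inter-lick interval of a bout."""
    lick_rate, firing_rate = [], []
    for t0, t1 in zip(lick_times, lick_times[1:]):
        dt = t1 - t0
        n_spikes = sum(1 for s in spike_times if t0 <= s < t1)
        lick_rate.append(1.0 / dt)        # licks per second
        firing_rate.append(n_spikes / dt)  # spikes per second
    return lick_rate, firing_rate


def bout_correlation(lick_times, spike_times):
    """Correlation between firing rate and lick rate within one lick bout."""
    lick_rate, firing_rate = bout_rates(lick_times, spike_times)
    return pearson_r(firing_rate, lick_rate)
```

      A bout needs at least three licks (two intervals) with non-constant rates for the coefficient to be defined.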

      Reviewer #2 (Public Review):

      This paper examined how the activity of neurons in the entopeduncular nucleus (EPN) of mice relates to kinematics, value, and reward. The authors recorded neural activity during an auditory-cued two-alternative choice task, allowing them to examine how neuronal firing relates to specific movements like licking or paw movements, as well as how contextual factors like task stage or proximity to a goal influence the coding of kinematic and spatiotemporal features. The data shows that the firing of individual neurons is linked to kinematic features such as lick or step cycles. However, the majority of neurons exhibited activity related to both movement types, suggesting that EPN neuronal activity does not merely reflect muscle-level representations. This contradicts what would be expected from traditional action selection or action specification models of the basal ganglia.

      The authors also show that spatiotemporal variables account for more variability compared to kinematic features alone. Using demixed Principal Component Analysis, they reveal that at the population level, the three principal components explaining the most variance were related to specific temporal or spatial features of the task, such as ramping activity as mice approached reward ports, rather than trial outcome or specific actions. Notably, this activity was present in neurons whose firing was also modulated by kinematic features, demonstrating that individual EPN neurons integrate multiple features. A weakness is that what the spatiotemporal activity reflects is not well specified. The authors suggest some may relate to action value due to greater modulation when approaching a reward port, but acknowledge action value is not well parametrized or separated from variables like reward expectation.

      We thank the reviewer for the comment. We indeed believe that further exploring these spatiotemporal signals is important and will be the subject of future studies.

      A key goal was to determine whether activity related to expected value and reward delivery arose from a distinct population of EPN neurons or was also present in neurons modulated by kinematic and spatiotemporal features. In contrast to previous studies (Hong & Hikosaka 2008 and Stephenson-Jones et al., 2016), the current data reveals that individual neurons can exhibit modulation by both reward and kinematic parameters. Two potential differences may explain this discrepancy: First, the previous studies used head-fixed recordings, where it may have been easier to isolate movement versus reward-related responses. Second, those studies observed prominent phasic responses to the delivery or omission of expected rewards - responses largely absent in the current paper. This absence suggests a possibility that neurons exhibiting such phasic "reward" responses were not sampled, which is plausible since in both primates and rodents, these neurons tend to be located in restricted topographic regions. Alternatively, in the head-fixed recordings, kinematic/spatial coding may have gone undetected due to the forced immobility.

      Thank you for raising this point. Nevertheless, there is some phasic activity associated with reward responses, which can be seen in the new panel in Figure 4L.

      Overall, this paper offers needed insight into how the basal ganglia output encodes behavior. The EPN recordings from freely moving mice clearly demonstrate that individual neurons integrate reward, kinematic, and spatiotemporal features, challenging traditional models. However, the specific relationship between spatiotemporal activity and factors like action value remains unclear.

      We really appreciate the reviewer's valuable comments.

      Reviewer #2 (Recommendations For The Authors):

      One small suggestion is to make sure that all the panels in the figures are well annotated. I struggled in places to know what certain alignments or groupings meant because they were not labelled. An example would be what do the lines correspond to in the lower panels of Figure 2D and E. I could figure it out from other panels but it would have helped if each panel had better labelling.

      Thanks for pointing this out. We have improved labelling across the figures and corrected the specific example you pointed out.

      The paper is very nice though. Congratulations!

      Thank you very much.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editor for the comment. A statistics table has been added.

      References:

      Lazaridis, I., Tzortzi, O., Weglage, M., Märtin, A., Xuan, Y., Parent, M., Johansson, Y., Fuzik, J., Fürth, D., Fenno, L. E., Ramakrishnan, C., Silberberg, G., Deisseroth, K., Carlén, M., & Meletis, K. (2019). A hypothalamus-habenula circuit controls aversion. Molecular Psychiatry, 24(9), 1351–1368. https://doi.org/10.1038/s41380-019-0369-5

      Martinez-Garcia, R. I., Voelcker, B., Zaltsman, J. B., Patrick, S. L., Stevens, T. R., Connors, B. W., & Cruikshank, S. J. (2020). Two dynamically distinct circuits drive inhibition in the sensory thalamus. Nature, 583(7818), 813–818. https://doi.org/10.1038/s41586-020-2512-5

      McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861

      Zingg, B., Chou, X.-L., Zhang, Z.-G., Mesik, L., Liang, F., Tao, H. W., & Zhang, L. I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron, 93(1), 33–47. https://doi.org/10.1016/j.neuron.2016.11.045

    1. eLife Assessment

      Kreeger et al. convincingly demonstrate that octopus cells in the mouse cochlear nucleus, previously thought to rely primarily on excitatory inputs for coincidence detection, also receive glycinergic inhibitory synaptic inputs that influence their synaptic integration. Using advanced techniques, including genetic mouse models, optogenetics, microscopy, slice physiology, and computational modeling, this important study reveals that inhibition can shunt synaptic currents and alter the timing of dendritic EPSPs, both of which are significant for auditory processing. This research broadens the understanding of octopus cells' roles in sensory processing, highlighting the importance of inhibitory inputs in shaping fast, high-frequency neural response capabilities.

    2. Reviewer #1 (Public review):

      Kreeger and colleagues have explored the balance of excitation and inhibition in the cochlear nucleus octopus cells of mice using morphological, electrophysiological and computational methods. On the surface, the conclusion, that synaptic inhibition is present, does not seem like a leap. However, the octopus cells have been in the past portrayed as lacking synaptic inhibition. This view was supported by the paucity of glycinergic fibers in the octopus cell area and the lack of apparent IPSPs. Here, Kreeger et al., used beautiful immunohistochemical and mouse genetic methods to quantify the inhibitory and excitatory boutons over the complete surface of individual octopus cells and further analyzed the proportions of the different subtypes of spiral ganglion cell inputs. I think the analysis of synaptic distribution and the origin of the excitatory inputs stands as one of the most complete descriptions of any neuron, leaving little doubt about the presence of glycinergic boutons.

      Kreeger et al then examined inhibition physiologically. Recordings from these neurons are notoriously difficult to make because of the enormous leak currents that shunt membrane stimuli and currents, and complicate voltage clamp. The authors have tried to overcome these limitations using drugs to block leak conductances, and computational approaches based on realistic parameters. They conclude that dendritic inhibition can modify the size and kinetics of excitatory signals, and may play out in computations made on temporally dispersed stimuli as might be experienced during a ramp in sound frequency or complex natural sounds like vocalizations.

    3. Reviewer #2 (Public review):

      Kreeger et al. provided mechanistic evidence for flexible coincidence detection of auditory nerve synaptic inputs by octopus cells in the mouse cochlear nucleus. The octopus cells are highly specialized neurons that, with appropriate stimuli, can fire repetitively at very high rates (> 800 Hz in vivo), yield responses dominated by the onset of sound for simple stimuli, and integrate auditory nerve inputs over a wide frequency span. Previously, it was thought that octopus cells received little inhibitory input, and their integration of auditory input depended principally on temporally precise coincidence detection of excitatory auditory nerve inputs, coupled with a low input resistance established by high levels of expression of certain potassium channels and hyperpolarization-activated channels.

      This study provides convincing evidence that octopus cells do in fact receive glycinergic synaptic input that can influence the efficacy of excitatory dendritic synaptic activity. By coupling selected genetic mouse models to characterize synaptic inputs and enable optogenetic stimulation of subsets of afferents, fluorescent microscopy, detailed reconstructions of the location of inhibitory synapses on the soma and dendrites of octopus cells, slice physiology, and computational modeling, they have been able to clarify the presence of functional inhibition and elucidate some of the features of the inhibitory inputs to octopus cells at a biophysical level. They also show through modeling that inhibition is predicted to both provide shunting of synaptic currents and to change the peak timing of dendritic EPSPs as they travel to the soma. Both of these effects are potentially critically important in integration in these fast, coincidence-detecting neurons, and the magnitudes of the effects could have physiological significance. Overall, this work extends thinking about the functional sensory processing roles of octopus cells beyond the pre-existing hypotheses that are focussed primarily on the coincidence detection of excitatory inputs.

      The authors have addressed all of my prior concerns, including improving several aspects of the presentation. The modeling is better described, which is critical because it provides a foundation to help interpret some of the physiology and to propose specific functions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work analyzes how specialized cells in the auditory system, known as the octopus cells, can detect coincidences in their inputs at the submillisecond time scale. While previous work indicated that these cells receive no inhibitory inputs, the present study unambiguously demonstrates that these cells receive inhibitory glycinergic inputs. The physiologic impact of these inputs needs to be studied further. It remains incomplete at present but could be made solid by addressing caveats related to similar sizes of excitatory postsynaptic potentials and spikes in the octopus neurons.

      We apologize for not explicitly describing the experimental methods and analysis procedures that ensure the discrimination between action potentials and EPSPs. This has been addressed in responses to reviewer comments and amended in the manuscript.

      Reviewer #1 (Public Review):

      Kreeger and colleagues have explored the balance of excitation and inhibition in the cochlear nucleus octopus cells of mice using morphological, electrophysiological, and computational methods. On the surface, the conclusion, that synaptic inhibition is present, does not seem like a leap. However, the octopus cells have been in the past portrayed as devoid of inhibition. This view was supported by the seeming lack of glycinergic fibers in the octopus cell area and the lack of apparent IPSPs. Here, Kreeger et al. used beautiful immunohistochemical and mouse genetic methods to quantify the inhibitory and excitatory boutons over the complete surface of individual octopus cells and further analyzed the proportions of the different subtypes of spiral ganglion cell inputs. I think the analysis stands as one of the most complete descriptions of any neuron, leaving little doubt about the presence of glycinergic boutons.

      Kreeger et al then examined inhibition physiologically, but here I felt that the study was incomplete. Specifically, no attempt was made to assess the actual, biological values of synaptic conductance for AMPAR and GlyR. Thus, we don't really know how potent the GlyR could be in mediating inhibition. Here are some numbered comments:

      (1) "EPSPs" were evoked either optogenetically or with electrical stimulation. The resulting depolarizations are interpreted to be EPSPs. However previous studies from Oertel show that octopus cells have tiny spikes, and distinguishing them from EPSPs is tricky. No mention is made here about how or whether that was done. Thus, the analysis of EPSP amplitude is ambiguous.

      We agree that large EPSPs can be difficult to distinguish from an octopus cell’s short spikes during experiments. During analysis, we distinguished spikes from EPSPs by generating phase plots, which allow us to visualize the first derivative of the voltage trace on the y-axis and the value of the voltage on the x-axis at each moment in time. In the example shown below, four depolarizing events were electrically evoked in an octopus cell (panel A). The largest of these events (shown in orange in panels B-D) has an amplitude of ~9mV and could be a small spike. The first derivative of the voltage (panel C) reveals a bi-phasic response in the larger orange trace, where during the rising phase (mV/ms > 0) of the EPSP there is a second, sharper rising phase for the spike. Like more traditionally sized action potentials, phase plots for octopus cell spikes also reveal a sharp change in the rate of voltage change over time (Author response image 1 panel D, ✱) after the rising action of the EPSP begins to slow. EPSPs (shown in blue in panels B-D) lack the deflection in the phase plot. Not all cases were as unambiguous as this example. Therefore, our analysis only included subthreshold stimulation that unambiguously evoked EPSPs, not spikes. A brief description of this analysis has been added to the methods text (lines 625-627) and we have noted in the results section that both ChR2-evoked and electrically-evoked stimulation can produce small action potentials, which were excluded from analysis (lines 156-158).

      Author response image 1.
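
      The phase-plot criterion described in the response above can be sketched in a few lines. This is a simplified illustration, not the analysis code used for the manuscript: it assumes a uniformly sampled, low-noise voltage trace, uses a forward difference for dV/dt, and flags a putative spike when dV/dt increases again after it has started to slow during the rising phase. The function names and the `min_jump` threshold are hypothetical.

```python
def phase_plot(v, dt_ms):
    """Return (V, dV/dt) pairs using a forward difference; dV/dt is in mV/ms."""
    dvdt = [(v[i + 1] - v[i]) / dt_ms for i in range(len(v) - 1)]
    return list(zip(v[:-1], dvdt))


def has_second_rising_phase(v, dt_ms, min_jump=0.0):
    """True if dV/dt re-accelerates within a rising phase after it began to slow.

    An EPSP alone gives a single hump in dV/dt; a small spike riding on the
    EPSP adds a second, sharper rise. min_jump (mV/ms) guards against noise.
    """
    dvdt = [d for _, d in phase_plot(v, dt_ms)]
    slowing = False
    for prev, cur in zip(dvdt, dvdt[1:]):
        if prev <= 0 or cur <= 0:
            slowing = False  # outside the rising phase: reset the state
            continue
        if cur < prev:
            slowing = True  # the rise has started to slow down
        elif slowing and cur - prev > min_jump:
            return True  # second acceleration: putative spike
    return False
```

      Recorded traces would typically need smoothing or a nonzero `min_jump` before applying such a test, since sample-to-sample noise in dV/dt would otherwise trigger false positives.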

      (2) For this and later analysis, a voltage clamp of synaptic inputs would have been a simple alternative to avoid contaminating spikes or shunts by background or voltage-gated conductances. Yet only the current clamp was employed. I can understand that the authors might feel that the voltage clamp is 'flawed' because of the failure to clamp dendrites. But that may have been a good price to pay in this case. The authors should have at least justified their choice of method and detailed its caveats.

      We agree that data collected using voltage-clamp would have eliminated the confound of short action potentials and avoided the influence of voltage-gated conductances. The large-diameter and comparatively simple dendritic trees of octopus cells make them good morphological candidates for reliable voltage clamp. However, as suggested, we were concerned that the abundance of channels open at the neuron’s resting potential would make it difficult to sufficiently clamp dendrites. Ultimately, given the low input resistances of octopus cells and the fast kinetics of excitatory inputs, we determined that poor voltage-clamp conditions were likely to result in unclamped synaptic events with unpredictable distortions in kinetics and attenuation (To et al. 2022; PMID: 34480986; DOI: 10.1016/j.neuroscience.2021.08.024). We therefore chose to focus our efforts on current-clamp.

      Beyond the limits of both current-clamp and voltage-clamp, we chose to leave all conductances that influence EPSP dendritic propagation intact because our model demonstrates that active Kv and leak conductances shape and attenuate synaptic inputs as they travel through the dendritic tree (Supp. Fig. 4F-G). The addition of voltage-clamp recordings would not impact the conclusions we make about EPSP summation at the soma. Future studies will need to focus on a dendrite-centric view of local excitatory and inhibitory summation. For dendrite-centric experiments, dendritic voltage-clamp recordings are well suited to answer that set of questions.

      (3) The modeling raised several concerns. First, there is little presentation of assumptions, and of course, a model is entirely about its assumptions. For example, what excitatory conductance amplitudes were used? The same for inhibitory conductance? How were these values arrived at? The authors note that EPSGs and IPSGs had peaks at 0.3 and 3 ms. On what basis were these numbers obtained? The model's conclusions entirely depend on these values, and no measurements were made here that could have provided them. Parenthetical reference is made to Figure S5 where a range of values are tested, but with little explanation or justification.

      We apologize for not providing this information. We used our octopus neuron model to fit both EPSP and IPSP parameters to match experimental data. We have expanded the methods to include final values for the conductances (lines 649-651), which were adjusted to match experimental values seen in current-clamp recordings. We have also expanded the results section to describe each of the parameters we tuned (lines 203-222). An example of these adjustments is illustrated in Fig. 4F where the magnitude of inhibitory potentials at different conductances (100nS and 1nS) was compared to experimental data over a range of octopus cell input resistance conditions. Kinetic parameters were determined by aligning modeled PSPs to the rise times and full width at half maximum (FWHM) measurements from experiments under control and Kv block conditions. The experimental data for EPSPs and IPSPs that was used to fit the model is shown in Author response image 2 below.

      Author response image 2.
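
      The comparison of inhibitory conductances (100nS and 1nS) across input resistance conditions can be illustrated with a minimal single-compartment sketch. This is not the multi-compartment model used in the study: it treats the inhibitory conductance as sustained, ignores kinetics and dendritic location, and the resting potential (-65 mV), glycine reversal potential (-75 mV), and input resistances (5 and 50 MΩ) are hypothetical values chosen only for illustration.

```python
def steady_state_ipsp_mv(g_inh_ns, r_in_mohm, e_rest_mv=-65.0, e_inh_mv=-75.0):
    """Steady-state voltage change (mV) caused by a sustained inhibitory conductance.

    Single-compartment approximation; all default potentials are hypothetical.
    """
    # Input resistance in MOhm corresponds to a resting conductance in nS:
    # 1 / (1 MOhm) = 1 uS = 1000 nS.
    g_rest_ns = 1000.0 / r_in_mohm
    # Voltage divider between the resting conductance and the inhibitory conductance.
    return g_inh_ns * (e_inh_mv - e_rest_mv) / (g_rest_ns + g_inh_ns)


if __name__ == "__main__":
    for r_in in (5.0, 50.0):        # hypothetical low and high input resistance
        for g_inh in (1.0, 100.0):  # conductances compared in the response
            dv = steady_state_ipsp_mv(g_inh, r_in)
            print(f"R_in = {r_in:5.1f} MOhm, g_inh = {g_inh:5.1f} nS -> dV = {dv:.3f} mV")
```

      Under these assumptions, the same inhibitory conductance produces a larger hyperpolarization when input resistance is high, which is the qualitative reason IPSPs become visible at the soma once Kv and HCN conductances are reduced.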

      (4) In experiments that combined E and I stimulation, what exactly were time courses of the conductance changes, and how 'synchronous' were they, given the different methods to evoke them? (had the authors done voltage clamp they would know the answers).

      We chose to focus data collection on voltage changes at the soma under physiological conditions to better understand how excitation and inhibition integrate at the somatic compartment. Our conclusions in the combined E and I stimulation experiments require the resting membrane properties of octopus cells to be intact to make physiologically-relevant conclusions. Our current-clamp data includes the critical impact of leak, Kv, and HCN conductances on this computation. Reliable voltage-clamp would necessitate the removal of the Kv and HCN conductances that shape PSP magnitude, shape, and speed. Because it was not necessary to measure the conductances and kinetics of specific channels, we chose to use current-clamp.

      Evoked IPSPs and EPSPs had cell-to-cell variability in their latencies to onset. Somatically-recorded optically-evoked inhibition under pharmacological conditions that changed cable properties had onset latencies between 2.5 and 4.3ms; electrically-evoked excitation under control conditions had latencies between 0.8 and 1.4ms. To overcome cell-to-cell timing variability, we presented a shuffled set of stimulation pairings that had a 3ms range of timings with 200µs intervals. As the evoked excitation and inhibition become more ‘synchronous’, the impact on EPSP magnitude and timing is greatest. Data presented in this paper are for the stimulation pairings that evoke a maximal shift in EPSP timing. On average, this occurred when the optical stimulation began ~1.2ms before electrical stimulation. Stimulation pairing times ranged between a 0ms offset and a 1.8ms offset at the extremes. An example of the shuffled stimulation pairings is shown in Author response image 3 below, and we have included information about the shuffled stimulus in the methods (lines 627-630).

      Author response image 3.
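
      A shuffled set of pairings with a 3ms range and 200µs intervals could be generated along the following lines. This is a sketch, not the acquisition code used in the study: the function name, the start of the range (-1000µs), the sign convention (positive values meaning optical stimulation leads electrical stimulation), and the inclusion of both endpoints are assumptions.

```python
import random


def shuffled_offsets_us(start_us, span_us=3000, step_us=200, seed=None):
    """Return stimulation offsets (in microseconds) in shuffled order.

    Offsets are kept as integers to avoid floating-point drift.
    `start_us` is a hypothetical parameter; the response does not state
    where the 3 ms range began.
    """
    # Both endpoints are included, giving span/step + 1 offsets.
    offsets = list(range(start_us, start_us + span_us + 1, step_us))
    rng = random.Random(seed)  # a seeded generator makes the order reproducible
    rng.shuffle(offsets)
    return offsets


if __name__ == "__main__":
    # Assumed convention: positive offset = optical stimulation precedes electrical.
    for offset in shuffled_offsets_us(start_us=-1000, seed=1):
        print(f"optical lead: {offset / 1000:+.1f} ms")
```

      With both endpoints included, a 3ms range sampled every 200µs gives 16 pairings per shuffled set.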

      (5) Figure 4G is confusing to me. Its point, according to the text, is to show that changes in membrane properties induced by a block of Kv and HCN channels would not be expected to alter the amplitudes of EPSCs and IPSCs across the dendritic expanse. Now we are talking about currents (not shunting effects), and the presumption is that the blockers would alter the resting potential and thus the driving force for the currents. But what was the measured membrane potential change in the blockers? Surely that was documented. To me, the bigger concern (stated in the text) is whether the blockers altered exocytosis, and thus the increase in IPSP amplitude in blockers is due BOTH to loss of shunting and increase in presynaptic spike width. Added to this is that 4AP will reduce the spike threshold, thus allowing more ChR2-expressing axons to reach the threshold. Figure 4G does not address this point.

      These are valuable points that motivated us to improve the clarity of this figure and the corresponding text. We discussed two separate points in this paragraph and were not clear. Our intention with Figure 4G was to address concerns that using pharmacological blockers changes driving forces and may confound the measured change in magnitude of postsynaptic potentials. Membrane potentials hyperpolarized by approximately 8-10 mV after application of blockers. We corrected for this effect by adding a holding current to depolarize the neuron to its baseline resting potential. Text in the results (lines 187-190) and figure legends have been changed to clarify these points.

      We also removed any discussion of presynaptic effects from this portion of the text because our description was incomplete and we did not directly collect data related to these claims. We originally wrote, “While blocking Kv and HCN allowed us to reveal IPSPs at the soma, 4-AP increases the duration of the already unphysiological ChR2-evoked presynaptic action potential (Jackman et al., 2014; DOI: 10.1523/jneurosci.4694-13.2014), resulting in altered release probabilities and synaptic properties, amongst other caveats (Mathie et al., 1998; DOI: 10.1016/S0306-3623(97)00034-7)”. Ultimately, effects on exocytosis, presynaptic excitability, or release probability are only relevant for the experiments presented in Figure 4. Figure 4 serves as evidence that synaptic release of glycine elicits strychnine-sensitive inhibitory postsynaptic potentials in octopus cells. Concerns of presynaptic effects do not carry over to the data presented in Figure 5, as Kv and HCN were not blocked in these experiments. Therefore, we have removed this portion of the text.

      (6) Figure 5F is striking as the key piece of biological data that shows that inhibition does reduce the amplitude of "EPSPs" in octopus cells. Given the other uncertainties mentioned, I wondered if it makes sense as an example of shunting inhibition. Specifically, what are the relative synaptic conductances, and would you predict a 25% reduction given the actual (not modeled) values?

      We agree that both shunting and hyperpolarizing inhibition could play a role in the measured EPSP changes. Because we focused data collection on voltage changes at the soma under physiological conditions, we cannot calculate the relative synaptic conductances. Together, our experimental current-clamp results paired with estimates from the model provide compelling evidence for the change we observe in EPSPs. The relative weights of the synaptic conductances are a very interesting question, but this information is not necessary to answer the questions posed in this study, namely the impact of dendritic inhibition on the arrival of EPSPs in the soma.

      (7) Some of the supplemental figures, like 4 and 5, are hardly mentioned. Few will glean anything from them unless the authors direct attention to them and explain them better. In general, the readers would benefit from more complete explanations of what was done.

      We apologize for not fully discussing these figures in the results text. We have fully expanded the results section to detail the experiments and results presented in the supplement (lines 203-238).

      Reviewer #2 (Public Review):

      Summary:

      Kreeger et al. provided mechanistic evidence for flexible coincidence detection of auditory nerve synaptic inputs by octopus cells in the mouse cochlear nucleus. The octopus cells are specialized neurons that can fire repetitively at very high rates (> 800 Hz in vivo), yield responses dominated by the onset of sound for simple stimuli, and integrate auditory nerve inputs over a wide frequency span. Previously, it was thought that octopus cells received little inhibitory input, and their integration of auditory input depended principally on temporally precise coincidence detection of excitatory auditory nerve inputs, coupled with a low input resistance established by high levels of expression of certain potassium channels and hyperpolarization-activated channels.

      In this study, the authors used a combination of numerous genetic mouse models to characterize synaptic inputs and enable optogenetic stimulation of subsets of afferents, fluorescent microscopy, detailed reconstructions of the location of inhibitory synapses on the soma and dendrites of octopus cells, and computational modeling, to explore the importance of inhibitory inputs to the cells. They determined through assessment of excitatory and inhibitory synaptic densities that spiral ganglion neuron synapses are densest on the soma and proximal dendrite, while glycinergic inhibitory synaptic density is greater on the dendrites compared to the soma of octopus cells. Using different genetic lines, the authors further elucidated that the majority of excitatory synapses on the octopus cells are from type 1a spiral ganglion neurons, which have low response thresholds and high rates of spontaneous activity. In the second half of the paper, the authors employed electrophysiology to uncover the physiological response of octopus cells to excitatory and inhibitory inputs. Using a combination of pharmacological blockers, in vitro cellular recordings, and computational modeling, the authors conclude that glycine in fact evokes IPSPs in octopus cells; these IPSPs are largely shunted by the high membrane conductance of the cells under normal conditions and thus were not clearly evident in prior studies. Pharmacological experiments point towards a specific glycine receptor subunit composition. Lastly, Kreeger et al. demonstrated with in vitro recordings and computational modeling that octopus cell inhibition modulates the amplitude and timing of dendritic spiral ganglion inputs to octopus cells, allowing for flexible coincidence detection.

      Strengths:

      The work combines a number of approaches and complementary observations to characterize the spatial patterns of excitatory and inhibitory synaptic input, and the type of auditory nerve input to the octopus cells. The combination of multiple mouse lines enables a better understanding of, and helps to define, the pattern of synaptic convergence onto these cells. The electrophysiology provides excellent functional evidence for the presence of the inhibitory inputs, and the modeling helps to interpret the likely functional role of inhibition. The work is technically well done and adds an interesting dimension related to the processing of sound by these neurons. The paper is overall well written, and the experimental tests are well motivated and easy to follow. The discussion is reasonable and touches on both the potential implications of the work as well as some caveats.

      Weaknesses:

      While the conclusions presented by the authors are solid, a prominent question remains regarding the source of the glycinergic input onto octopus cells. In the discussion, the authors claim that there is no evidence for D-stellate, L-stellate, and tuberculoventral cell (all local inhibitory neurons of the ventral and dorsal cochlear nucleus) connections to octopus cells, and cite the relevant literature. An experimental approach will be necessary to properly rule out (or rule in) these cell types and others that may arise from other auditory brainstem nuclei. Understanding which cells provide the inhibitory input will be an essential step in clarifying its roles in the processing of sound by octopus cells.

      We are glad that the reviewer agrees with the conclusions we have made and is interested in learning more about how these findings impact sound processing. We agree that defining the source of inhibition will dramatically shape our understanding of the computation octopus cells are making. However, this is not an easy task, given the small size of the octopus cell area, and will involve considerable additional work. Since the overall findings do not depend on knowing the source of inhibition, we have instead re-written the discussion to clarify the lack of evidence for intrinsic inhibitory inputs to octopus cells, in addition to presenting likely candidates. As genetic profiles of cochlear nucleus and other auditory brainstem neurons become available, we intend to make and utilize genetic mouse models to answer questions like this.

      The authors showed that type 1a SGNs are the most abundant inputs to octopus cells via microscopy. However, in Figure 3 they compare optical stimulation of all classes of ANFs, then compare this against stimulation of type 1b/c ANFs. While a difference in the paired-pulse ratio (and therefore, likely release probability) can be inferred by the difference between Foxg1-ChR2 and Ntng1-ChR2, it would have been preferable to have specific data with selective stimulation of type 1a neurons.

      We agree that complete genetic access to only the Ia population would have been the preferable approach, but we did not have an appropriate line when beginning these experiments. Because our results did not suggest a meaningful difference between the populations, we did not pursue further investigation once a line was available.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Besides the points mentioned in the main review:

      Minor

      (1) I really like the graphics and the immunohistological presentation.

      (2) Lines 316-319 say that octopus cells lack things like back-propagating spikes and dendritic Ca spikes. How do you know this?

      This statement was intended to be a summary of suggestions from the literature and lacked references and context as written. We have rewritten this section and clarified that our hypothesis was formed from data found in the literature (lines 334-337).

      (3) Spectrograms of Figure 6A...where were these data obtained?

      We recorded and visualized human-generated rhythmic tapping and high-frequency squeaking sounds using Audacity. The visualizations of rhythmic tapping and imitated vocalizations are meant to show two different types of multi-frequency stimuli we hypothesize would result in somatic summation within an octopus cell’s spike integration window, despite differences in timing. We rewrote the figure legend to explain more clearly what is shown and how it relates to the model in Figure 6.

      (4) 'on-path' and 'off-path' seem like jargon that may not be clear to the average reader.

      Thank you for pointing out our use of unapproachable jargon. We have replaced the term from the figure with “proximal” and “distal” inhibition. In the main text, we now describe on-path and off-path together as the effect of location of dendritic inhibition on somatically recorded EPSPs.

      (5) The paper could benefit from a table of modeled values.

      We have added specific details about the modelling in the text and clarified which modeled values were referenced from previous computational models and which were tuned to fit experimental data. Since most values were taken from a referenced publication, we did not add a table and instead point readers towards that source.

      (6) Figure S4A-C what currents were delivered to the modeled cells?

      The model cells were injected with a -0.8 nA DC current for 300 ms in current clamp mode. This information has been added to the figure legend.

      (7) In that figure "scaling factors" scale exactly which channels?

      Scaling factor is used to scale low-voltage activated K<sup>+</sup> (ḡ<sub>KLT</sub>), high threshold K<sup>+</sup> (ḡ<sub>KHT</sub>), fast transient K<sup>+</sup> (ḡ<sub>KA</sub>), hyperpolarization-activated cyclic nucleotide-gated HCN (ḡ<sub>h</sub>) but not fast Na<sup>+</sup> (ḡ<sub>Na</sub>) and leak K<sup>+</sup> (ḡ<sub>leak</sub>). This information has been added to the text (lines 205-208 and 646-653).
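
      The selective scaling described above can be sketched as follows. The dictionary keys and the conductance values are hypothetical placeholders (arbitrary units), not the values used in the model; only the split between scaled and unscaled conductances follows the response.

```python
# Conductances that the scaling factor applies to, per the response above.
SCALED = ("g_klt", "g_kht", "g_ka", "g_h")
# Conductances left untouched by the scaling factor.
UNSCALED = ("g_na", "g_leak")


def apply_scaling(conductances, factor):
    """Return a new dict with only the Kv and HCN conductances multiplied by `factor`."""
    return {
        name: (value * factor if name in SCALED else value)
        for name, value in conductances.items()
    }


if __name__ == "__main__":
    # Hypothetical baseline values in arbitrary units, for illustration only.
    baseline = {"g_klt": 10.0, "g_kht": 5.0, "g_ka": 2.0, "g_h": 4.0,
                "g_na": 8.0, "g_leak": 1.0}
    print(apply_scaling(baseline, 0.25))
```

      A factor below 1 mimics a partial reduction of Kv and HCN conductance, while fast Na<sup>+</sup> and leak conductances keep their baseline values.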

      (8) In performing and modeling Kv/HCN block, do you know how complete the level of the block is?

      Since we cannot assess how complete the level of block is, we have changed the language in the text to clarify that we are reducing Kv and HCN channel conductance to the degree needed to increase resistance of the neuron (line 185).

      (9) More on this Figure S4. It is hardly referred to in the text except to say that it supports that blocking the Kv/HCN channels will enhance the IPSP. Given how large the figure is, can you offer more of a conclusion than that? Also, in the synaptic model in that figure, the IPSCs are presumably happening in current-clamp conditions, and the reduction in amplitude of the IPSC (as opposed to the increase in IPSP) is due to hyperpolarization. Can you simply state that so readers can track what this figure is showing? Other similar things: what is a transfer impedance? How is it measured? What do we take from the analysis?

      We have elaborated on our description of both Supp. Fig. 4 and Supp. Fig. 5 in the results section of the text (lines 203-238).

      (10) Figure S5 also needs a better explanation. E.g., in C-D, what does 'average' mean? The gray is an SD of this average? You modeled a range of values...but which ones are physiological? To me, this is a key point.

      We have elaborated on our description of both Supp. Fig. 4 and Supp. Fig. 5 in the results section of the text (lines 203-238).

      Reviewer #2 (Recommendations For The Authors):

      General:

      The images and 3-D reconstructions are visually stunning, but they are not colorblind-friendly and in some cases, hard to distinguish. This shows up particularly in the green and blue colors used in Figure 1. Also, better representative images could be used for Figure 1B.

      Thank you for pointing out that blue and green were difficult to distinguish in Figure 1H. We have outlined the green inhibitory puncta in this image to make them more distinguishable. We have also increased the resolution of the image in Figure 1B for better clarity. All other colors are selected from Wong, 2011 (PMID: 21850730; DOI: https://doi.org/10.1038/nmeth.1618).

      Supplemental Figure 1D: The low-power view is good to have, but the CN is too small and the image appears a bit noisy. An inset showing the CN on a larger scale (higher resolution image?) would be more convincing. In this image, I see what appear to be cells in the DCN labeled, which calls into question the purity of the source of optogenetic synaptic activation. It is also difficult to tell whether there are other cells labeled in the VCN. Such inputs would still be minor, but it would be good to be very clear about the expression pattern.

      To offer more information about the activity of the Ntng1<sup>Cre</sup> line in other regions of the auditory system, we increased the resolution of the image included in Supp. Fig. 1D and have also included an additional image (Supp. Fig. 1E) of a coronal section of the cochlear nucleus complex with Ntng1-tdT labelling. This image provides additional context for the cells labeled in the DCN. The text in the figure legend has been changed to clarify that some cells in the DCN were labeled (lines 118-120).

      We agree that in the Ntng1<sup>Cre</sup> experiments, there is the possibility of minor contamination from excitatory cells that express ChR2 outside of the spiral ganglion. This is also true for our Foxg1<sup>Cre</sup> and Foxg1<sup>Flp</sup> experiments, because these lines label cortical cells in addition to cochlear cells. However, we do not observe direct descending inputs from the cortex into the PVCN, making contamination from other Foxg1<sup>Cre</sup>-positive neurons unlikely. While non-cochlear inputs from the Ntng1<sup>Cre</sup> line are possible, evidence from both lines gives us confidence that we are not capturing inputs to octopus cells outside the cochlea. Central axons from Type I spiral ganglion neurons have VGLUT1+ synaptic terminals. When comparing the overlap between VGLUT1+ terminals and Foxg1-tdT labelling, we see full coverage. That is, all VGLUT+ terminals on octopus cells are co-labelled by Foxg1<sup>Cre</sup>-mediated expression of tdTomato. An example image is shown below. Here, an octopus cell soma is labeled with blue fluorescent Nissl stain and inputs to the cochlear nucleus complex are labeled with Foxg1<sup>Cre</sup>-dependent tdTomato (Foxg1-tdT; magenta). We have also immunolabeled for VGLUT1 puncta in green. This eliminates the possibility that VGLUT+ cells from outside the cochlea and cortex are sources of excitation to octopus cells.

      Author response image 4.

      Further, we have looked at expression of Ntng1-tdT and Foxg1-EYFP together in the octopus cell area. An example image is shown below. All Ntng1-tdT+ fibers (magenta) are also Foxg1-EYFP+ (green), suggesting that all Ntng1<sup>Cre</sup>-targeted inputs to octopus cells are a part of the Foxg1<sup>Cre</sup>-targeted input population, which are very likely to only be from the cochlea. We have expanded the results section to include information about the overlap in expression driven by the Ntng1<sup>Cre</sup> and Foxg1<sup>Flp</sup> lines.

      Author response image 5.

      Supplemental Figure 2 G: These are a bit hard to read. Perhaps use a different image, or provide a reference outline drawing telling us what is what.

      We have used a different image with a Thy1-YFP labeled octopus cell for clarity.

      In some places, the term "SGN" is used when referencing the axons and terminals within the CN, and without some context, this was occasionally confusing (SGN would seem to refer to the cell bodies). In some places in the text, it may be preferable to separate SGN, auditory nerve fibers (ANFs), and terminals, as entities for clarity.

      In order to make the study accessible to a broad neuroscience audience, we refer to the neurons of the spiral ganglion and their central axon projections using one name. We understand why, for those well acquainted with the auditory periphery, condensing terminology may feel awkward. However, for those readers unfamiliar with the anatomy of the cochlea and auditory nerve, we feel that the use of “SGN central axon” makes it clear that the “auditory nerve fibers” come from neurons in the spiral ganglion. This is clarified in the first paragraph of the introduction (lines 29-31) and in the methods (line 533).

      Specific: Numbers refer to the line numbers on the manuscript.

      L29-31: Cochlear nucleus neurons are more general in their responses than this sentence indicates. While we can all agree that they are specialized to carry (or improve upon) the representation of these specific features of sound, they also respond more generally to sounds that might not have specific information in any of these domains. They are not silos of neural computation, and their outputs become mixed and "re-represented" well before they reach the auditory cortex. Octopus cells are no exception to this. I suggest striking most of the first paragraph, and instead using the first sentence to lead into the second paragraph, and putting the last sentence (of the current first paragraph) at the end of the second (now first) paragraph.

      We agree with this assessment and have made major changes to the introduction in line with these suggestions.

      L33-46: A number of points in this paragraph need references (exp. line 41).

      We agree and have added references accordingly.

      L43: Not sure what is meant by "fire at the onset of the sound, breaking it up into its frequency components"?

      We changed this text as part of a major reworking of the introduction.

      L47-66: Again more citations are needed (at the end of sentence at line 55, probably moving some of the citations from the next sentence up).

      We agree and have added references accordingly.

      L51: The consistent orientation of octopus cell dendrites across the ANFs has been claimed in the literature (as mentioned here), but there are some (perhaps problematic - plane of sectioning?) counterexamples from the older Golgi-stained images, and even amongst intracellularly stained cells (for example see Recio-Spinoso and Rhode, 2020). This is important with regards to the broader hypothesis regarding traveling-wave compensation (e.g., McGinley et al.; but also many others); if the cells are not all in the appropriate orientation then such compensation may be problematic. Likewise, the data from Lu et al., 2022, points towards a range of sensitivity to frequency-swept stimuli, some of which work in opposition to the traveling wave compensation hypothesis. It would seem that with the Thy1 mice, you have an opportunity to clarify the orientation. Figures 1A and 2A show a consistent dendritic orientation, assuming that these drawings are reconstructions of the cells as they were actually oriented in the tissue. Can you either comment on this or provide clearer evidence?

      We are happy to offer more information about the appearances of octopus cells in our preparations. In our hands, sparsely labeled octopus cells in Thy1-YFP-H mice show consistent dendritic orientation when visualized in a 15 degree parasagittal plane, with the most diversity apparent in cells with somas located more dorsally in the octopus cell area. We hypothesize that this is due to the limited area through which the central projections of spiral ganglion neurons (i.e. ANFs) must pass before they enter the dorsal cochlear nucleus and continue their tonotopic organization in that area.

      A caveat to studies without physiological or genetic identification of octopus cells is the assumption that all neurons in the octopus cell area are octopus cells. We find, especially along the borders of the octopus cell area, that stellate cells can be seen amongst octopus cells. Because stellate cell dendrites are not oriented like octopus cell dendrites, any stellate cells misidentified as octopus cells would appear to have poorly-oriented dendrites. This may explain why some studies report this finding. In addition, it can be difficult to assess tonotopic organization because of the 3D trajectory of tightly bundled axons, which is not capturable by a single section plane. Although a parasagittal plane of sectioning captures the tonotopic axis in one part of the octopus cell area, that same plane may be perpendicular at the opposing end.

      L67: canonical -> exceptional.

      Thank you for the suggestion. We have made this change in the introduction.

      L127: This paragraph was confusing on first reading. I don't think Supplemental Figure 1D shows the restricted pattern of expression very clearly. The "restricted to SGNs" might be better as "restricted to auditory nerve fibers" (except in the DCN, where there seem to be some scattered small cells?). A higher magnification image of the CN, but lower magnification than in panel E, would be helpful here.

      To avoid confusion, we have re-written this paragraph (lines 117-127) and included a higher magnification image of the CN in a revised Supp. Fig. 1.

      L168: Here, perhaps say ANFs instead of SGNs.

      As above, we have decided to describe ANFs as SGN central axons to make the anatomy more accessible to people unfamiliar with cochlear anatomy.

      L201-204: The IPSPs are surprisingly slow (Figures 5B, C), especially given the speed of the EPSPs/EPSCs in these cells. This is reminiscent of the asymmetry between EPSC and IPSC kinetics in bushy cells (Xie and Manis, 2014). The kinetics used in the model (3 ms; mentioned on line 624) however seem a bit arbitrary and no data is provided for the selection of that value. Were there any direct measurements of the IPSC kinetics (all of the traces in the paper are in the current clamp) that were used to justify this value?

      The kinetics of the somatically-recorded IPSPs are subject to the effects of our pharmacological manipulations. EPSPs measured at the soma under control conditions are small in amplitude and rapid. With pharmacological reduction of HCN and Kv channels, EPSPs are larger and slower (please see figure in response to a similar question posed by Reviewer #1). We expect that this change also occurs with the IPSP kinetics under pharmacological conditions. Our justification of the kinetics has been expanded in the methods section (lines 641-661).

      L594: Technically, this is a -11 mV junction potential, but thanks for including the information.

      We have corrected this in the text (line 618). Thank you for the close reading of all experimental and methodological details.

      L595: The estimated power of the LED illumination at the focal plane should be measured and indicated here.

      We measured the power of the LED illumination at the focal plane using a PM100D Compact Power and Energy Meter Console (Thorlabs), a S120C Photodiode Power Sensor (Thorlabs), and a 1000µm diameter Circular Precision Pinhole (Thorlabs). Light intensity at the focal plane ranged between 1.9 and 4.1mW/mm<sup>2</sup>, corresponding to 6% and 10% intensity on the Colibri5 system. We have reported these measurements in the results section (lines 621-622).
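
      For readers who want to reproduce the unit conversion, light intensity follows from the power passing through the pinhole divided by the pinhole area. The sketch below assumes the pinhole is uniformly illuminated and that all transmitted light reaches the sensor; the raw power readings were not given in the response, so the example values are back-calculated from the reported intensities.

```python
import math


def irradiance_mw_per_mm2(power_mw, pinhole_diameter_um=1000.0):
    """Convert power measured through a circular pinhole into mW/mm^2.

    Assumes uniform illumination across the pinhole aperture.
    """
    radius_mm = pinhole_diameter_um / 1000.0 / 2.0
    area_mm2 = math.pi * radius_mm ** 2  # about 0.785 mm^2 for a 1000 um pinhole
    return power_mw / area_mm2


if __name__ == "__main__":
    # Hypothetical power readings (mW), back-calculated from 1.9 and 4.1 mW/mm^2.
    for power in (1.49, 3.22):
        print(f"{power:.2f} mW -> {irradiance_mw_per_mm2(power):.2f} mW/mm^2")
```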

      L609: One concern about the model is that the integration time of 25 microseconds is rather close to the relative shifts in latency. While I doubt it will make a difference (except in the number), it may be worth verifying (spot checks, at least) that running the model with a 5 or 10-microsecond step yields a similar pattern of latency shifts (e.g., Supplementary Figure 5, Figure 5).

      Also, it is not clear what temperature the model was executed at (I would presume 35C); this needs to be given, and channel Q10's listed.

      We realize that additional information is needed to fully understand the model and have added this to the results and the methods. The synaptic mechanism (.mod) files were obtained from Manis and Campagnola (2018) (PMID: 29331233; DOI: https://doi.org/10.1016/j.heares.2017.12.017). Q10 (3) and temperature (22°C) were also matched to parameters from Manis and Campagnola (2018). Because temperature is a critical factor for channel kinetics, we verified that our primary results remain consistent under conditions using a temperature of 35°C and a time step of 5µs, depicted below. Panel A illustrates the increase in IPSP as a function of glycine conductance under Kv+HCN block conditions at 35°C. As at 22°C, an increase in IPSP magnitude is absent in the control condition at 35°C. Panels B and C provide a direct comparison between the initial (i.e. 22°C) and suggested (i.e. 35°C) simulation conditions. Again we found that temperature does not have a major impact on the amplitude of IPSPs. Thus, results at 35°C do not change the conclusions we make from the model.

      Author response image 6.
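
      To give a sense of scale for the temperature change discussed above, the standard Q10 relation multiplies channel rate constants by Q10 raised to the power (T - T_ref)/10. The sketch below assumes that the kinetics are referenced to 22°C, so that the factor is 1 at the original simulation temperature; the function name is illustrative and is not taken from the model code.

```python
def q10_rate_factor(temp_c, ref_temp_c=22.0, q10=3.0):
    """Multiplicative change in a rate constant between ref_temp_c and temp_c."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)


if __name__ == "__main__":
    for temp in (22.0, 35.0):
        print(f"{temp:.0f} C: rates scaled by {q10_rate_factor(temp):.2f}")
```

      With a Q10 of 3, moving from 22°C to 35°C speeds channel kinetics roughly fourfold, which is why a spot check at the higher temperature and a smaller time step is a meaningful test of the model's conclusions.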

      The nominal conductance densities should at least be provided in a table (supplemental, in addition to including them in the deposited code). The method for "optimization" of the conductance densities to match the experimental recordings needs to be described; the parameter space can be quite large in a model such as this. The McGinley reference needs a number.

      We added a more thorough description of modeling parameters and justification of choices in the methods section of the text (lines 641-661). We have also added a reference number to the McGinley 2012 reference in the text.

      I think this is required by the journal:

      The model code, test results, and simulation results should be deposited in a public resource (Github would be preferable, but dryad, Zenodo, or Figshare could work), and the URL/doi for the resource provided in the manuscript. This includes the morphology swc/hoc file. The code should be in a form, and with a description, that readily allows an interested party with appropriate skills to download it and run it to generate the figures.

      We will upload the code and all associated simulation files to the ModelDB repository upon publication.

    1. eLife Assessment

      In this manuscript, Abd El Hay and colleagues use an innovative behavioral assay and analysis method, together with standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons, to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. Compelling evidence is provided for a role of TRPM2 channels in warmth avoidance behavior, but it remains unclear whether this involves channel activity in the periphery or in the brain. In contrast, TRPV1 is clearly implicated at the cellular level in warmth detection. These findings are important because there is substantial ongoing discussion regarding the contribution of TRP channels to different aspects of thermo-sensation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas the elimination of TRPV1 has the largest effect on the neuronal responses. These findings are very important, as there is substantial ongoing discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation.

      Strengths:

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for a role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior.

      Weaknesses:

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. As the authors acknowledge, it remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is directly related to TRPM2 functioning as a warmth sensor in sensory neurons. The effects of the TRPM2 KO on the proportion of warmth sensing neurons are very subtle, and TRPM2 may also play a role in the behavioral assay through its expression in thermoregulatory processes in the brain. Future behavioral experiments on sensory-neuron specific TRPM2 knockout animals will be required to clarify this important point.

    3. Reviewer #3 (Public review):

      Summary and strengths:

      In the manuscript, Abd El Hay et al investigate the role of thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, where both the floor and air temperature are controlled, and conclude that mice likely integrate floor with air temperature to form a thermal preference. They go on to use knockout mice and show that TRPM2-/- mice play a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons they show the involvement of both channels in warm responsiveness and dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 on thermal behavior.

      Comments on revisions:

      Thanks to the authors for addressing all the points raised. They now include more details about the classifier, better place their work in context of the literature, corrected the FOVs, and explained the model a bit further. The new analysis in Figure 2 has thrown up some surprising results about cellular responses that seem to reduce the connection between the cellular and behavioral data and there are a few things to address because of this:

      TRPM2 deficient responses: The differences in the proportion of TRPM2 deficient responders compared to WT are only observed at one amplitude (39C), and even at this amplitude the effect is subtle. Most surprisingly, TRPM2 deficient cells have an enhanced response to warm compared to WT mice to 33C, but the same response amplitude as WT at 36C and 39C. The authors discuss why this disconnect might be the case, but together with the lack of differences between WT and TRPM2 deficient mice in Fig 3, the data seem in good agreement with ref 7 that there is little effect of TRPM2 on DRG responses to warm in contrast to a larger effect of TRPV1. This doesn't take away from the fact there is a behavioral phenotype in the TRPM2 deficient mice, but the impact of TRPM2 on DRG cellular warm responses is weak and the authors should tone down or remove statements about the strength of TRPM2's impact throughout the manuscript, for example:
      "Trpv1 and Trpm2 knockouts have decreased proportions of WSNs."
      "this is the first cellular evidence for the involvement of TRPM2 on the response of DRG sensory neurons to warm-temperature stimuli"
      "we demonstrate that TRPV1 and TRPM2 channels contribute differently to temperature detection, supported by behavioural and cellular data"
      "TRPV1 and TRPM2 affect the abundance of WSNs, with TRPV1 mediating the rapid, dynamic response to warmth and TRPM2 affecting the population response of WSNs."
      "Lack of TRPV1 or TRPM2 led to a significant reduction in the proportion of WSNs, compared to wildtype cultures".

      The new analysis also shows that the removal of TRPV1 leads to cellular responses with smaller responses at low stimulus levels but larger responses with longer latencies at higher stimulus levels. Authors should discuss this further and how it fits with the behavioral data.

      Analysis clarification: authors state that TRPM2 deficient WSNs show "Their response to the second and third stimulus, however, are similar to wildtype WSNs, suggesting that tuning of the response magnitude to different warmth stimuli is degraded in Trpm2-/- animals." but is there a graded response in WT mice? It looks like there is in terms of the %responders but not in terms of response amplitude or AUC. Authors could show stats on the figure showing differences in response amplitude/AUC/responders% to different stimulus amplitudes within the WT group.

      New discussion point: sex differences are "similar to what has been shown for an operant-based thermal choice assay (11,56)", but in their rebuttal, they mention that ref 11 did not report sex differences. 56 does. Check this.

      The authors added in new text about the drift diffusion model in the results, however it's still not completely clear whether the "noise" is due to a perceptual deficit or some other underlying cause. Perhaps authors could discuss this further in the discussion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review): 

      Summary: 

      The authors use an innovative behavior assay (chamber preference test) and standard calcium imaging experiments on cultured dorsal root ganglion (DRG) neurons to evaluate the consequences of global knockout of TRPV1 and TRPM2, and overexpression of TRPV1, on warmth detection. They find a profound effect of TRPM2 elimination in the behavioral assay, whereas elimination of TRPV1 has the largest effect on neuronal responses. These findings are of importance, as there is still substantial discussion in the field regarding the contribution of TRP channels to different aspects of thermosensation. 

      Strengths: 

      The chamber preference test is an important innovation compared to the standard two-plate test, as it depends on thermal information sampled from the entire skin, as opposed to only the plantar side of the paws. With this assay, and the detailed analysis, the authors provide strong supporting evidence for the role of TRPM2 in warmth avoidance. The conceptual framework using the Drift Diffusion Model provides a first glimpse of how this decision of a mouse to change between temperatures can be interpreted and may form the basis for further analysis of thermosensory behavior. 

      Weaknesses: 

      The authors juxtapose these behavioral data with calcium imaging data using isolated DRG neurons. Here, there are a few aspects that are less convincing. 

      (1) The authors study warmth responses using DRG neurons after three days of culturing. They propose that these "more accurately reflect the functional properties and abundance of warm-responsive sensory neurons that are found in behaving animals." However, the only argument to support this notion is that the fraction of neurons responding to warmth is lower after three days of culture. This could have many reasons, including loss of specific subpopulations of neurons, or any other (artificial?) alterations to the neurons' transcriptome due to the culturing. The isolated DRGs are not selected in any way, so also include neurons innervating viscera not involved in thermosensation. If the authors wish to address actual changes in sensory nerves involved in warmth sensing in TRPM2 or TRPV1 KO mice without disturbing the response profile as a result of the isolation procedure, other approaches would be needed (e.g. skin-nerve recordings or in vivo DRG imaging).  

      We agree that there could be several reasons as to why the responses of cultured DRGs are reduced compared to the acute/short-term cultures. It is possible, and indeed likely, that transcriptional changes happen over the course of the culturing period. It is also possible that it is a mere coincidence that the 3-day cultures have a response profile more similar to the in vivo situation than the acute cultures. In the revised manuscript, we have therefore toned down the claim that the 3-day cultures mirror the native conditions more appropriately and included the sentence “However, whether 3-day cultures resemble native sensory neurons more closely than acute cultures in terms of their (transcriptional) identity is currently unknown.” (page 5). 

      We have now also included a section “Limitations of the study” and bring this point up there as well, acknowledging that longer culturing periods may cause changes in the neurons and may result in a drift away from their native state. 

      Nevertheless, our results clearly show that acute cultures have a response profile that is much more similar to damaged/“inflamed” neurons, irrespective of any comparison to the 3-day cultures. Therefore, we believe it is helpful to include these data to make scientists aware that the acute cultures many researchers use in their experiments are very different from non-inflamed native/in vivo DRG neurons.

      (2) The authors state that there is a reduction in warmth-sensitive DRG neurons in the TRPM2 knockout mice based on the data presented in Figure 2D. This is not convincing for the following reasons. First, the authors used t-tests (with FDR correction - yielding borderline significance) whereas three groups are compared here in three repetitive stimuli. This would require different statistics (e.g. ANOVA), and I am not convinced (based on a rapid assessment of the data) that such an analysis would yield any significant difference between WT and TRPM2 KO. Second, there seems to be a discrepancy between the plot and legend regarding the number of FOV analysed (21, 17, and 18 FOV according to the legend, compared to 18, 10, and 12 dots in the plot). Therefore, I would urge the authors to critically assess this part of the study and to reconsider whether the statement (and discussion) that "Trpm2 deletion reduces the proportion of warmth responders" should be maintained or abandoned.

      Yes, we agree that the statistical tests indicated by the referee are more appropriate/robust for the data shown in Figures 1F, 2D, and 4G.

      When we perform a two-way repeated measures ANOVA and subsequent multiple comparison test (with Dunnett's correction) against wildtype for the data shown in Fig. 2D, both the main effect (Genotype) and the interaction term (Stimulus x Genotype) are significant. The multiple comparison yields very similar results to those in the current manuscript, with the difference that the TRPM2-KO data for the second stimulus (~36°C) is borderline significant (p = 0.050).

      Due to the possible dependence of the repeated temperature stimuli and the variability of each stimulus between FOVs (Fig. 2C), it is possible that a mixed-effect model that accounts for these effects is more appropriate. 

      Similarly, for plots 1F and 4G, Genotype (either as main effect or as interaction with Time) is significant after a repeated measures two-way ANOVA. The multiple comparisons (with Bonferroni correction) only changed the results marginally at individual timepoints, without affecting the overall conclusions. The exception is Fig. 4G at 38°C, where the interaction of Time and Genotype is significant, but no individual timepoint-comparison is significant after Bonferroni correction.

      The main difference between the results presented above and the ones presented in the manuscript is the choice of the multiple comparison correction. We originally opted for the false-discovery rate (FDR) approach as it is less prone to Type II errors (false negatives) than other methods such as Šidák's or Bonferroni's, particularly when correcting for a large number of tests.
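
      The trade-off described above can be illustrated with a small sketch that applies the three corrections to the same set of p-values. The p-values below are hypothetical and chosen only for illustration; they are not taken from the manuscript, and the function names are likewise illustrative rather than the authors' analysis code.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def sidak(pvals, alpha=0.05):
    """Reject H0 where p <= 1 - (1 - alpha)^(1/m) (family-wise error rate)."""
    m = len(pvals)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false-discovery rate)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


# Hypothetical p-values for eight timepoint-wise comparisons.
pvals = [0.001, 0.008, 0.012, 0.020, 0.030, 0.200, 0.450, 0.700]

n_bonferroni = sum(bonferroni(pvals))
n_sidak = sum(sidak(pvals))
n_fdr = sum(benjamini_hochberg(pvals))
print("Bonferroni rejections:", n_bonferroni)
print("Sidak rejections:", n_sidak)
print("Benjamini-Hochberg rejections:", n_fdr)
```

      With these example values the two family-wise corrections reject far fewer comparisons than the FDR procedure, which is the sense in which FDR is less prone to false negatives when many tests are corrected at once.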

      However, we are mainly interested in whether the genotypes differ in their behavior in each temperature combination and the significant ANOVA tests for Fig. 1F and 4G support that point. The statistical test and comparison used in the original/previous version of the manuscript, comparing behavior at individual/distinct timepoints, are interesting, but less relevant (and potentially distracting), as we do not go into the details about the behavior at any given/distinct timepoint in the assay.

      Therefore, and per suggestion of the reviewer, we have updated the statistics in the revised version of the manuscript. Also, we now report the correct number of FOVs in the legend. The statistical details are now found in the legends of the respective figures.

      (3) It remains unclear whether the clear behavioral effect seen in the TRPM2 knockout animals is at all related to TRPM2 functioning as a warmth sensor in sensory neurons. As discussed above, the effects of the TRPM2 KO on the proportion of warmth-sensing neurons are at most very subtle, and the authors did not use any pharmacological tool (in contrast to the use of capsaicin to probe for TRPV1 in Figures S3 and S4) to support a direct involvement of TRPM2 in the neuronal warmth responses. Behavioral experiments on sensory-neuron-specific TRPM2 knockout animals will be required to clarify this important point.

      As mentioned above, we have toned down the correlation between the cellular and behavioral data. 

      In the discussion we now clearly describe three possibilities as to why the Trpm2 knockout animals only show a subtle cellular thermal phenotype but a strong behavioral thermal preference phenotype: (i) permanent deletion of Trpm2 may result in developmental defects and/or compensatory mechanisms; (ii) The DRG population expressing Trpm2 may be more relevant for autonomic thermoregulation rather than behavioral responses to temperature; (iii) Trpm2 expression outside DRGs (possibly in the hypothalamic POA) may account for the altered thermal behavior. 

      (4) The authors only use male mice, which is a significant limitation, especially considering known differences in warmth sensing between male and female animals and humans. The authors state "For this study, only male animals were used, as we aimed to compare our results with previous studies which exclusively used male animals (7, 8, 17, 43)." This statement is not correct: all four mentioned papers include behavioral data from both male and female mice! I recommend the authors to either include data from female mice or to clearly state that their study (in comparison with these other studies) only uses male mice.  

      This is a valid point. When our study started 7-8 years ago, we only used male mice (as did many other researchers); we would now do this differently. We have now included a statement concerning this limitation in the “Limitations of this study” section of the manuscript. 

      Nevertheless, in the studies by Tan et al. and Vandewauw et al., only male animals were used for the behavioral experiments. Yarmolinsky et al. and Paricio-Montesinos et al. used both males and females while, as far as we can tell, only Paricio-Montesinos et al. reported that no difference was observed between the sexes. 

      Wildtypes are all C57bl/6N from the provider Janvier. Generally, all lines are backcrossed to C57bl/6 mice and additionally inbreeding was altered every 4-6 generations by crossing to C57bl/6. We cannot state exactly how many times the Trp channel KOs have been backcrossed to C57bl/6 mice.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors of the study use a technically well-thought-out approach to dissect the question of how far TRPV1 and TRPM2 are involved in the perception of warm temperatures in mice. They supplement the experimental data with a drift-diffusion model. They find that TRPM2 is required to trigger the preference for 31°C over warmer temperatures while TRPV1 increases the fidelity of afferent temperature information. A lack of either channel leads to a depletion of warm-sensing neurons and in the case of TRPV1 to a deficit in rapid responses to temperature changes. The study demonstrates that mouse phenotyping can only produce trustworthy results if the tools used to test them measure what we believe they are measuring. 

      Strengths: 

      The authors tackle a central question in physiology to which we have not yet found sufficient answers. They take a pragmatic approach by putting existing experimental methods to the test and refining them significantly. 

      Weaknesses: 

      It is difficult to find weaknesses. Not only the experimental methods but also the data analysis have been refined meticulously. There is no doubt that the authors achieved their aims and that the results support their conclusions. 

      There will certainly be some lasting impact on the future use of DRG cultures with respect to (I) the incubation periods, (II) how these data need to be analyzed, and (III) the numbers of neurons to be looked at. 

      As for the CPT assay, the future will have to show if mouse phenotyping results are more accurate with this technique. I'm more fond of full thermal gradient environments. However, behavioural phenotyping is still one of the most difficult fields in somatosensory research.  

      We thank the referee and were happy to read that the referee finds our study valuable and insightful. 

      Reviewer #3 (Public Review):  

      Summary and strengths: 

      In the manuscript, Abd El Hay et al investigate the role of thermally sensitive ion channels TRPM2 and TRPV1 in warm preference and their dynamic response features to thermal stimulation. They develop a novel thermal preference task, where both the floor and air temperature are controlled, and conclude that mice likely integrate floor with air temperature to form a thermal preference. They go on to use knockout mice and show that TRPM2-/- mice play a role in the avoidance of warmer temperatures. Using a new approach for culturing DRG neurons they show the involvement of both channels in warm responsiveness and dynamics. This is an interesting study with novel methods that generate important new information on the different roles of TRPV1 and TRPM2 on thermal behavior. 

      Open questions and weaknesses: 

      (1) Differences in the response features of cells expressing TRPM2 and TRPV1 are central and interesting findings but need further validation (Figures 3 and 4). To show differences in the dynamics and the amplitude of responses across different lines and stimulus amplitudes more clearly, the authors should show the grand average population calcium response from all responsive neurons with error bars for all 3 groups for the different amplitudes of stimuli (as has been presented for the thermal stimuli traces). The authors should also provide a population analysis of the amplitude of the responses in all groups to all stimulus amplitudes. Prior work suggests that thermal detection is supported by an enhancement or suppression of the ongoing activity of sensory fibers innervating the skin. The authors should present any data on cells with ongoing activity. 

      We have now included a grand average population analysis of the different groups in the revised version; this is found in Figure 2E and F. Based on the referee’s suggestion and the new analysis, we can now report a (subtle) cellular phenotype observed in DRG cultures of Trpm2-deficient animals: when averaging all warmth responses, the new analysis suggests that Trpm2-deficient cultures lack modulation of the response magnitude across the three increasing consecutive warmth stimuli (33°C, 36°C and 39°C).

      Concerning the point about ongoing activity: We are not sure if it is possible in neuronal cultures to faithfully recapitulate ongoing activity. Ongoing activity has mostly been recorded in skin-nerve preparations (or, in older studies, in other types of nerve recordings). There are only very few studies that show ongoing activity in cultured neurons, and in those instances the ongoing activity only starts when the sensory neurons are cultured for even longer than 3 days (Ref.: doi: 10.1152/jn.00158.2018). We have very few cells that show some spontaneous activity, but these are too few to draw any conclusions. In any case, nerve fibers, which are absent from our cultures, might be necessary to drive ongoing activity.

      (2) The authors should better place their findings in context with the literature and highlight the novelty of their findings. The introduction builds a story of a 'disconnect' or 'contradictory' findings about the role of TRPV1 and TRPM2 in warm detection. While there are some disparate findings in the literature, Tan and McNaughton (2016) show a role for TRPM2 in the avoidance of warmth in a similar task, Paricio et al. (2020) show a significant reduction in warm perception in TRPM2 and TRPV1 knock out lines and Yarmolinsky et al. (2016) show a reduction in warm perception with TRPV1 inactivation. All these papers are therefore in agreement with the authors' finding of a role for these channels in warm behavior. The authors should change their introduction and discussion to more correctly discuss the findings of these studies and to better pinpoint the novelty of their own work.  

      Paricio-Montesinos et al. argue that TRPM8 is crucial for the detection of warmth, as TRPM8-KO animals are incapable of learning the operant task. TRPM2-KO animals and, to a smaller extent, TRPV1-KO animals have reduced sensitivity in the task, but are still capable of learning/performing the task. However, in our chamber preference assay this is reversed: TRPM2-KO animals lose the ability to differentiate warm temperatures while TRPM8 appears to play no major role. A commonality between the two studies is that while TRPV1 affects the detection of warm temperatures in the different assays, this ion channel appears not to be crucial. 

      Similarly, Yarmolinsky et al. show that Trpv1-inactivation only increases the error rate in their operant assay (from ~10% to ~30%), without testing TRPM2. And Tan et al. show the importance of TRPM2 in the preference task, without testing for TRPV1. 

      More generally, the choice of the assay, being either an operant task (Paricio-Montesinos et al. and Yarmolinsky et al.) or a preference assay without training of the mice (Tan et al. and our data here), might be important and different TRP receptors may be relevant for different types of temperature assays, which we have now included at the end of the discussion section in the revised manuscript. While our results generally agree with the previous studies, they add a different perspective on the analysis of the behavior (with correlation to cellular data). We now edited the manuscript to highlight the advances more clearly. 

      Nevertheless, we believe that a discrepancy between cellular and behavioral data in the former studies exists and we kept this in the introduction. We hope that our data and our suggestions for a more nuanced analysis of cellular and behavioral responses, in particular of differences in their kinetics, may help to guide future studies.  

      (3) The responses of 60 randomly selected cells are shown in Figure 2B. But, looking at the TRPM2-/- data, warm responses appear more obvious than in WTs and the weaker responders of the WT group appear weaker than the equivalent group in the TRPV1-/- and TRPM2-/- data. This does not necessarily invalidate the results, but it may suggest a problem in the data selection. Because the correct classification of warm-sensitive neurons is central to this part of the study more validation of the classifier should be presented. For example, the authors could state if they trained the classifier using equal amounts of cells, show some randomly selected cells that are warm-insensitive for all genotypes, and show the population average responses of warm-insensitive neurons.  

      The classifier was trained on a balanced dataset of 1000 manually labelled traces (500 responders and 500 non-responders) across all 5 temperature stimuli. The prediction accuracy was 98%. We have now described more clearly how the classifier was trained (see Materials and Methods) and include examples of responders and non-responders, the population averages of each class, as well as a confusion matrix of the classification in the revised manuscript (Suppl. Figure 4A and B).
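
      As a reading aid for the confusion matrix, the sketch below shows how overall accuracy, sensitivity and specificity follow from the four cell counts of a binary responder/non-responder classifier. The counts are hypothetical: they were picked only so that a balanced set of 1000 traces gives 98% accuracy, and they are not the values reported in the supplementary figure.

```python
def summarize_confusion(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Summary metrics for a binary classifier.

    tp: responders labelled as responders
    fn: responders labelled as non-responders
    fp: non-responders labelled as responders
    tn: non-responders labelled as non-responders
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of true responders recovered
        "specificity": tn / (tn + fp),   # fraction of true non-responders recovered
    }


# Hypothetical counts for 500 responders and 500 non-responders.
metrics = summarize_confusion(tp=492, fn=8, fp=12, tn=488)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

      Reporting sensitivity and specificity separately matters here because the proportion of warmth-sensitive neurons is compared across genotypes, so an asymmetric error rate could bias that proportion even when overall accuracy is high.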

      (4) The interpretation of the main behavioral results and justification of the last figure is presented as the result of changes in sensing but differences in this behavior could be due to many factors and this needs clarification and discussion. (i) The authors mention that 'crucially temperature perception is not static' and suggest that there are fluctuating changes in perception over time and conclude that their modelling approach helps show changes in temperature detection. They imply that temperature perceptual threshold changes over time, but the mouse could just as easily have had exactly the same threshold throughout the task but their motivation (or some other cognitive variable) might vary causing them to change chamber. The authors should correct this. (ii) Likewise, from their fascinating and high-profile prior work the authors suggest a model of internal temperature sensing whereby TRPM2 expression in the hypothalamus acts as an internal sensor of body temperature. Given this, and the slow time course of the behavior in chambers with different ambient temperatures, couldn't the reason for the behavioral differences be due to central changes in hypothalamic processing rather than detection by skin temperature? If TRPM2-/- were selectively ablated from the skin or the hypothalamus (these experiments are not necessary for this paper) it might be possible to conclude whether sensation or body temperature is more likely the root cause of these effects but, without further experiments it is tough to conclude either way. (iii) Because the ambient temperature is controlled in this behavior, another hypothesis is that warm avoidance could be due to negative valence associated with breathing warm air, i.e. a result of sensation within the body in internal pathways, rather than sensing from the external skin. Overall, the authors should tone down conclusions about sensation and present a more detailed discussion of these points.  

      We are sorry that the statement including the phrase “crucially temperature perception is not static” was ambiguous; we have now deleted this statement and instead included different possibilities as to why mice may switch from one chamber to the other stochastically. 

      As the referee mentioned, it is possible that some other variable (motivation etc.) makes the mouse change the chamber; nevertheless, we hypothesize that this variable (whatever it might be) is still modulated by temperature (at least this would be the likeliest explanation that we see).

      As for the aspect of internal/hypothalamic temperature sensing and its dependence on Trpm2: we have included this possibility in the discussion in the manuscript. 

      As for the point of negative valence mediated by breathing in warm air: yes, presumably this could also be possible. The aspect of valence is an interesting aspect by itself: would the mice be rather repelled from the (uncomfortable) hot plate or more attracted to the (more comfortable) thermoneutral plate, or both? This is something to elucidate in a different study.

      (5) It is an excellent idea to present a more in-depth analysis of the behavioral data collected during the preference task, beyond 'the mouse is on one side or the other'. However, the drift-diffusion approach is complex to interpret from the text in the results and the figures. The results text is not completely clear on which behavioral parameters are analyzed and terms like drift, noise, estimate, and evidence are not clearly defined. Currently, this section of the paper slightly confuses and takes the paper away from the central findings about dynamics and behavioral differences. It seems like they could come to similar conclusions with simpler analysis and simpler figures. 

      We have now reassessed the description of the drift diffusion model and explain it more clearly; this can be found on pages 5-8. We have considered whether it would be better to introduce the drift diffusion model at the beginning of the study, subsequent to Figure 1, but we believe this to be better suited at the end because, indeed, the cellular results (and differences in kinetic response parameters observed in DRG cultures of Trpv1 KO mice) prompted us to assess the behavior in this way. Thus, the order of experiments presented here also better reflects the natural path the study took. 
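
      For readers unfamiliar with the terminology, the following generic sketch shows what "drift" and "noise" mean in a one-boundary drift-diffusion process. It is not the model fitted in the manuscript: the function name, the parameter values, and the interpretation of a boundary crossing as a decision to leave a chamber are assumptions made purely for illustration.

```python
import random


def first_crossing_time(drift, noise, threshold=1.0, dt=0.01,
                        max_time=100.0, rng=None):
    """Simulate accumulated evidence x(t) until it reaches `threshold`.

    drift: systematic rate at which evidence accumulates per unit time
    noise: standard deviation of the random fluctuations per unit time
    Returns the crossing time, or None if the boundary is not reached.
    """
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= threshold:
            return t
    return None


def mean_crossing_time(drift, noise, n_trials=200, seed=0):
    """Average crossing time over repeated simulated trials."""
    rng = random.Random(seed)
    times = [first_crossing_time(drift, noise, rng=rng) for _ in range(n_trials)]
    times = [t for t in times if t is not None]
    return sum(times) / len(times)


# Hypothetical parameter values: same noise, strong versus weak drift.
fast = mean_crossing_time(drift=2.0, noise=0.5)
slow = mean_crossing_time(drift=0.5, noise=0.5)
print(f"mean crossing time, strong drift: {fast:.2f}")
print(f"mean crossing time, weak drift:   {slow:.2f}")
```

      In this toy setting a larger drift shortens the average time to the boundary, whereas a larger noise term mainly makes individual crossing times more variable; this is why the two parameters can in principle be separated from visit-length distributions.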

      (6) In Figure 2D the % of warm-sensitive neurons are shown for each genotype. Each data point is a field of view, however, reading the figure legend there appear to be more FOVs than data points (eg 10 data points for the TRPV1-/- but 17 FOVs). The authors should check this. 

      We have checked and corrected the number of FOVs mentioned in the legend, and the number shown in the Figure 2D and its legend are now in agreement. 

      (7) Can the authors comment on why animals with over-expression of TRPV1 spend more time in the warmest chamber to start with at 38C and not at 34C?  

      This is an interesting observation that we did not consider before. A closer look at Figure 4H reveals that the majority of the TRPV1-OX animals have a proportionally long first visit to the 38°C room. We can only speculate why this is the case. We cannot rule out that this is a technical shortcoming of the assay and how we conducted it, but we did not observe this for the wildtype mice, so a technical problem is rather unlikely. It is possible that this is a type of “freezing” (or “startle”) behavior when the animals first encounter the 38°C temperature. Freezing behaviors in mice can be observed when sudden/threatening stimuli are applied. It is possible that, in the TRPV1-overexpressing animals, the initial encounter with 38°C leads to activation of a larger proportion of cells (compared to WT controls), possibly signaling a “threatening” stimulus, and thus leading to this startle effect. However, additional experiments would be required to test this hypothesis more rigorously.

    1. eLife Assessment

      This study presents valuable evidence concerning the potential for naturalistic movie-viewing fMRI experiments to reveal some features that are correlated with the functional and topographical organization of the developing visual system in awake infants and toddlers. The data are compelling given the difficulty of studying this population, the methodology is original and validated, and the evidence supporting the conclusions is convincing and in line with prior research using resting-state and awake task-based fMRI. This study will be of interest to cognitive neuroscientists and developmental psychologists, and in particular those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited tolerance to fMRI.

    2. Reviewer #2 (Public review):

      Summary:

      This manuscript reports analyses of fMRI data from infants and toddlers watching naturalistic movies. Visual areas in the infant brain show distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. The pattern of activity in visual regions contains some features predicted by the regions' retinotopic responses. The revised version of the manuscript provides additional validation of the methodology, and clarifies the claims. As a result, the data provide clear support for the claims.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and completed a specific retinotopy task. Using these data, the authors show that activity evoked by movies in infants' visual areas is correlated with the regions' retinotopic response. The revised manuscript validates this methodology, using adult data. The revised manuscript also shows that an infant's movie-watching data is not sufficient or optimal to predict their visual areas' retinotopic responses; anatomical alignment with a group of previous participants provides more accurate prediction of a new participant's retinotopic response.

      Weaknesses:

      A key step in the analysis of the movie-watching data is the selection of independent components of the movie evoked response that resemble retinotopic spatial patterns. While the trained researcher was unlikely to be biased by this infant's own retinotopy, he/she was actively looking for ICs that resemble average patterns of retinotopic response. To show that these ICs didn't arise by chance (i.e. in noise), the authors proposed an additional analysis in the revised manuscript, by misaligning the functional and anatomical data for a subset of participants. This only partially confirms the reliability of the original components, since when the (new) coder tried to be conservative to avoid false components, he/she identified just over half of the 'true' components (13 vs 22 estimated over the group of 6 infants).

    3. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 
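
      As an illustration of the motion control in point (3), the sketch below residualizes two regional time series against a motion regressor before correlating them. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the single hypothetical motion regressor, and the use of ordinary least squares are all assumptions made for this example.

```python
import numpy as np

def regress_out(signal, confounds):
    """Return the residual of `signal` after an OLS fit to `confounds` plus an intercept."""
    signal = np.asarray(signal, dtype=float)
    confounds = np.asarray(confounds, dtype=float)
    if confounds.ndim == 1:
        confounds = confounds[:, None]
    # Design matrix: intercept column followed by the confound regressors.
    design = np.column_stack([np.ones(len(signal)), confounds])
    beta, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return signal - design @ beta

def partial_correlation(a, b, confounds):
    """Pearson correlation between `a` and `b` after removing `confounds` from both."""
    res_a = regress_out(a, confounds)
    res_b = regress_out(b, confounds)
    return float(np.corrcoef(res_a, res_b)[0, 1])

# Hypothetical data: two time series that share only a motion-driven component.
rng = np.random.default_rng(0)
motion = rng.standard_normal(200)
area_left = 3 * motion + rng.standard_normal(200)
area_right = 3 * motion + rng.standard_normal(200)

raw_r = float(np.corrcoef(area_left, area_right)[0, 1])
clean_r = partial_correlation(area_left, area_right, motion)
```

      If a correlation between regions survives this kind of residualization, shared motion alone is unlikely to explain it, which is the logic of the control described above.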

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case: more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives (anatomical ROIs, group-defined ROIs). 

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously[23] found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses[27], here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As per the reviewer’s suggestion and as alluded to in the paragraph above, we have created anatomically aligned visual maps, providing an analogous test to the between-participant analyses like SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
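
      The leave-one-out averaging and the Fisher Z comparison described in this passage can be sketched as follows. This is a hedged illustration only: it assumes the maps are already aligned to a common template and stored as arrays, and it assumes a participant-level bootstrap for the confidence interval, which the quoted text does not specify. All function names are hypothetical.

```python
import numpy as np

def leave_one_out_prediction(aligned_maps, test_index):
    """Average the template-aligned maps of every participant except the held-out one."""
    aligned_maps = np.asarray(aligned_maps, dtype=float)
    keep = np.arange(len(aligned_maps)) != test_index
    return aligned_maps[keep].mean(axis=0)

def fisher_z(r):
    """Fisher r-to-z transform, so that correlations can be averaged and differenced."""
    return np.arctanh(r)

def bootstrap_mean_difference(z_a, z_b, n_boot=10000, seed=0):
    """Mean paired difference (a - b) with a 95% percentile-bootstrap CI over participants."""
    diff = np.asarray(z_a, dtype=float) - np.asarray(z_b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample participants with replacement; each row is one bootstrap sample.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    low, high = np.percentile(boot_means, [2.5, 97.5])
    return float(diff.mean()), (float(low), float(high))
```

      Here `z_a` and `z_b` would hold, per participant, the Fisher-transformed gradient correlations for anatomical and functional alignment respectively; a confidence interval that excludes zero indicates a reliable difference between the two methods.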

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than based on movie-viewing data alone - leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands[12, 13, 24] and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion[25–27]. Movies have been useful in awake infant fMRI for studying event segmentation[28], functional alignment[29], and brain networks[30]. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas[28], but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between[31]). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across-participant prediction, and enable unique analyses[27, 32–34].” Pg. 3-4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      Furthermore, in the discussion we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres[31].” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45].” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion[25-27].” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains[25,26,35,42,43].” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript shows evidence from a dataset with awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously[23] found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses[27], here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆<sub>Fisher Z</sub> M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17

      Note that we do not compare the anatomically aligned maps with the ICA maps statistically. This is because these analyses are not comparable: ICA is run within-participant whereas anatomical alignment is necessarily between-participant (using either infants or adults). Nonetheless, an interested reader can refer to the Table where we report the results of anatomical alignment and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test the suitability of movie-based maps to replace task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper (that movies evoke fine-grained organization in infant visual cortex) do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which are collected independently from the same infant in the session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double dipping occurred.

      Perhaps a cause of the reviewer’s interpretation is that the gradients used in the analysis are not clearly described. We now provide this additional description:  “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10

      Regarding the SRM analyses, we take great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map from the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to clarify this point.

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered.

      This approach has been used and validated in prior SRM studies[45].” Pg. 14
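
      To make the "learned and frozen" step concrete, the sketch below fits a held-out participant's weights to a fixed shared response using only that participant's movie data, and then projects a map from shared space into their voxel space. It is a minimal sketch that assumes the orthogonal-weight formulation of SRM, solved here as an orthogonal Procrustes problem; the function names and array shapes are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def fit_heldout_weights(heldout_movie, shared_response):
    """Fit weights W (voxels x features, W.T @ W = I) minimizing ||X - W S||.

    heldout_movie:   X, voxels x timepoints, movie data of the held-out participant
    shared_response: S, features x timepoints, learned from the other participants and frozen
    """
    # Orthogonal Procrustes solution: W = U @ Vt, where U, Vt come from the SVD of X @ S.T.
    u, _, vt = np.linalg.svd(heldout_movie @ shared_response.T, full_matrices=False)
    return u @ vt

def predict_map(weights, shared_map):
    """Project a map expressed in shared space (features,) into the held-out voxel space."""
    return weights @ shared_map
```

      Because `fit_heldout_weights` takes only movie data and the frozen shared response, the held-out participant's retinotopy data never enters the prediction, which is the independence the response describes.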

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in the participants approximately a year prior to the identification of ICs. The researcher selecting the components didn’t use the ground-truth retinotopic maps as reference, nor did they pay attention to the participant IDs when sorting the IC components. Indeed, they weren’t trying to find participant specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process from adults, we find results of comparable strength as we found in infants, as shown below. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.
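
      The percentile ranking mentioned above can be illustrated with a short sketch. The definition used here (the share of a participant's components whose correlation with the task-based map is less than or equal to that of the selected component) is an assumption for illustration, and the function name and example values are hypothetical; the response does not state exactly how the ranking was computed.

```python
import numpy as np

def percentile_rank(all_scores, selected_score):
    """Percentage of components scoring less than or equal to the selected component."""
    all_scores = np.asarray(all_scores, dtype=float)
    return 100.0 * float(np.mean(all_scores <= selected_score))

# Hypothetical correlations between each component and the task-based map.
component_scores = [0.1, 0.2, 0.3, 0.4]
rank = percentile_rank(component_scores, 0.3)
```

      Under this definition, a selection made with knowledge of the ground truth would be expected to rank near the 100th percentile, so ranks well below that are consistent with limited influence of prior knowledge.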

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers recording BOLD while watching videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex but do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the authors have achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics), nor is any validation of the statistical analysis provided. Indeed what I think that manuscript means, or better what I understood, may be very far from what the authors want to present, given how obscure the methods and the result presentation are.

      In the present form, this reviewer considers that the manuscript needs to be totally rewritten, with the results of each technique presented alongside appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too vast in number to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now. 

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Moeller et al., 2009) where the underlying organization and neurophysiology are established. However, we have made changes to these methods that differ from their original usage (e.g., we used SRM rather than hyperalignment, we use meridian mapping rather than traveling wave retinotopy, we use movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected data from 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head-coil) and were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, such that it will be easy to compare infant and adult results, as per the reviewer’s request.

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      Enhancing the visualization

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is out of scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      Our public response describes the citations to relevant adult research that we have added, and we have provided further motivation for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Because our analyses are not intended to support this claim, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then using those areas in a homotopy analysis run on the other half of the data) would qualitatively change the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186--1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that describes the contribution of data quantity to the results observed. We compare the amount of data collected with the magnitude of same vs. different stream effect (Figure 1B) and within stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review.

      However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment have the same goal (finding a maximally aligned representation between participants based on brain function), but their implementations differ. The application of functional alignment to infants is novel, as is its use with movie data that are relatively short by comparison to standard adult data. Indeed, this is the most thorough demonstration that SRM, or any functional alignment procedure, can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion.

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45], which may prove especially useful for infant fMRI[52].” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are a less smooth, more fine-grained map than spatial frequency. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can result in meridian maps that are constituted by multiple independent components (e.g., a component sensitive to horizontal orientations, and a separate component sensitive to vertical orientations). Nonetheless, we have now conducted the ICA procedure on adult participants and again find it is easier to identify spatial frequency components compared to meridian maps, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age range at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” in places where appropriate. Nonetheless, we have kept “infants” in some places, particularly those where we are comparing the sample to adults. Adding “and toddlers” could imply three samples being compared which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI[1-6], one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks[7].”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting up the data into age bins might reveal age effects, we do not think we can perform adequately powered null hypothesis testing of the age trend. In order to investigate the contribution of age, larger samples will be needed. That said, we can see from the data that we have reported that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, thus there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants as those aged between 0-3 years old, for the purposes of their conference. Indeed, we have never received this concern across dozens of prior reviews for previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers” and to older individuals in our sample as “toddlers” wherever it is appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is out of scope of this paper, but an interested reader could easily generate these maps from the data we have released.

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckman, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017). 

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults[41]; however, they are often not the primary driver of function[39]. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas --- in case motion transients drive consistent activity across areas --- then the effects described here are negligibly different (Figure S5).” Pg. 7

      (3) We recognize that despite these analyses, it would be helpful to see what this pattern looks like in adults where we know more about the visual field properties and the function of dorsal and ventral streams. This has been done previously (e.g., Haak & Beckman, 2018), but we have now run those analyses on adults in our sample, as described in the public review. As with infants, there are reliable differences in the homotopy between streams (Figure S1). The MDS results show that the adult data was more complex than the infant data, since it was best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS such that the structure of the ventral and dorsal streams is also dissociable. 

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. It is often the case, as reported elsewhere (Moeller et al., 2009), that ICA extracts portions of visual maps. To make a full visual map would require combining components into a composite (e.g., a component that has a high response in the periphery and another component that has a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps; however, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual components of ICA. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures to show results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, yet we do release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults, with amounts of data comparable to what we collected from infants, can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard.

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity -- rather than anatomy -- and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses[27,32-34].” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults[27,32,33], or revealing changing function over development[45].” Pg. 21

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses being conducted here: we are using data from a group of participants to predict the neural responses in a held-out participant. For motion to explain consistency between participants, the motion would need to be time-locked across participants. Even if motion were time-locked during movie watching, motion will impair the formation of an adequate model that can contain retinotopic information. Thus, motion should only hurt the ability to find a shared response that can be used for predicting retinotopic maps. Hence, the results we observed are despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting the adult SRM works well for early cortical areas, but not for more ventral and associative areas, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version and is shared with the public data release. We have added the additional detail that the probabilistic atlas is thresholded at 0% in order to be liberally inclusive. 

      “We used the occipital mask from the MNI structural atlas[63] in standard space -- defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe -- and used the inverted transform to put it into native functional space.” Pg. 27–28

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. eLife Assessment

      Using genomic data from ancient and modern samples, this important study investigates the genomic history of cattle in Iberia, focusing on the admixture between domestic cattle and their wild ancestors, aurochs. The authors present convincing evidence for interbreeding between domestic cattle and wild aurochs since the Neolithic period, although the evidence of sex-biased introgression is weak. The authors also show that the aurochs ancestry in cattle stabilized at ~20% since ~4000 years ago and continues into modern breeds; however, the aurochs ancestry is not heightened in a modern breed of Spanish fighting bulls that are bred for aggressiveness. The work will be of interest to evolutionary biologists and quantitative geneticists who seek to understand the genomic history and genetic basis of trait variation of domesticated animals.

    2. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors investigated the admixture history of domestic cattle since they were introduced into Iberia, by studying genomic data from 24 ancient samples dated to ~2000-8000 years ago and comparing them to modern breeds. They aimed to (1) characterize genomic variation of skeletal remains and concordance (or discordance) with morphological features; (2) test for hybridization between wild aurochs and domestic cattle; (3) test for correlation between genetic ancestry and stable isotope levels (which are indicative of ecological niche); and (4) test for previously hypothesized higher aurochs ancestry in a modern breed of fighting bulls.

      Strengths:

      Overall, this study collects valuable new data and tests several important hypotheses regarding the evolutionary history and genomic variation of domestic cattle in Iberia, such as admixture between domestic and wild populations, and correlation between genome-wide aurochs ancestry and aggressiveness.

      Weaknesses:

      Most conclusions are well supported by the data presented, with the strengths and caveats of each analysis clearly explained. The presence of admixed individuals in prehistorical periods strongly supports hybridization between wild and domestic populations, although the evidence for sex-biased introgression and ecological niche sharing is relatively weak. Lastly, the authors presented convincing evidence for relatively constant aurochs ancestry across all modern breeds, including the Lidia breed that has been bred for aggressiveness for centuries.

      Major comments:

      As the authors pointed out, a major limitation of this study is uncertainty in the "population identity" for most sampled individuals (i.e., whether an individual belonged to the domesticated or wild herd when they were alive). Based on chronology, morphology and genetic data, it is clear the Mesolithic samples from the Artusia and Mendandia sites are bona fide aurochs, but the "population identities" of individuals from the other two sites are much less certain. Indeed, archeological and morphological evidence from El Portalon supports the presence of both domestic animals and wild aurochs, which is echoed by the inter-individual heterogeneity in genetic ancestry. Despite the strong evidence of hybridization, it is unclear whether these admixed individuals were raised in the domestic population or lived in the wild population and hunted, limiting the authors' ability to draw conclusions regarding the direction of gene flow.

      In general, detecting sex-biased admixture is an inherently challenging problem, especially given limited data. The differential ancestry proportions (estimated by f4 ratios) on autosomes and X chromosome are indicative of sex-biased hybridization and consistent with previous mtDNA results and other non-genetic data. However, as shown in Fig 3, the confidence intervals of X and autosomal estimates overlap for all but a couple of individuals, despite the overall trend of the point estimates. Moreover, even if there is a significant difference, it only suggests the existence of sex bias but does not speak to its extent (unless a further quantitative argument is made). Statements such as "it was mostly aurochs males who contributed wild ancestry to domestic herds" are too strong and may be interpreted as extreme bias. The authors did a good job noting the caveats of this analysis and toned down the statement in the main text, but claims regarding sex-biased hybridization that use the word "mostly" in the abstract and discussion need to be further weakened.

      The stable isotope analysis is very underpowered, due to issues of categorization of wild vs domestic Bos, as discussed by the authors. Although the considerable overlap in stable isotope values between domestic and wild groups is consistent with a shared ecological niche, the absence of evidence (i.e., a significant difference between groups) is not evidence of absence. Two alternative, non-mutually exclusive scenarios are (1) prevalent errors in classification of wild vs domestic individuals; (2) different ecological niches share similar isotope profiles. Thus, the claim "suggesting that wild and domesticated groups often did not occupy different niches in Iberia" is still too strong.

    3. Reviewer #3 (Public review):

      Summary:

      Günther and colleagues leverage ancient DNA data to track the genomic history of one of the most important farm animals (cattle) in Iberia, a region showing peculiarities both in terms of cultural practices as well as a climatic refugium during the LGM, the latter of which could have allowed the survival of endemic lineages. They document interesting trends of hybridisation with wild aurochs over the last 8-9 millennia, including a stabilisation of auroch ancestry ~4000 years ago, at ~20%, a time coincidental with the arrival of domestic horses from the Pontic steppe. Modern breeds such as the iconic Lidia used in bullfighting or bull running retain a comparable level of auroch ancestry.

      Strengths:

      The generation of ancient DNA data has been proven crucial to unravel the domestication history of traditional livestock, and this is challenging due to the environmental conditions of the Iberian peninsula, less favourable to DNA preservation. The authors leverage samples unearthed from key archaeological sites in Spain, including the karstic system of Atapuerca. Their results provide fresher insights into past management practices and permit characterisation of significant shifts in hybridization with wild aurochs.

      Comments on revisions:

      The authors have satisfactorily addressed my previous concerns. Last questions:

      - How many MCMC iterations were run for Struct-f4? Can they show the likelihood of the last 10% of MCMC iterations? The results seem way too different for K = 4 vs. K = 5, but only for moo014 and moo019.

      - I guess the authors also lack an "a" superindex in Table 1 for moo019.

      - That Gyu2-related ancestry appears systematically for K=5 suggests that the Caucasus-related ancestry was already present in the pool that led to domesticates. Is it not important to discuss the implications of this possibility, for future analyses?

      - If monophyletic, why choose between Bed3 and CPC98 if both could be combined as a single population to further reduce qpAdm and f4 confidence intervals?

      - Why not combine all auroch Iberian samples as a single population for testing gene flow from this whole group of samples to ancient Iberian cattle? Would the resulting coverage still be too low?

      - What is subindex 1 in the denominator of the f4 ratio (main methods)?

      Thanks for your efforts

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      My main concern is the use of the 700K SNP dataset. This set of SNPs suffers from a heavy ascertainment bias, which can be seen in the PCA in the supplementary material where all the aurochs cluster in the center within the variation of cattle. Given the coverage of some of the samples, multiple individuals would have less than 10K SNP covered. The majority of these are unlikely to be informative here given that they would just represent fixed positions between taurine and indicine or SNPs mostly variable in milk cattle breeds. The authors would get a much better resolution (i.e. many more SNPs to work with their very low genome coverage data) using the 1000 bull genome project VCF data set:

      https://www.ebi.ac.uk/ena/browser/view/PRJEB42783 which is based on whole genome resequencing data from many cattle. This will certainly help with improving the resolution of qpAdm and f4 analysis, which have huge confidence intervals in most cases. Right now some individuals have huge confidence intervals ranging from 0 to 80% auroch ancestry...

      We thank the reviewer for this suggestion. We repeated our analyses with a SNP panel from Run 6 of the 1000 Bulls project presented in Naval-Sanchez et al 2020. This panel reduced standard errors and narrowed down confidence intervals for the ancient samples. Another consequence is that more single-source qpAdm models can now be rejected highlighting the abundance of hybridization. For our comparison to modern breeds, we still use the 700K dataset as it provides a set of different modern European cattle breeds.

      I agree with the authors that qpAdm is likely to give quite a noisy estimate of ancestry here (which likely explains part of the issue I mentioned above). Although qpAdm is good for model testing, for ancestry proportions the authors could instead use an explicit f4 ratio; this would allow them to specify a model, which would make the result easier to interpret.

      We have added ancestry estimates from f4 ratios to the manuscript and display them together with qpAdm and Struct-f4 (as suggested by reviewer #3) in our new Table 1. We decided to keep all three different estimates to illustrate that results are not consistent for all analyses. An additional feature of qpAdm is the possibility that two source models can be rejected and additional ancestries can be identified.
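
      To make the f4-ratio idea concrete, the following is a minimal sketch, assuming per-site allele frequencies are already available for each population. It uses the general form α = f4(A, O; X, C) / f4(A, O; B, C) from Patterson et al 2012 and a simple equal-size delete-one-block jackknife. The population labels (`X`, `B`, `C`, `A`, `O`), the function names, and the toy frequencies are illustrative assumptions, not the configuration used in the manuscript; dedicated tools typically define blocks by genomic distance and weight them by size.

```python
import math

def f4(freqs, p1, p2, p3, p4, sites):
    """f4(p1, p2; p3, p4): mean over sites of (p1 - p2) * (p3 - p4)."""
    total = sum(
        (freqs[p1][i] - freqs[p2][i]) * (freqs[p3][i] - freqs[p4][i])
        for i in sites
    )
    return total / len(sites)

def f4_ratio(freqs, sites, target, src_b, src_c, rel_a, outgroup):
    """Share of src_b-related ancestry in target, modelled as a mix of src_b and src_c."""
    num = f4(freqs, rel_a, outgroup, target, src_c, sites)
    den = f4(freqs, rel_a, outgroup, src_b, src_c, sites)
    return num / den

def f4_ratio_jackknife(freqs, target, src_b, src_c, rel_a, outgroup, n_blocks=10):
    """Point estimate and delete-one-block jackknife standard error.

    Assumption: sites are ordered along the genome and split into
    n_blocks contiguous blocks of (nearly) equal size.
    """
    n_sites = len(freqs[target])
    all_sites = list(range(n_sites))
    point = f4_ratio(freqs, all_sites, target, src_b, src_c, rel_a, outgroup)

    block_size = n_sites // n_blocks
    leave_one_out = []
    for k in range(n_blocks):
        start = k * block_size
        # The last block absorbs any remainder.
        end = n_sites if k == n_blocks - 1 else start + block_size
        kept = all_sites[:start] + all_sites[end:]
        leave_one_out.append(
            f4_ratio(freqs, kept, target, src_b, src_c, rel_a, outgroup)
        )

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (est - mean_loo) ** 2 for est in leave_one_out
    )
    return point, math.sqrt(variance)

def ci95(point, se):
    """Normal-approximation 95% confidence interval (point +/- 1.96 SE)."""
    return point - 1.96 * se, point + 1.96 * se

if __name__ == "__main__":
    # Toy, hypothetical frequencies: X is built as 25% B and 75% C.
    n = 100
    b = [(i % 5) / 5 for i in range(n)]
    c = [((i * 3) % 7) / 7 for i in range(n)]
    toy = {
        "A": b, "O": c, "B": b, "C": c,
        "X": [0.25 * bi + 0.75 * ci for bi, ci in zip(b, c)],
    }
    alpha, se = f4_ratio_jackknife(toy, "X", "B", "C", "A", "O", n_blocks=5)
    print(alpha, se, ci95(alpha, se))
```

      The distinction between one standard error and a 95% interval matters when comparing error bars across figures: under the normal approximation the 95% interval is roughly twice as wide on each side as a one-standard-error bar.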

      The interpretation of the different levels of allele sharing on X vs autosome being the result of sex-bias admixture is not very convincing. Could these differences simply be due to a low recombination rate on the X chromosome and/or lower effective population size, which would lead to less efficient purifying selection?

      Following this comment (and another comment referring to the X chromosome analysis by reviewer #2), we decided to remove sex bias from the title of our study and add more information on the caveats of this analysis. While estimating ancestry on the X chromosome can be difficult, we also add that our patterns are consistent with what has been suggested based on ancient mitochondrial data (Verdugo et al 2019). For Neolithic Anatolia, it has been suggested that the insemination of domestic cows by auroch bulls has been intentional or even ritual (Peters et al 2012). A recent parallel archaeogenomic study also concluded sex-biased introgression from autosomal, X-chromosomal and Y-chromosomal data (Rossi et al 2024). As our results are consistent with these previous studies as well as the lower differentiation of modern breeds on the X chromosome (da Fonseca et al 2019), we still consider the general pattern of our results valid even if the exact extent of sex bias is difficult to assess.

      The authors suggest that 2 pop model rejection in some domestic population might be due to indicine ancestry, this seems relatively straightforward to test.

      We had already performed this analysis of modeling their ancestry from three sources using qpAdm. The results are shown in Supplementary Table S6 and we now refer to this more explicitly in the text: “The presence of indicine ancestry can be confirmed in a qpAdm analysis using three sources resulting in fitting models for all breeds (Supplementary Table S6).”

      The first sentence of the paper is a bit long-winded, also dogs were domesticated before the emergence of farming societies.

      We rephrased the first sentence to “Domestication of livestock and crops has been the dominant and most enduring innovation of the transition from a hunter-gathering lifestyle to farming societies.”

      It would be good to be specific about the number of genomes and coverage info in the last paragraph of the intro.

      This information is included in the first paragraph of the results section and we decided to not duplicate the numbers in the preceding introduction paragraph to retain a flow for the readers.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors investigated the admixture history of domestic cattle since they were introduced into Iberia, by studying genomic data from 24 ancient samples dated to ~2000-8000 years ago and comparing them to modern breeds. They aimed to (1) test for introgression from (local) wild aurochs into domestic cattle; (2) characterize the pattern of admixture (frequency, extent, sex bias, directionality) over time; (3) test for correlation between genetic ancestry and stable isotope levels (which are indicative of ecological niche); and (4) test for the hypothesized higher aurochs ancestry in a modern breed of fighting bulls.

      Strengths:

      Overall, this study collects valuable new data that are useful for testing interesting hypotheses, such as admixture between domestic and wild populations, and correlation between genome-wide aurochs ancestry and aggressiveness.

      Thank you for highlighting the importance of our study and the potential of our dataset.

      Weaknesses:

      Most conclusions are partially supported by the data presented. The presence of admixed individuals in prehistorical periods supports the hypothesized introgression, although this conclusion needs to be strengthened with an analysis of potential contamination. The frequency, sex-bias, and directionality of admixture remain highly uncertain due to limitations of the data or issues with the analysis. There is considerable overlap in stable isotope values between domestic and wild groups, indicating a shared ecological niche, but variation in classification criteria for domestic vs wild groups and in skeletal elements sampled for measurements significantly weakens this claim. Lastly, the authors presented convincing evidence for relatively constant aurochs ancestry across all modern breeds, including the Lidia breed which has been bred for aggressiveness for centuries. My specific concerns are outlined below.

      Contamination is a common concern for all ancient DNA studies. Contamination by modern samples is perhaps unlikely for this specific study of ancient cattle, but there is still the possibility of cross-sample contamination. The authors should estimate and report contamination estimates for each sample (based on coverage of autosomes and sex chromosomes, or heterozygosity of Y or MT DNA). Such contamination estimates are particularly important to support the presence of individuals with admixed ancestry, as a domestic sample contaminated with a wild sample (or vice versa) could appear as an admixed individual.

      We thank the reviewer for this suggestion. Due to our low coverage data, we focused on estimating contamination from the mitochondrial data by implementing the approach used by Green et al (2008). We make the code for this step available on GitHub. While most samples displayed low levels of contamination, we identified one sample (moo013a) with a surprisingly high (~50%) level of contamination which was excluded from further analysis.
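
      As an illustration of the general idea only (this is not the code released by the authors), a read-count-based contamination estimate can be sketched as a binomial proportion with a confidence interval. The sketch assumes that reads overlapping diagnostic mitochondrial positions have already been classified as matching or not matching the sample's consensus; the function name and the example counts are hypothetical.

```python
import math

def contamination_estimate(n_mismatch, n_total, z=1.96):
    """Proportion of non-consensus reads with a Wilson score interval.

    n_mismatch: reads at diagnostic positions that do NOT match the consensus
    n_total:    all reads overlapping diagnostic positions
    z:          normal quantile (1.96 for an approximate 95% interval)
    """
    if n_total <= 0:
        raise ValueError("need at least one read at diagnostic positions")
    p = n_mismatch / n_total
    denom = 1 + z ** 2 / n_total
    centre = (p + z ** 2 / (2 * n_total)) / denom
    half = z * math.sqrt(p * (1 - p) / n_total + z ** 2 / (4 * n_total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return p, max(0.0, centre - half), min(1.0, centre + half)

if __name__ == "__main__":
    # Hypothetical counts for two samples.
    print(contamination_estimate(2, 150))
    print(contamination_estimate(50, 100))
```

      With few reads at diagnostic positions the interval becomes wide, which is one reason low-coverage samples can be hard to clear of contamination with confidence.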

      A major limitation of this study is uncertainty in the "population identity" for most sampled individuals (i.e., whether an individual belonged to the domesticated or wild herd when they were alive). Based on chronology, morphology, and genetic data, it is clear the Mesolithic samples from the Artusia and Mendandia sites are bona fide aurochs, but the identities of individuals from the other two sites are much less certain. Indeed, archeological and morphological evidence from El Portalon supports the presence of both domestic animals and wild aurochs, which is echoed by the inter-individual heterogeneity in genetic ancestry. Based on results shown in Fig 1C and Fig 2 it seems that individuals moo017, moo020, and possibly moo012a are likely wild aurochs that had been hunted and brought back to the site by humans. Although the presence of individuals (e.g., moo050, moo019) that can only be explained by two-source models strongly supports that interbreeding happened (if cross-contamination is ruled out), it is unclear whether these admixed individuals were raised in the domestic population or lived in the wild population and hunted.

      The reviewer is pointing out an important topic, the unknown identity of the studied individuals. We have revised the text making clear that we do not know whether the individuals were hunted or herded. At the same time, their genomic ancestry speaks for itself showing that there was hybridization between wild and domestic and that different individuals carried different degrees of wild ancestry. In the revised version, we have added the unknown identity as well as the fact that our results can be affected by changes in both human hunting and herding practices over time. Regardless of the exact identity of the individuals, our results can still be seen as (a) evidence for hybridization and (b) changes in human practices (hunting and/or herding) and their relationship to bovids over time.

      Such uncertainty in "population identity" limits the authors' ability to make conclusions regarding the frequency, sex bias, and directionality of gene flow between domestic and wild populations. For instance, the wide range of ancestry estimates in Neolithic and Chalcolithic samples could be interpreted as evidence of (1) frequent recent gene flow or (2) mixed practices of herding and hunting and less frequent gene flow. Similarly, the statement about "bidirection introgression" (on pages 8 and 11) is not directly supported by data. As the genomic, morphological, and isotope data cannot confidently classify an individual as belonging to the domesticated or wild population, it seems impossible to conclude the direction of gene flow (if by "bidirection introgression" the authors mean something other than "bidirectional gene flow", they need to clearly explain this before reaching the conclusion.)

      We have removed “bidirectional introgression” from the text and replaced it with the more neutral term “hybridization”. Furthermore, we used the revision to mention at several places in the text that it is not clear whether the sequenced individuals were hunted or herded and that the observed pattern likely reflects changes in both hunting and herding practices.

      The f4 statistics shown in Fig 3B are insufficient to support the claim regarding sex-biased hybridization, as the f4 statistic values are not directly comparable between the X chromosome and autosomes. Because the effective population size is different for the X chromosome and autosomes (roughly 3:4 for populations with equal numbers of males and females), the expected amount of drift is different, hence the fraction of allele sharing (f4) is expected to be different. In fact, the observation that moo004 whose autosomal genome can be modeled as 100% domestic ancestry still shows a higher f4 value for the X chromosome than autosomes hints at this issue. A more robust metric to test for sex-biased admixture is the admixture proportion itself, which can be estimated by qpAdm or f4-ratio (see Patterson et al 2012). However, even with this method, criticism has been raised (e.g., Lazaridis and Reich 2017; Pfennig and Lachance, 2023). In general, detecting sex-bias admixture is a tough problem.

      In response to this comment and another comment by reviewer #1, we decided to remove sex bias from the title. In the revised version of our study, we have now switched this analysis from f4 statistics to comparing f4 ratios between the X chromosome and autosomes (Figure 3). Furthermore, we have added more information on the caveats of this analysis citing the articles mentioned by the reviewer. At the same time, we highlight that our patterns are consistent with what has been suggested based on ancient mitochondrial data (Verdugo et al 2019). Unfortunately, the low coverage data does not allow us to call Y chromosomal haplotypes, which would also allow an analysis of the paternal lineage. But our results are consistent with additional examples from the literature: For Neolithic Anatolia, it has been suggested that the insemination of domestic cows by auroch bulls has been intentional or even ritual (Peters et al 2012) and there is a lower differentiation of modern breeds on the X chromosome (da Fonseca et al 2019). A recent parallel archaeogenomic study also concluded sex-biased introgression from autosomal, X-chromosomal and Y-chromosomal data (Rossi et al 2024). Similar to the broader hybridization signal, our interpretation does not depend on the estimates for single individuals as we describe the broader pattern. As our results are consistent with previous results based on other types of data, we still consider the general pattern of our results valid even if the exact extent of sex bias is difficult to assess.
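
      The two quantities at stake in this exchange can be written down in a short sketch. It uses the textbook expressions for effective population size on autosomes and the X chromosome, and the expected ancestry after a single pulse of admixture in which a source population contributes a fraction `m` of the breeding males and `f` of the breeding females. The single-pulse assumption, the function names, and the example values are illustrative simplifications and are not taken from the manuscript.

```python
def ne_autosome(n_males, n_females):
    """Effective population size for autosomes with unequal sex ratio."""
    return 4 * n_males * n_females / (n_males + n_females)

def ne_x(n_males, n_females):
    """Effective population size for the X chromosome."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)

def expected_ancestry(m, f):
    """Expected source ancestry after one admixture pulse.

    m: fraction of breeding males drawn from the source population
    f: fraction of breeding females drawn from the source population
    Returns (autosomal, x_chromosomal) expected ancestry.
    """
    autosomal = (m + f) / 2
    # Females carry two X copies and males one, so the female
    # contribution gets twice the weight on the X chromosome.
    x_chromosomal = (2 * f + m) / 3
    return autosomal, x_chromosomal

if __name__ == "__main__":
    # Equal numbers of breeding males and females: X-to-autosome Ne ratio of 3:4.
    print(ne_x(100, 100) / ne_autosome(100, 100))
    # Hypothetical male-only contribution from the source population.
    print(expected_ancestry(m=0.4, f=0.0))
```

      Under this simplified model a purely male-mediated contribution leaves less source ancestry on the X chromosome than on autosomes, while differences in drift between the two compartments affect raw f4 values, which is why ancestry proportions are the more comparable quantity.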

      In general, the stable isotope analysis seems to be very underpowered, due to the issues of variation in classification criteria and skeletal sampling location discussed by the authors in supplementary material. The authors claimed a significant difference in stable nitrogen isotope between (inconsistently defined) domestic cattle and wild aurochs, but no figures or statistics are presented to support this claim. Please describe the statistical method used and the corresponding p-values. The authors can consider including a figure to better show the stable isotope results.

      In combination with updated tables, we have added a supplementary figure showing the stable isotope results (S9). In light of the reanalysis of the genetic data, we have reassessed the genetic models used to assign species in the stable isotope analysis. We have provided more details of the statistical methods used and the p-values are given in the supplementary materials. There is a significant difference in the nitrogen isotope values when comparing B. taurus and B. primigenius (identified on morphology) but no other comparisons are significant at the p = 0.05 threshold. The reviewer highlights what we have mentioned in the supplementary material regarding the varied skeletal elements used for stable isotope analysis and the difficulty of assigning a species identity (as this depends on what criteria are used; morphological or some kind of genetic threshold of ancestry). Indeed, how to identify the species is at the heart of the paper. Given that identity could be defined in many ways, we have used 3 different genetic models to reflect this and the morphological categories, to help explore different possible scenarios. The reviewer is correct to point out that some of this analysis is not helped by the variety of skeletal elements used, but we have been careful not to over-interpret the results. The only samples that have nitrogen values higher than one standard deviation from the mean are domestic cattle, so it is not unreasonable to suggest that only domestic cattle have high nitrogen isotope values.

      Reviewer #3 (Public Review):

      Summary:

      Günther and colleagues leverage ancient DNA data to track the genomic history of one of the most important farm animals (cattle) in Iberia, a region showing peculiarities both in terms of cultural practices as well as a climatic refugium during the LGM, the latter of which could have allowed the survival of endemic lineages. They document interesting trends of hybridisation with wild aurochs over the last 8-9 millennia, including a stabilisation of auroch ancestry ~4000 years ago, at ~20%, a time coincidental with the arrival of domestic horses from the Pontic steppe. Modern breeds such as the iconic Lidia used in bullfighting or bull running retain a comparable level of auroch ancestry.

      Strengths:

      The generation of ancient DNA data has been proven crucial to unravel the domestication history of traditional livestock, and this is challenging due to the environmental conditions of the Iberian peninsula, less favourable to DNA preservation. The authors leverage samples unearthed from key archaeological sites in Spain, including the karstic system of Atapuerca. Their results provide fresher insights into past management practices, and permit characterisation of significant shifts in hybridization with wild aurochs.

      We thank the reviewer for their positive assessment of our work and for highlighting the strength and potential of the study.

      Weaknesses:

      - Treatment of post-mortem damage: the base quality of nucleotide transitions was recalibrated down to a quality score of 2, but for 5bp from the read termini only. In some specimens (e.g. moo022), the damage seems to extend further. Why not use dedicated tools (e.g. mapDamage), or check the robustness by conditioning on nucleotide transversions?

      We agree that using such a non-standard data preparation approach requires some testing. Since our main analyses are all based on f statistics, we compared f4 statistics and f4 ratios of our rescaled base quality data with data only using transversion sites. While estimates are highly correlated, the data set reduced to transversions produces larger confidence intervals in f4 ratios due to the lower number of sites. Consequently, we decided to use the rescaled data for all analyses displayed in main figures. We also prefer not to perform reference based rescaling as implemented in mapDamage as it might be sensitive to mapping bias (Günther & Nettelblad 2019).
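
      The transversion-only robustness check mentioned here can be illustrated with a small filter. Post-mortem deamination mainly produces C-to-T (and, on the complementary strand, G-to-A) changes, which are transitions, so restricting to transversion SNPs removes the sites most affected by damage at the cost of fewer sites. The function names and the `(ref, alt)` tuple layout below are hypothetical and are not taken from the authors' pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transversion(ref, alt):
    """True if the two alleles are a purine/pyrimidine pair (a transversion)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False
    return (ref in PURINES and alt in PYRIMIDINES) or (
        ref in PYRIMIDINES and alt in PURINES
    )

def keep_transversions(snps):
    """Filter an iterable of (ref, alt) allele pairs down to transversions."""
    return [(ref, alt) for ref, alt in snps if is_transversion(ref, alt)]

if __name__ == "__main__":
    panel = [("C", "T"), ("G", "A"), ("A", "C"), ("G", "T"), ("A", "T")]
    print(keep_transversions(panel))
```

      Because only a subset of SNPs in a typical panel are transversions, the reduced set is expected to give wider confidence intervals, in line with what the response describes.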

      - Their more solid analyses are based on qpAdm, but rely on two single-sample donor populations. As the authors openly discuss, it is unclear whether CPC98 is a good proxy for Iberian aurochs despite possibly forming a monophyletic clade (the number of analysed sites is simply too low to assess this monophyly; Supplementary Table S2). Additionally, it is also unclear whether Sub1 was a fully unadmixed domestic specimen, depleted of auroch ancestry. The authors seem to suggest themselves that sex-biased introgression may have already taken place in Anatolia ("suggesting that sex-biased processes already took place prior to the arrival of cattle to Iberia").

      We expanded the discussion on this topic but removed the analysis of whether European aurochs form a clade due to the low number of sites. We do highlight that a recent parallel study on aurochs genomes confirmed that Western European aurochs form a clade, probably even originating from an Iberian glacial refugium (Rossi et al 2024). Even if minor structure in the gene pool of European aurochs might affect our quantitative results, it should not drive the qualitative pattern. The same should be the case for Sub1 as our tests would detect additional European aurochs ancestry that was not present in Sub1. The corresponding paragraph now reads:

      “A limitation of this analysis is the availability of genomes that can be used as representatives of the source populations as we used German and British aurochs to represent western European aurochs ancestry and a single Anatolian Neolithic to represent the original domestic cattle that was introduced into Europe. Our Mesolithic Iberian aurochs contained too little endogenous DNA to be used as a proxy aurochs reference and all Neolithic and Chalcolithic samples estimated with predominantly aurochs ancestry (including the 2.7x genome of moo014) already carry low (but significant) levels of domestic ancestry. However, the fact that all of these aurochs samples carried P mitochondria strongly suggests that western European aurochs can be considered monophyletic. Furthermore, a recent parallel study also concluded that Western European aurochs all form a clade (27). The Anatolian Sub1 might also not be depleted of any European aurochs ancestry and could not fully represent the original European Neolithic gene pool as also indicated by qpAdm and Struct-f4 identifying small proportions of other Asian ancestries in some Iberian individuals.

      While these caveats should affect our quantitative estimates of European aurochs ancestry, they should not drive the qualitative pattern as our tests would still detect any excess European aurochs ancestry that was not present in Neolithic Anatolia.”

      Alternatively, I recommend using Struct-f4 as it can model the ancestry of all individuals together based on their f4 permutations, including outgroups and modern data, and without the need to define pure "right" and "left" populations such as CPC98 and Sub1. It should work with low-coverage data, and allows us to do f4-based MDS plots as well as to estimate ancestry proportions (including from ghost populations).

      We thank the reviewer for this suggestion. We added Struct-f4 as an analysis but observed that it would not converge in an individual-based analysis due to the low coverage of most of our samples. We added Struct-f4 results for samples with >0.1X to the new Table 1; the results are similar to those obtained using f4 ratios and (to a lower degree) the qpAdm results.

      - In the admixture graph analyses (supplementary results), the authors use population groups based on a single sample. If these samples are pseudohaploidised (or if coverage is insufficient to estimate heterozygosity - and it is at least for moo004 and moo014), f3 values are biased, implying that the fitted graph may be wrong. The graph shown in Fig S7 is in fact hard to interpret. For example, the auroch Gyu2 from Anatolia but not the auroch CPC98 also from Anatolia received 62% of ancestry from North Africa? The Neolithic samples moo004 and moo014 also show the same shocking disparity. I would consider re-doing this analysis with more than one sample per population group.

      There seems to be some confusion relating to the sample identity in these figures. CPC98 is British and not Anatolian while Gyu2 is from the Caucasus and not Anatolia, which would explain why they are different. Furthermore, moo004 is mostly of domestic ancestry while moo014 is mostly of European aurochs ancestry according to our other analyses, which should explain why they also behave differently in this analysis. To avoid confusion and since this is a supplementary analysis from which we are not drawing any major conclusions, we decided to remove the graphs and the analysis from the study.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Fig 3A: The red regression line is misleading. It seems to show that the average aurochs ancestry fraction has been steadily decreasing since ~8000 years ago, but the "averaging" is not meaningful as not all samples necessarily represent domestic cattle remains and the sample size is rather small. In other words, the samples are just a small, random collection of domestic and wild animals, and the average ancestry is subject to large sampling noise. I would suggest removing the regression line (along with the associated confidence interval) in this figure. It would also be helpful to label the samples with their IDs and morphology in the plot for cross-reference with other figures. Also, it is said in the legend that "Modern Iberian breeds... are added around date 0 with some vertical jitter". Do the authors mean "horizontal jitter" instead?

      Thank you for noticing this! We have removed the regression line and corrected the figure legend.

      Fig 2 vs Fig 3A: are the error bars the same in these two plots? They seem to be highly similar, if not identical, but the legends read very differently ("95% confidence interval by block-jackknife vs. one standard error"). Please explain.

      The figure legends have been corrected.
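
      The distinction raised here (one standard error versus a 95% confidence interval from a block jackknife) can be sketched as follows. This is a minimal illustration only: it assumes a simple unweighted delete-one-block jackknife of a per-site mean and a normal approximation (z = 1.96). The function names `jackknife_se` and `error_bars` are hypothetical, and the software used in the study may apply a weighted variant.

```python
import math

def jackknife_se(blocks):
    """Unweighted delete-one-block jackknife for the mean of per-site values.

    blocks: list of lists, each inner list holding the per-site values
    of one genomic block. Returns (estimate, standard_error).
    """
    total = sum(sum(b) for b in blocks)
    n_sites = sum(len(b) for b in blocks)
    estimate = total / n_sites
    g = len(blocks)
    # Leave-one-block-out estimates of the mean.
    loo = [(total - sum(b)) / (n_sites - len(b)) for b in blocks]
    loo_mean = sum(loo) / g
    variance = (g - 1) / g * sum((t - loo_mean) ** 2 for t in loo)
    return estimate, math.sqrt(variance)

def error_bars(estimate, se, z=1.96):
    """Return both conventions so a legend can state which one is drawn."""
    return {
        "one_se": (estimate - se, estimate + se),
        "ci95": (estimate - z * se, estimate + z * se),
    }

est, se = jackknife_se([[1, 2], [3, 4], [5, 6]])
bars = error_bars(est, se)
```

      Under these assumptions the 95% interval is about 1.96 times wider than the one-standard-error bar, so two figures drawn from the same estimates would show visibly different bars unless they use the same convention.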

      Fig 3B: What do the error bars in Fig 3B mean? 95% confidence interval or one standard error? Please clarify in the legend.

      We have removed this figure and replaced it with a different way of displaying the results (now Figure 3). We ensured that the error bars are displayed consistently across figures.

      According to the f4 statistics shown in Fig 1C and Fig 3B, moo012b carries a relatively high amount of domestic ancestry. How is this compatible with the observation in Fig 2 that this individual can be modeled with 100% aurochs (i.e., aurochs as the single source)? Does this simply reflect the low genome coverage?

      moo012b is indeed one of the lowest coverage samples in our dataset, at <0.02x sequencing depth. Even in our revised analysis using more sites, there is a discrepancy between the results of f4 statistics and qpAdm (suggesting mostly domestic ancestry) and f4 ratio suggesting mostly aurochs ancestry (Figure 1C and Table 1). We believe that this highlights the sensitivity of different methods to assumptions about the relationships of sources and potential “outgroups” which might not be well resolvable with low coverage data and in the presence of potentially complex admixture. Our general results, however, do not depend on the estimates for single individuals as our interpretations are based on the general pattern.
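
      For readers less familiar with these statistics, the following is a minimal sketch of how an f4 statistic and an f4-ratio ancestry estimate are computed from allele frequencies. The population labels (A, O, B, C, X), the frequencies and the function names are all hypothetical and chosen for illustration; they are not the populations or data used in the study, and real analyses add block-jackknife standard errors and handle missing data.

```python
def f4(p_a, p_b, p_c, p_d):
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d)."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(p_a, p_b, p_c, p_d)]
    return sum(products) / len(products)

def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """Estimate the fraction of B-related ancestry in X.

    Assumes X is a mixture of B-related and C-related ancestry,
    A is a sister group of B, and O is an outgroup:
        alpha = f4(A, O; X, C) / f4(A, O; B, C)
    """
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)

# Hypothetical allele frequencies at four sites.
p_o = [0.1, 0.5, 0.9, 0.3]
p_a = [0.2, 0.7, 0.6, 0.4]
p_b = [0.3, 0.8, 0.5, 0.6]
p_c = [0.1, 0.4, 0.9, 0.2]
# X built as an exact 25% / 75% mixture of B and C.
p_x = [0.25 * b + 0.75 * c for b, c in zip(p_b, p_c)]

alpha = f4_ratio(p_a, p_o, p_x, p_b, p_c)
```

      In this idealised case the ratio recovers the mixing proportion. With very low coverage only a few sites contribute to each average, and if the assumed relationship between A and B does not hold, the ratio no longer measures the mixing proportion, which is one way different methods can disagree for a single individual.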

      I don't fully understand the rationale behind the statement "However, at some point, the herding practices must have changed since modern Iberian breeds show approximately 20-25% aurochs ancestry". Can the stable ancestry fraction from 4000 years ago to the present (relative to the highly variable ancestry before) reflect a discontinuation of hunting rather than changes in herding practices?

      We agree that this statement was not justified here. We rephrased the sentence to “In fact, from the Bronze Age onwards, most estimates overlap with the approximately 25% aurochs ancestry in modern Iberian cattle” and generally tried to make the text more nuanced on the issue of herding and hunting practices.

      Reviewer #3 (Recommendations For The Authors):

      Thanks for this interesting piece of work. The results are clearly presented, and I have no additional concerns other than those reflected in the public report, except perhaps:

      (i) trying to use more informative sample names (eg. including the date and location). It may facilitate reading without going back and forth to the table "Sample List".

      We have now added a main table listing our post-Mesolithic samples together with their age, site and estimated aurochs ancestry proportions. We hope that this table makes it easier for readers to follow our sample IDs.

      (ii) Briefly describe in the main text the age of the aurochs and Sub1 samples not generated in this study.

      Fixed.

    1. eLife Assessment

      This study reveals the important role of upstream open reading frames (uORFs) in limiting the translational variability of downstream coding sequences. Through a combination of computational simulations, comparative analyses of translation efficiency across different developmental stages in two closely related Drosophila species, and manipulative, experimental validation of translation buffering by an uORF for a gene, the authors provide convincing evidence supporting their conclusions. This work will be of broad interest to molecular biologists and geneticists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors set out to explore the role of upstream open reading frames (uORFs) in stabilizing protein levels during Drosophila development and evolution. By utilizing a modified ICIER model for ribosome translation simulations and conducting experimental validations in Drosophila species, the study investigates how uORFs buffer translational variability of downstream coding sequences. The findings reveal that uORFs significantly reduce translational variability, which contributes to gene expression stability across different biological contexts and evolutionary timeframes.

      Strengths:

      (1) The study introduces a sophisticated adaptation of the ICIER model, enabling detailed simulation of ribosomal traffic and its implications for translation efficiency.

      (2) The integration of computational predictions with empirical data through knockout experiments and translatome analysis in Drosophila provides a compelling validation of the model's predictions.

      (3) By demonstrating the evolutionary conservation of uORFs' buffering effects, the study provides insights that are likely applicable to a wide range of eukaryotes.

      Weaknesses:

      (1) Although the study is technically sound, it does not clearly articulate the mechanisms through which uORFs buffer translational variability. A clearer hypothesis detailing the potential molecular interactions or regulatory pathways by which uORFs influence translational stability would enhance the comprehension and impact of the findings.

      (2) The study could be further improved by a discussion regarding the evolutionary selection of uORFs. Specifically, it would be beneficial to explore whether uORFs are favored evolutionarily primarily for their role in reducing translation efficiency or for their capability to stabilize translation variability. Such a discussion would provide deeper insights into the evolutionary dynamics and functional significance of uORFs in genetic regulation.

    3. Reviewer #2 (Public review):

      uORFs, short open reading frames located in the 5' UTR, are pervasive in genomes. However, their roles in maintaining protein abundance are not clear. In this study, the authors propose that uORFs act as a "molecular dam", limiting the fluctuation of the translation of downstream coding sequences. First, they performed in silico simulations using an improved ICIER model, and demonstrated that uORF translation reduces CDS translational variability, with buffering capacity increasing in proportion to uORF efficiency, length, and number. Next, they analyzed the translatome between two related Drosophila species, revealing that genes with uORFs exhibit smaller fluctuations in translation between the two species and across different developmental stages within the same species. Moreover, they identified that bicoid, a critical gene for Drosophila development, contains a uORF with substantial changes in translation efficiency. Deleting this uORF in Drosophila melanogaster significantly affected its gene expression, hatching rates, and survival under stress conditions. Lastly, by leveraging public Ribo-seq data, the authors showed that the buffering effect of uORFs is also evident between primates and within human populations. Collectively, the study advances our understanding of how uORFs regulate the translation of downstream coding sequences at the genome-wide scale, as well as during development and evolution.

      The conclusions of this paper are mostly well supported by data, but some definitions and data analysis need to be clarified and extended.

      (1) There are two definitions of translation efficiency (TE) in the manuscript: one refers to the number of 80S ribosomes that complete translation at the stop codon of a CDS within a given time interval, while the other is calculated based on Ribo-seq and mRNA-seq data (as described on Page 7, line 209). To avoid potential misunderstandings, please use distinct terms to differentiate these two definitions.

      (2) Page 7, line 209: "The translational efficiencies (TEs) of the conserved uORFs were highly correlated between the two species across all developmental stages and tissues examined, with Spearman correlation coefficients ranging from 0.478 to 0.573 (Fig. 2A)." However, the authors did not analyze the correlation of translation efficiency of conserved CDSs between the two species, and compare this correlation to the correlation between the TEs of CDSs. These analyses will further support the authors' conclusion regarding the role of conserved uORFs in translation regulation.

      (3) Page 8, line 217: "Among genes with multiple uORFs, one uORF generally emerged as dominant, displaying a higher TE than the others within the same gene (Fig. 2C)." The basis for determining dominance among uORFs is not explained and this lack of clarification undermines the interpretation of these findings.

      (4) According to the simulation, the translation of uORFs should exhibit greater variability than that of CDSs. However, the authors observed significantly fewer uORFs with significant TE changes compared to CDSs. This discrepancy may be due to lower sequencing depth resulting in fewer reads mapped to uORFs. Therefore, the authors may compare this variability specifically among highly expressed genes.

      (5) If possible, the authors may need to use antibodies against bicoid to test the effect of ATG deletion on bicoid expression, particularly under different developmental stages or growth conditions. According to the authors' conclusions, the deletion mutant should exhibit greater variability in bicoid protein abundance. This experiment could provide strong support for the proposed mechanisms.
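
      Points (1) and (2) concern a sequencing-based translation efficiency and its rank correlation between species. The sketch below shows one common way these quantities are computed. It assumes TE is the ratio of Ribo-seq to mRNA-seq normalized abundance, which is a widespread convention but is not confirmed here as the manuscript's exact definition. All function names and values are hypothetical.

```python
import math

def translation_efficiency(ribo_rpkm, mrna_rpkm):
    """Sequencing-based TE: ribosome footprint density over mRNA abundance."""
    return ribo_rpkm / mrna_rpkm

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical TEs of the same conserved features in two species.
te_species_1 = [translation_efficiency(r, m) for r, m in [(20, 10), (5, 10), (9, 3)]]
te_species_2 = [translation_efficiency(r, m) for r, m in [(30, 20), (4, 10), (8, 2)]]
rho = spearman(te_species_1, te_species_2)
```

      Running the same `spearman` call on CDS TEs and on uORF TEs would give the two coefficients whose comparison is requested in point (2). Using a separate name for the simulation-based quantity in point (1) would keep it distinct from this ratio.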

    1. eLife Assessment

      This important study of regulatory elements and gene expression in the craniofacial region of the fat-tailed dunnart shows that, compared to placental mammals, marsupial craniofacial tissue develops in a precocious manner, with enhancer regulatory elements as the primary driver of this difference. While the results are overall solid, addressing concerns regarding the liftover methods in the context of low conservation of alignable enhancers between dunnart and mouse would benefit the work, enhancing its value for uncovering mechanisms that drive heterochronic processes and as a reference for future mammalian evolution studies.

    2. Reviewer #1 (Public review):

      Summary:

      Cook et al. have presented an important study on the transcriptomic and epigenomic signature underlying craniofacial development in marsupials. Given the lack of a dunnart genome, the authors also prepared long and short-read sequence datasets to assemble and annotate a novel genome to allow for the mapping of RNAseq and ChIPseq data against H3K4me3 and H3K27ac, which allowed for the identification of putative promoter and enhancer sites in dunnart. They found that genes proximal to these regulatory loci were enriched for functions related to bone, skin, muscle and embryonic development, highlighting the precocious state of newborn dunnart facial tissue. When compared with mouse, the authors found a much higher proportion of promoter regions aligned between species than for enhancer regions, and subsequent profiling identified regulatory elements that are conserved across species and important for mammalian craniofacial development. In contrast, the identification of dunnart-specific enhancers and patterns of RNA expression further confirm the precocious state of muscle development, as well as of sensory system development, in dunnart, suggesting that early formation of these features is critical for neonate marsupials, likely to assist with detecting and responding to cues that direct the joeys to the mother's teat after birth. This is one of the few epigenomic studies performed in marsupials (of any organ) and the first performed in fat-tailed dunnart (also of any organ). Marsupials are emerging as an important model for studying mammalian development and evolution and the authors have performed a novel and thorough analysis, impressively including the assembly of a new marsupial reference genome that will benefit many future studies.

      Strengths:

      The study provides multiple pieces of evidence supporting the important role enhancer elements play in mammalian phenotypic evolution, namely the finding of a lower proportion of peaks present in both dunnart and mouse for enhancers than for promoters, and dunnart showing more genes uniquely associated with its active enhancers than any other combination of mouse and dunnart samples, whereas this pattern was less pronounced for promoter-associated genes. In addition, rigorous parameters were used for the cross-species analyses to identify the conserved regulatory elements and the dunnart-specific enhancers. For example, for the results presented in Figure 1, I agree that it is a little surprising that the average promoter-TSS distance is greater than that for enhancers, but that this could be related to the possible presence of unannotated transcripts between genes. The authors addressed this well by examining the distribution of promoter-TSS distances and using proximal promoters (cluster #1) as high confidence promoters for downstream analyses.

      The genome assembly method was thorough, using two different long read methods (Pacbio and ONT) to generate the long reads for contig and scaffold construction, increasing the quality of the final assembled genome.

      Weaknesses:

      Biological replicates of facial tissue were collected at a single developmental time point of the fat-tailed dunnart within the first postnatal day (P0), and these were analysed in the context of similar mouse facial samples from the ENCODE consortium at six developmental time points, where previous work from the authors has shown that the younger mouse samples (E11.5-12.5) approximately correspond to the dunnart developmental stage (Cook et al. 2021). However, it would be useful to have samples from at least one older dunnart time point, for example, at a developmental stage equivalent to mouse E15.5. This would provide additional insight into the extent of accelerated face development in dunnart relative to mouse, i.e. how long do the regulatory elements that are activated early in dunnart remain active, and does their function later influence other aspects of craniofacial development?

      The authors refer to the development of the CNS being delayed in marsupials relative to placental mammals, however, evidence shows how development of the dunnart brain (whole brain or cortex) is protracted compared to mouse, by a factor of at least 2 times, rather than delayed per se (Workman et al. 2013; Paolino et al. 2023). In addition, there is evidence that cortical formation and cell birth may begin at approximately the same stage across species equivalent to the neonate period in dunnart (E10.5 in mouse), and that shortly after this at the stage equivalent to mouse E12.5, the dunnart cortex shows signs of advanced neurogenesis followed by a protracted phase of neuronal maturation (Paolino et al. 2023). Therefore, it is possible that marsupial CNS development appears delayed relative to mouse but instead begins at the same stage and then proceeds to develop on a different timing scale.

    3. Reviewer #2 (Public review):

      This study by Cook and colleagues utilizes genomic techniques to examine gene regulation in the craniofacial region of the fat-tailed dunnart at perinatal stages. Their goal is to understand how accelerated craniofacial development is achieved in marsupials compared to placental mammals.

      The authors employ state-of-the-art genomic techniques, including ChIP-seq, transcriptomics, and high-quality genome assembly, to explore how accelerated craniofacial development is achieved in marsupials compared to placental mammals. This work addresses an important biological question and contributes a valuable dataset to the field of comparative developmental biology. The study represents a commendable effort to expand our understanding of marsupial development, a group often underrepresented in genomic studies.

      The dunnart's unique biology, characterized by a short gestation and rapid craniofacial development, provides a powerful model for examining developmental timing and gene regulation. The authors successfully identified putative regulatory elements in dunnart facial tissue and linked them to genes involved in key developmental processes such as muscle, skin, bone, and blood formation. Comparative analyses between dunnart and mouse chromatin landscapes suggest intriguing differences in deployment of regulatory elements and gene expression patterns.

      Strengths

      (1) The authors employ a broad range of cutting-edge genomic tools to tackle a challenging model organism. The data generated - particularly ChIP-seq and RNA-seq from craniofacial tissue - are a valuable resource for the community, which can be employed for comparative studies. The use of multiple histone marks in the ChIP-seq experiments also adds to the utility of the datasets.

      (2) Marsupials occupy an important phylogenetic position, but they remain an understudied group. By focusing on the dunnart, this study addresses a significant gap in our understanding of mammalian development and evolution. Obtaining enough biological specimens for these experiments was likely a big challenge that the authors were able to overcome.

      (3) The comparison of enhancer landscapes and transcriptomes between dunnarts and mice can serve as the basis of subsequent studies that will examine the mechanisms of developmental timing shifts. The authors also carried out liftover analyses to identify orthologous enhancers and promoters in mice and dunnart.

      Weaknesses and Recommendations

      (1) The absence of genome browser tracks for ChIP-seq data makes it difficult to assess the quality of the datasets, including peak resolution and signal-to-noise ratios. Including browser tracks would significantly strengthen the paper by providing further support for adequate data quality.

      (2) The first two figures of the paper rely heavily on gene orthology analysis, motif enrichment, etc., to describe the genomic data generated from the dunnart. The main point of these figures is to demonstrate that the authors are capturing the epigenetic signature of the craniofacial region, but this is not clearly supported in the results. The manuscript should directly state what these analyses aim to accomplish and provide statistical tests that strengthen confidence in the quality of the datasets.

      (3) The observation that "promoters are located on average 106 kb from the nearest TSS" raises significant concerns about the quality of the ChIP-seq data and/or genome annotation. The results and supplemental information suggest a combination of factors, including unannotated transcripts and enhancer-associated H3K4me3 peaks, but this issue is not fully resolved in the manuscript. The authors should confirm that this is not caused by spurious peaks in the ChIP-seq analysis, and possibly improve genome annotation with the transcriptomic datasets presented in the study.

      (4) The comparison of gene regulation between a single dunnart stage (P1) and multiple mouse stages lacks proper benchmarking. Morphological and gene expression comparisons should be integrated to identify equivalent developmental stages. This "alignment" is essential for interpreting observed differences as true heterochrony rather than intrinsic regulatory differences.

      (5) The low conservation of putative enhancers between mouse and dunnart (0.74-6.77%) is surprising given previous reports of higher tissue-specific enhancer conservation across mammals. The authors should address whether this low conservation reflects genuine biological divergence or methodological artifacts (e.g., peak-calling parameters or genome quality). Comparisons with published studies could contextualize these findings.

      (6) Focusing only on genes associated with shared enhancers excludes potentially relevant genes without clear regulatory conservation. A broader analysis incorporating all orthologous genes may reveal additional insights into craniofacial heterochrony.

      In conclusion, this study provides an important dataset for understanding marsupial craniofacial development and highlights the potential of genomic approaches in non-traditional model organisms. However, methodological limitations, including incomplete genome annotation and lack of developmental benchmarking, weaken the robustness of the findings. Addressing these issues would significantly enhance the study's utility to the field and its ability to support the study's central conclusion that dunnart-specific enhancers drive accelerated craniofacial development.

    1. eLife Assessment

      This study provides important insights into how a specific brain region controls innate responses to odors, showing that different parts of this region govern behaviors related to attraction and aversion. The findings are convincing and supported by a combination of well-executed experimental approaches, including genetic manipulations and neural activity mapping, though the evidence could be strengthened by addressing certain methodological concerns, such as clarifying the rationale for specific experimental choices and exploring alternative techniques.

    2. Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      There are some weaknesses in the study's methods and interpretation. The lack of clarity regarding the behavior of the mice during head-fixed imaging experiments raises the possibility that restricted behavior could explain the absence of valence encoding at the population level. Furthermore, while the authors employ chemogenetic inhibition of specific pathways, the rationale for this choice over optogenetic inhibition is not fully addressed, and this could potentially affect the interpretation of the results. Additionally, the choice of the mplCoA for manipulation, rather than the more directly implicated anterior and posterior subregions, is not well-explained, which could undermine the conclusions drawn about the topographic organization of plCoA.

      Despite these concerns, the work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - For a first submission the manuscript is well constructed, containing many data sets that are clearly presented in spite of the abundance of experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and post-hoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, often new insights are gleaned, as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthens the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      - Interpretation of calcium imaging data is very limited and requires additional analysis, and behavioral responses specific to odors should be considered. If there are neural responses, the behavioral epochs and behavioral responses corresponding to those neuronal responses should be displayed and analyzed.

      - The effect of odor habituation is not considered.

      - Optogenetic data in the two subregions relies on very careful viral spread and fiber placement. The current anatomy results provided should be clear about the spread of virus in the A-P and D-V axes, providing coordinates for this, to assure readers that the specificity of each sub-zone is real.

      - The choice of behavioral assays across the two regions doesn't seem balanced and would benefit from more congruency.

      - Rationale for some of the choices of photo-stimulation experiment parameters isn't well defined.

    4. Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially projected to the MEA whereas those in the posterior plCoA preferentially innervated the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay whereas activating the pplCoA leads to preference. The data support a model that spatially segregated and molecularly defined populations of neurons and their projection targets carry valence specific information for the odors. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with differential projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      There are a couple of inconsistencies that may be addressed by additional experiments and careful interpretation of the data.

      Stimulating aplCoA or slc17a6 neurons results in spatial avoidance, and stimulating pplCoA or slc17a7 neurons drives approach behaviors. On the other hand, the authors and others in the field also show that there is no apparent spatial bias in odor-driven responses associated with odor valence. This discrepancy could be better addressed. A possibility is that odor-evoked responses are recorded from populations outside of those defined by slc17a6/a7. This may be addressed by marking activated cells and identifying their molecular markers. A second possibility is that optogenetic stimulation activates a broad set of neurons and does not recapitulate the sparseness of odor responses. It is not known whether sparse activation by optogenetic stimulation can still drive approach or avoidance behaviors.

      The authors show that inhibiting slc17a7 neurons blocks approach behaviors toward 2-PE. Consistent with this result, inhibiting NAc projection neurons also inhibits approach responses. However, inhibiting aplCoA or slc17a6 neurons does not reduce the aversive response to TMT, but blocking MEA projection neurons does. The latter two pieces of evidence are not consistent with each other. One possibility is that the MEA-projecting neurons may not be expressing slc17a6. It is not clear from the retrograde labeling experiments what percentage of MEA- and NAc-projecting neurons express slc17a6 and slc17a7. It is possible that neurons expressing neither VGluT1 nor VGluT2 could drive aversive or appetitive responses. This possibility may also explain why silencing slc17a6 neurons does not block avoidance.

    1. eLife Assessment

      This important manuscript presents a thorough analysis of trans-specific polymorphism (TSP) in Major Histocompatibility Complex gene families across primates. The analysis makes the most of currently available genomic data and methods to substantially increase the amount and evolutionary time that TSPs can be observed, but both false negative TSPs due to missing genes at the assembly and/or annotation level, as well as false positives due to read mismapping with missing paralogs, could be assessed and discussed more. Overall the evidence provided is convincing, and the manuscript may benefit from discussing the future use of more complete assemblies made from long reads to reduce the occurrence of both missing and false TSPs.

    2. Reviewer #1 (Public review):

      Summary:

      MHC (Major Histocompatibility Complex) genes have long been mentioned as cases of trans-species polymorphism (TSP), where alleles might have their most recent common ancestor with alleles in a different species, rather than other alleles in the same species (e.g., a human MHC allele might coalesce with a chimp MHC allele, more recently than the two coalesce with other alleles in either species). This paper provides a more complete estimate of the extent and ages of TSP in primate MHC loci. The data clearly support deep TSP linking alleles in humans to (in some cases) old world monkeys, but the amount of TSP varies between loci.

      Strengths:

      The authors use publicly available datasets to build phylogenetic trees of MHC alleles and loci. From these trees they are able to estimate whether there is compelling support for Trans-species polymorphisms (TSPs) using Bayes Factor tests comparing different alternative hypotheses for tree shape. The phylogenetic methods are state-of-the-art and appropriate to the task.

      The authors supplement their analyses of TSP with estimates of selection (e.g., dN/dS ratios) on motifs within the MHC protein. They confirm what one would suspect: classical MHC genes exhibit stronger selection at amino acid residues that are part of the peptide binding region, and non-classical MHC exhibit less evidence of selection. The selected sites are associated with various diseases in GWAS studies.

      Weaknesses:

      An implication drawn from this paper (and previous literature) is that MHC has atypically high rates of TSP. However, rates of TSP are not estimated for other genes or gene families, so readers have no basis of comparison. There is no framework for judging whether the depth and frequency of TSP are unusual for MHC family genes, relative to other random genes in the genome, or to immune genes in particular. I expect (from previous work on the topic) that MHC is indeed exceptional in this regard, but some direct comparison would provide greater confidence in this conclusion.

      Given the companion paper's evidence of genic gain/loss, it seems like there is a real risk that the present study under-estimates TSP, if cases of TSP have been obscured by the loss of the TSP-carrying gene paralog from some lineages needed to detect the TSP. Are the present analyses simply calculating rates of TSP of observed alleles, or are you able to infer TSP rates conditional on rates of gene gain/loss?

      Figure 5 (and 6) provide regression model fits (red lines in panel C) relating evolutionary rates (y axis not labeled) to site distance from the peptide binding groove, on the protein product. This is a nice result. I wonder, however, whether a linear model (as opposed to non-linear) is the most biologically reasonable choice, and whether non-linear functions have been evaluated. The authors might consider generalized additive models (GAMs) as an alternative that relaxes linearity assumptions.
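      The linear versus non-linear comparison suggested here can be sketched as follows. This is a minimal illustration on synthetic data, not a GAM and not the authors' analysis: the distances, rates, and the exponential-decay alternative are all assumptions chosen for the example, and a real GAM would be fit with a dedicated library. Both models have two parameters, so their residual sums of squares can be compared directly.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rss(observed, predicted):
    """Residual sum of squares on the original (untransformed) scale."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Synthetic data (illustrative only): rate decays with distance from the groove.
distance = [0, 2, 4, 6, 8, 10, 15, 20]
rate = [2.0 * math.exp(-0.3 * d) * (1.1 if i % 2 == 0 else 0.9)
        for i, d in enumerate(distance)]

# Linear model: rate = a + b * distance
a_lin, b_lin = fit_line(distance, rate)
rss_linear = rss(rate, [a_lin + b_lin * d for d in distance])

# Exponential-decay model: log(rate) = a + b * distance, fit on the log scale
a_log, b_log = fit_line(distance, [math.log(r) for r in rate])
rss_decay = rss(rate, [math.exp(a_log + b_log * d) for d in distance])

print(f"RSS linear: {rss_linear:.3f}, RSS exponential decay: {rss_decay:.3f}")
```

      On real data the comparison would need to account for noise and differing model complexity (for example via AIC or cross-validation).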

      The connection between rapidly evolving sites, and disease associations (lines 382-3) is very interesting. However, this is not being presented as a statistical test of association. The authors note that fast-evolving amino acids all have at least one association: but is this really more disease-association than a random amino acid in the MHC? Or, a randomly chosen polymorphic amino acid in MHC? A statistical test confirming an excess of disease associations would strengthen this claim.
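      One form such a test could take is a one-sided hypergeometric (Fisher-type) test. The sketch below uses made-up counts, not values from the manuscript, and the function name is hypothetical. It asks how likely it is that all fast-evolving sites carry a disease association if they were drawn at random from the polymorphic sites.

```python
from math import comb


def hypergeom_upper_tail(n_sites, n_associated, n_drawn, n_hits):
    """P(X >= n_hits) when n_drawn sites are sampled without replacement
    from n_sites, of which n_associated carry a disease association."""
    total = comb(n_sites, n_drawn)
    upper = min(n_associated, n_drawn)
    favourable = sum(
        comb(n_associated, i) * comb(n_sites - n_associated, n_drawn - i)
        for i in range(n_hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 200 polymorphic sites, 40 with an association,
# 10 fast-evolving sites, all 10 of which have an association.
p_value = hypergeom_upper_tail(n_sites=200, n_associated=40, n_drawn=10, n_hits=10)
print(f"one-sided p-value: {p_value:.2e}")
```

      The choice of background set (all MHC amino acids versus only polymorphic ones) changes `n_sites` and `n_associated`, which is exactly the distinction raised in the paragraph above.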

    3. Reviewer #2 (Public review):

      Summary

      In this study, the authors characterized population genetic variation in the MHC locus across primates and looked for signals of long-term balancing selection (specifically trans-species polymorphism, TSP) in this highly polymorphic region. To carry out these tasks, they used Bayesian methods for phylogenetic inference (i.e. BEAST2) and applied a new Bayesian test to quantify evidence supporting monophyly vs. trans-species polymorphism for each exon across different species pairs. Their results, although mostly confirmatory, represent the most comprehensive analyses of primate MHC evolution to date, and novel findings or possible discrepancies are clearly pointed out. However, as the authors discuss, the available data are insufficient to fully capture primates' MHC evolution.

      Strengths of the paper include: using appropriate methods and statistically rigorous analyses; very clear figures and detailed descriptions of the results and methods that make the paper easy to follow despite the complexity of the region and approach; a clever test for TSP that is then complemented by positive selection tests and the protein structures for a quite comprehensive study.

      That said, weaknesses include: lack of information about how many sequences are included and whether uneven sampling across taxa might result in some comparisons without evidence for TSP; frequent reference to the companion paper instead of summarizing (at least some of) the critical relevant information (e.g., how was orthology inferred?); no mention of the quality of sequences in the database and whether there are still potential effects of mismapping or copy number variation affecting the sequence comparison.

    4. Reviewer #3 (Public review):

      Summary

      The study uses publicly available sequences of classical and non-classical genes from a number of primate species to assess the extent and depth of TSP across the primate phylogeny. The analyses were carried out in a coherent and, in my opinion, robust inferential framework and provided evidence for ancient (even > 30 million years) TSP at several classical class I and class II genes. The authors also characterise evolutionary rates at individual codons, map these rates onto MHC protein structures, and find that the fastest evolving codons are extremely enriched for autoimmune and infectious disease associations.

      Strengths

      The study is comprehensive, relying on a large data set, state-of-the-art phylogenetic analyses and elegant tests of TSP. The results are not entirely novel, but a synthesis and re-analysis of previous findings is extremely valuable and timely.

      Weaknesses

      I've identified weaknesses in several areas (details follow in the next section):

      - Inadequate description and presentation of the data used

      - Large parts of the results read like extended figure captions, which breaks the flow.

      - Older literature on the subject is duly cited, but the authors don't really discuss their findings in the context of this literature.

      - The potential impact of mechanisms other than long-term maintenance of allelic lineages by balancing selection, such as interspecific introgression and incorrect orthology assessment, needs to be discussed.

    1. eLife Assessment

      This valuable study presents a resource for researchers using Drosophila to study neural circuits, in the form of a collection of split-Gal4 lines with an online search engine, which will facilitate the mapping of neuronal circuits. The evidence is convincing to demonstrate the utility of these new tools, and of the search engine, for understanding expression patterns in adults and larvae, and differences between the sexes. These resources will be of broad interest to Drosophila researchers in the field of neurobiology.

    2. Reviewer #1 (Public review):

      Summary:

      Meissner et al describe an update on the collection of split-GAL4 lines generated by a consortium led by Janelia Research Campus. This follows the same experimental pipeline described before and presents as a significant increment to the present collection. This will strengthen the usefulness and relevance of "splits" as a standard tool for labs that already use this tool and attract more labs and researchers to use it.

      Strengths:

      This manuscript presents a solid step to establish Split-GAL4 lines as a relevant tool in the powerful Drosophila toolkit. Not only does the raw number of available lines contribute to the relevance of this tool in the "technical landscape" of genetic tools, but additional features of this effort contribute to the successful adoption. These include:

      (1) A description of expression patterns in the adult and larvae, expanding the "audience" for these tools

      (2) A classification of line combination according to quality levels, which provides a relevant criterion while deciding to use a particular set of "splits".

      (3) Discrimination between male and female expression patterns, providing hints regarding the potential role of these gender-specific circuits.

      (4) The search engine seems to be user-friendly, facilitating the retrieval of useful information.

      (5) An acknowledgement of the caveats and challenges that splits (like any other genetic tool) can carry.

      Overall, the authors employed a pipeline that maximizes the potential of the Split-GAL4 collection to the scientific community.

      Weaknesses:

      My concerns were resolved regarding the existence of caveats while using these tools that researchers should be aware of, particularly those using them for the first time.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript describes the creation and curation of a collection of genetic driver lines that specifically label small numbers of neurons, often just a single to handful of cell types, in the central nervous system of the fruit fly, Drosophila melanogaster. The authors screened over 77,000 split hemidriver combinations to yield a collection of 3060 lines targeting a range of cell types in the adult Drosophila central nervous system and 1373 lines characterized in third-instar larvae. These genetic driver lines have already contributed to several important publications and will no doubt continue to do so. It is a truly valuable resource that represents the cooperation of several labs throughout the Drosophila community.

      Strengths:

      The authors have thoughtfully curated and documented the lines that they have created, so that they may be maximally useful to the greater community. This documentation includes confocal images of neurons labeled by each driver line and when possible, a list of cell types labeled by the genetic driver line and their identity in an EM connectome dataset. The authors have also made available some information from the other lines they created and tested but deemed not specific or strong enough to be included as part of the collection. This additional resource will be a valuable aid for those seeking to label cell types that may not be included in the main collection.

      The added revisions help to clarify important points relating to the creation of the lines, which lines were included as part of this specific collection, and caveats to be mindful of when using any of the described lines. These revisions will increase the manuscript's utility to users who may be less familiar with this resource.

      Weaknesses:

      The major weakness, which is also in some ways a strength, is the stringent requirement that included lines be highly specific across the CNS. As a result, the lines that are part of this specific collection are sparse and specific but also limited in which cell types they cover. Doubtless there are many missing cell types.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Meissner et al describe an update on the collection of split-GAL4 lines generated by a consortium led by Janelia Research Campus. This follows the same experimental pipeline described before and presents as a significant increment to the present collection. This will strengthen the usefulness and relevance of "splits" as a standard tool for labs that already use this tool and attract more labs and researchers to use it.

      Strengths:

      This manuscript presents a solid step to establish Split-GAL4 lines as a relevant tool in the powerful Drosophila toolkit. Not only does the raw number of available lines contribute to the relevance of this tool in the "technical landscape" of genetic tools, but additional features of this effort contribute to the successful adoption. These include:

      (1) A description of expression patterns in the adult and larvae, expanding the "audience" for these tools

      (2) A classification of line combination according to quality levels, which provides a relevant criterion while deciding to use a particular set of "splits".

      (3) Discrimination between male and female expression patterns, providing hints regarding the potential role of these gender-specific circuits.

      (4) The search engine seems to be user-friendly, facilitating the retrieval of useful information.

      Overall, the authors employed a pipeline that maximizes the potential of the Split-GAL4 collection to the scientific community.

      Weaknesses:

      The following aspects apply:

      The use of split-GAL4 lines has tremendously improved the genetic toolkit of Drosophila, and this manuscript is another step forward in establishing this tool in the genetic repertoire that laboratories use. Thus, this would be a perfect opportunity for the authors to review the current status of this tool, addressing its caveats and how to effectively implement it into the experimental pipeline.

      (1) While the authors do bring up a series of relevant caveats that the community should be aware of while using split-GAL4 lines, the authors should take the opportunity to address some of the genetic issues that frequently arise while using the described genetic tools. This is particularly important for laboratories that lack the experience using split-GAL4 lines and wish to use them. Some of these issues are covertly brought up, but not entirely clarified.

      First, why do the authors (wisely) rescreen the lines using UAS-CsChrimson-mVenus? One reason is that using another transgene (such as UAS-GFP) and/or another genomic locus can drive a different expression pattern or intensities. Although this is discussed, this should be made more explicit and the readers should be aware of this.

      Second, it would be important to include a discussion regarding the potential of hemidriver lines to suffer from transvection effects whenever there is a genetic element in the same locus. These are serious issues that prevent a more reliable use of split-GAL4 lines that, once again, should be discussed.

      We added additional explanatory text to the discussion.

      (2) The authors simply mention that the goal of the manuscript is to "summarize the results obtained over the past decade.". A better explanation would be welcomed in order to understand the need of a dedicated manuscript to announce the availability of a new batch of lines when previous publications already described the Split-GAL4 lines. At the extreme, one might question why we need a manuscript for this when a simple footnote on Janelia's website would suffice.

      We added an additional mention of the cell type split-GAL4 collection at the relevant section and added more emphasis on the curation process adding value to the final selections. We feel that the manuscript is useful to document the methods used for the contained analysis and datasets and gives a starting point to the reader to go through the many split-GAL4 publications and images.

      Reviewer #2 (Public Review):

      Summary: This manuscript describes the creation and curation of a collection of genetic driver lines that specifically label small numbers of neurons, often just a single to handful of cell types, in the central nervous system of the fruit fly, Drosophila melanogaster. The authors screened over 77,000 split hemidriver combinations to yield a collection of 3060 lines targeting a range of cell types in the adult Drosophila central nervous system and 1373 lines characterized in third-instar larvae. These genetic driver lines have already contributed to several important publications and will no doubt continue to do so. It is a truly valuable resource that represents the cooperation of several labs throughout the Drosophila community.

      Strengths:

      The authors have thoughtfully curated and documented the lines that they have created, so that they may be maximally useful to the greater community. This documentation includes confocal images of neurons labeled by each driver line and when possible, a list of cell types labeled by the genetic driver line and their identity in an EM connectome dataset. The authors have also made available some information from the other lines they created and tested but deemed not specific or strong enough to be included as part of the collection. This additional resource will be a valuable aid for those seeking to label cell types that may not be included in the main collection.

      Weaknesses:

      None, this is a valuable set of tools that took many years of effort by several labs. This collection will continue to facilitate important science for years to come.

      We thank the reviewer for their positive feedback.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Meissner et al. describes a collection of 3060 Drosophila lines that can be used to genetically target very small numbers of brain cells. The collection is the product of over a decade of work by the FlyLight Project Team at the Janelia Research Campus and their collaborators. This painstaking work has used the intersectional split-Gal4 method to combine pairs of so-called hemidrivers into driver lines capable of highly refined expression, often targeting single cell types. Roughly one-third of the lines have been described and characterized in previous publications and others will be described in manuscripts still in preparation. They are brought together here with many new lines to form one high-quality collection of lines with exceptional selectivity of expression. As detailed in the manuscript, all of the lines described have been made publicly available accompanied by an online database of images and metadata that allow researchers to identify lines containing neurons of interest to them. Collectively, the lines include neurons in most regions of both the adult and larval nervous systems, and the imaging database is intended to eventually permit anatomical searching that can match cell types targeted by the lines to those identified at the EM level in emerging connectomes. In addition, the manuscript introduces a second, freely accessible database of raw imaging data for many lower quality, but still potentially useful, split-Gal4 driver lines made by the FlyLight Project Team.

      Strengths:

      Both the stock collection and the image databases are substantial and important resources that will be of obvious interest to neuroscientists conducting research in Drosophila. Although many researchers will already be aware of the basic resources generated at Janelia, the comprehensive description provided in this manuscript represents a useful summary of past and recent accomplishments of the FlyLight Team and their collaborators and will be very valuable to newcomers in the field. In addition, the new lines being made available and the effort to collect all lines that have been generated that have highly specific expression patterns is very useful to all.

      Weaknesses:

      The collection of lines presented here is obviously somewhat redundant in including lines from previously published collections. Potentially confusing is the fact that previously published split-Gal4 collections have also touted lines with highly selective expression, but only a fraction of those lines have been chosen for inclusion in the present manuscript. For example, the collection of Shuai et al. (2023) describes some 800 new lines, many with specificity for neurons with connectivity to the mushroom body, but only 168 of these lines were selected for inclusion here. This is presumably because of the more stringent criteria applied in selecting the lines described in this manuscript, but it would be useful to spell this out and explain what makes this collection different from those previously published (and those forthcoming).

      We added more description of how this collection is focused on the best cell-type-specific lines across the CNS. An important requirement for inclusion was this degree of specificity across the CNS, while many prior publications had a greater emphasis on lines with a narrower focus of specificity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Luckily for us, genetics is for the most part an exact science. However, there's still some "voodoo" in a lot of genetic combinations that the authors should disclose and be as clear as possible in the manuscript. This allows for the potential users to gauge expectations and devise a priori alternative plans.

      We attempted to comprehensively cover the caveats inherent in our genetic targeting approach.

      Minor points:

      (1) The authors mention that fly age should be controlled as expression can vary. Is there any reference to support this claim?

      We added a reference describing driver expression changes over development.

      (2) There should be a citation for "Flies were typically 1-5 days old at dissection for the cell type collection rescreening, 1-8 days old for other non-MCFO crosses and 3-8 days old for MCFO".

      We clarified that these descriptions were of our experimental preparations, not describing other citable work.

      Reviewer #3 (Recommendations For The Authors):

      General Points:

      Overall, the manuscript is very clear, but there are a couple of points where more explicit information would be useful. One of these is with respect to the issue of selectivity of targeting. The cell type specificity of lines is often referred to, but cell types can range from single pairs of neurons to hundreds of indistinguishable neurons with similar morphology and function. It would be useful if the authors explained whether their use of the term "cell type" distinguishes cell type from cell number. It would also be useful if lines that target many neurons of a single cell type were identified.

      We added further discussion of cell types vs. cell numbers. Our labeling strategy was not optimized for counting cell numbers labeled by each line. We believe EM studies are best positioned to comprehensively evaluate the number of cells making up each type.

      The second point relates to vagueness about the intended schedule for providing resources that will match (or allow matching of) neurons to the connectome. For example, on pp. 5-6 it is stated that: "In the future all of the neurons in these lines will be uniquely identified and linked to neurons reconstructed in the electron microscopy volume of the larva" but no timeline is provided. Similarly, for the adult neurons it is stated on p. 4 that: "Anatomical searching for comparison to other light microscopy (LM) and EM data is being made available." A more explicit statement about what resources are and are not yet available, a timeline for full availability, and an indication of how many lines currently have been matched to EM data would be helpful.

      During the review and revision period we have made progress on processing the images in the collection. We updated the text with the current status and anticipated timeline for completion.

      Specific Points:

      p. 4 "Although the lines used for these comparisons are not a random sample, the areas of greatest difference are in the vicinity of previously described sexual dimorphisms..." In the vicinity of is a very vague statement of localization. A couple of examples of what is meant here would be useful.

      We added example images to Figure 3.

      p. 5 "...may have specific expression outside our regions of interest." It's not clear what "our regions of interest" refers to here. Please clarify.

      We clarified that we were referring to the regions studied in the publications listed in Table 1.

      p. 5 "...lines that were sparse in VNC but dirty in the brain or SEZ..." A more quantitative descriptor than "dirty" would be helpful.

      We unfortunately did not quantify the extent of undesired brain/SEZ expression, but attempted to clarify the statement.

      p. 6 "...the images are being made instantly searchable for LM and EM comparisons at NeuronBridge..." Here again it is hard to know what is meant by "being made instantly searchable." How many have been made searchable and what is the bottleneck in making the rest searchable?

      We updated the text as described above. The bottleneck has been available processing capacity for the hundreds of thousands of included images.

      Figure 1 Supplemental File 2: The movie is beautiful, but it seems more useful as art than as a reference. Perhaps converting it to a pdf of searchable images for each line would make it more useful.

      We replaced the movie with a searchable PDF.

      Fig. 2(B) legend: "Other lines may have more than two types." It is not clear what "other lines" are being referred to.

      As part of making the quality evaluation more robust, we scored lines for the clear presence of three or more cell types. We updated the text accordingly.

      Fig. 2(C): Presumably the image shown is an example of variability in expression rather than weakness, but it is hard to know without a point of comparison. Perhaps show the expression patterns of other samples? Or describe briefly in the legend what other samples looked like?

      We added Figure 2 - figure supplement 1 with examples of variable expression in a split-GAL4 line.

    1. eLife Assessment

      This important study reports on PI3KR mutations and a paradoxical mechanism of PI3KR signaling. The strength of evidence for the study is mostly convincing, as conclusions are supported by a variety of mutational strategies and cellular systems to look at interactions among signaling pathways.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides convincing data showing that expression of the PIK3R1(deltaExon11) dominant negative mutation in Activated PI3K Delta Syndrome 1/2 (APDS1/2) patient-derived cells reduces AKT activation and p110δ protein levels. Using a 3T3-L1 model cell system, the authors show that overexpressed p85α(deltaExon 11) displays reduced association with the p110α catalytic subunit but strongly interacts with Irs1/2. Overexpression of PIK3R1 dominant negative mutants inhibits AKT phosphorylation and reduces cellular differentiation of preadipocytes. The experimental design, interpretation, and quantification broadly support the authors' conclusions, which establish a new paradigm that warrants future studies.

      Strengths:

      The strength of this study is the clear results derived from Western blot analysis of cell signaling markers (e.g. pAKT1), and co-immunoprecipitation of PI3K holoenzyme complexes and associated regulatory factors (e.g. Irs1/2). The authors analyze a variety of PIK3R1 mutants (i.e. deltaExon11, E489K, R649W, and Y657X), which reveals a range of phenotypes that support the proposed model for dominant negative activity. The use of clonal cell lines with doxycycline-induced expression of the PIK3R1 mutants (deltaExon 11, R649W, and Y657X) provides convincing experimental data concerning the relationship between p85α mutant expression and AKT phosphorylation in vivo. This approach for overexpression is excellent and should be utilized more broadly by cell biologists. The authors convincingly show that p85α(deltaExon11, R649W, or Y657X) is unable to associate with p110α but instead more strongly associates with Irs1/2 compared to wild type p85α. Overall, this article does a great job of motivating future studies of SHORT and APDS2 PIK3R1 mutants expressed from their endogenous loci (e.g. knock-in mice).

      Weaknesses:

      The limitations of this study lie in the complexity of the cell signaling pathway under investigation, rather than a lack of rigor by the authors. Future experimentation will help reconcile the cell-type-specific differences (e.g. APDS2 patient-derived cells vs. the 3T3-L1 cell model system) in PIK3R1 mutant behavior reported by the authors. This is also intimately linked to variable expression of PIK3R1 mutants and cell-type-specific regulatory factors. Although beyond the scope of this work, an unbiased proteomic study that broadly evaluates the cell signaling landscape could provide a more holistic understanding of the APDS2 and SHORT mutants compared to a candidate-based approach. Additional structural biochemistry of the p110α/p85α(deltaExon 11) complex is needed to explain why PIK3R1 mutant regulatory subunits do not strongly associate with the p110 catalytic subunit. A more comprehensive biochemical analysis of p110α/p85α, p110β/p85α, and p110δ/p85α mutant protein complexes will also be necessary to explain various cell signaling phenotypes. A minor limitation of this study is the use of single end point assays to measure PI3K lipid kinase activity in the presence of one regulatory input (i.e. RTK-derived pY peptide). An expanded biochemical analysis of purified mutant PI3K complexes across the canonical membrane signaling landscape will be important for deciphering how competition between wild-type and mutant regulatory subunits is regulated in different cell signaling contexts.

    3. Reviewer #2 (Public review):

      Patsy R. Tomlinson et al. investigated the impact of different p85 alpha variants associated with SHORT syndrome or APDS2 on insulin-mediated signaling in dermal fibroblasts and preadipocytes. They performed this study because APDS2 patients often present with features of SHORT syndrome. They found no evidence of hyperactive PI3K signalling, monitored by pAKT, in APDS2 patient-derived dermal fibroblasts. In these cells p110 alpha protein levels were comparable to levels in control cells; however, p110 delta protein levels were strongly reduced. Remarkably, the truncated APDS2-causal p85 alpha variant was less abundant in these cells than wild-type p85 alpha. They then studied the impact of ectopically expressed p85 alpha variants on insulin-mediated PI3K signaling in 3T3-L1 preadipocytes. Interestingly, they found that the truncated APDS2-causal p85 alpha variant impaired insulin-induced signaling. Using immunoprecipitation of p110 alpha, they did not find the truncated APDS2-causal p85 alpha variant in p110 alpha precipitates. Furthermore, by immunoprecipitating IRS1 and IRS2, they observed that the truncated APDS2-causal p85 alpha variant was very abundant in IRS1 and IRS2 precipitates, even in the absence of insulin stimulation. These important findings add, in an interesting way, a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.

      Strengths:

      Based on state-of-the-art functional studies, the authors show that the p85 alpha variant responsible for APDS2, known to be associated with increased PI3K-delta signaling, can attenuate PI3K-alpha signalling in preadipocytes, providing a possible mechanistic explanation for the growing number of APDS2 patients with features of SHORT syndrome.

      Weaknesses:

      The proposed paradigm is based on one cell line derived from an APDS2 patient and an overexpressing system. The investigation of a larger number of cell lines derived from APDS2 patients would further substantiate the conclusion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The authors identify new mechanisms that link a PIK3R1 mutant to cellular signaling and division in Activated PI3 Kinase Delta Syndrome 1 and 2 (APDS1/2). The conclusion that this mutant serves as a dominant negative form of the protein, impacting PI3K complex assembly and IRS/AKT signaling, is important, and the evidence from constitutive and inducible systems in cultured cells is convincing. Nevertheless, there are several limitations relating to differences between cell lines and expression systems, as well as more global characterization of the protein interaction landscape, which would further enhance the work.

      We are pleased by this fair assessment, while noting that this work relates to APDS2 (PIK3R1-related) rather than APDS1 (PIK3CD-related). Our findings we believe are clear, but the observation that studies including more global proteomics/phosphoproteomics in cells expressing mutants at endogenous levels would add further insight is well made. We hope that this report may motivate such studies by laboratories with wider access to primary cells from patients and knock-in mice.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      This study provides convincing data showing that expression of the PIK3R1(delta Exon11) dominant negative mutation in Activated PI3K Delta Syndrome 1/2 (APDS1/2) patient-derived cells reduces AKT activation and p110δ protein levels. Using a 3T3-L1 model cell system, the authors show that overexpressed p85α(delta Exon 11) displays reduced association with the p110α catalytic subunit but strongly interacts with Irs1/2. Overexpression of PIK3R1 dominant negative mutants inhibits AKT phosphorylation and reduces cellular differentiation of preadipocytes. The strength of this article is the clear results derived from Western blot analysis of cell signaling markers (e.g. pAKT1), and co-immunoprecipitation of PI3K holoenzyme complexes and associated regulatory factors (e.g. Irs1/2). The experimental design, interpretation, and quantification broadly support the authors' conclusions.

      Strengths:

      The authors analyze a variety of PIK3R1 mutants (i.e. delta Exon11, E489K, R649W, and Y657X), which reveals a range of phenotypes that support the proposed model for dominant negative activity. The use of clonal cell lines with doxycycline-induced expression of the PIK3R1 mutants (delta Exon 11, R649W, and Y657X) provides convincing experimental data concerning the relationship between p85α mutant expression and AKT phosphorylation in vivo. The authors convincingly show that p85α (delta Exon11, R649W, or Y657X) is unable to associate with p110α but instead more strongly associates with Irs1/2 compared to wild type p85α. This helps explain why the authors were unable to purify the recombinant p110α/p85α (delta Exon 11) heterodimeric complex from insect cells.

      Weaknesses:

      Future experimentation will be needed to reconcile the cell type specific differences (e.g. APDS2 patient-derived cells vs. the 3T3-L1 cell model system) in PIK3R1 mutant behavior reported by the authors.

      This is a fair comment. It has been established for many years that relative protein levels even of wild type PIK3CA and PIK3R1 gene products influence sensitivity of PI3K to growth factor stimulation. Such issues of stoichiometry become exponentially more complicated when the numerous potential interactions among the full repertoire of Class 1 PI3K regulatory subunits (3 splice variants of PIK3R1, and also PIK3R2 and PIK3R3) and corresponding catalytic subunits (PIK3CA, PIK3CB, PIK3CD) are considered, and when different activities and stabilities of PIK3R1 mutants are added to the mix. It thus seems obvious to us that different levels of expression of different mutants in different cellular contexts will have different signalling consequences. We establish a paradigm in this paper using an overexpression system, and we strongly agree that this merits further investigation in a wider variety of primary cells (or cells with knock in at the endogenous locus), where available.

      An unbiased proteomic study that broadly evaluates the cell signaling landscape could provide a more holistic understanding of the APDS2 and SHORT mutants compared to a candidate-based approach.

      We agree. This would be highly informative, but we think it would best be carried out in both “metabolic” and “immune” cells with endogenous levels of expression of SHORT or APDS2 PIK3R1 mutants. These are not all currently available to us, and require follow-up studies.

      Additional biochemical analysis of p110α/p85α delta Exon 11 complex is needed to explain why this mutant regulatory subunit does not strongly associate with the p110 catalytic subunit.

      We agree. We present this observation in our overexpression system, which is clear and reproducible, even though somewhat surprising. The failure to bind p110a is likely not absolute, as sufficient p110a-p85a<sup>DEx11</sup> was synthesised in vitro in a prior study to permit structural and biochemical studies, although a series of technical workarounds were required to generate enough heterodimeric PI3K to study in vitro given the manifest instability of the complex, particularly when concentrated (PMID 28167755). We already note in discussion that p85a can homodimerize and bind PTEN, likely among other partners, and it may be that the APDS2 deletion strongly favours binding to proteins that effectively compete with p110a. However, this requires further study of the wider interactome of the mutant PIK3R1, which, as noted above, is beyond the scope of the current study.

      It remains unclear why p85α delta Exon 11 expression reduces p110δ protein levels in APDS2 patient-derived dermal fibroblasts.

      We caution that we only had the opportunity to study dermal fibroblasts cultured from a single APDS2 patient, as noted in the paper, and so replication of this finding in future will be of interest. Nevertheless, the observation is robust and reproducible in these cells, and we agree that this apparently selective effect on p110d is not fully explained. Having said that, it has been observed previously that heterodimers of the DEx11 p85a variant with either p110a or p110d are unstable, and when the unstable complexes were eventually synthesised, p110a and p110d were demonstrated to show differences in engagement with the mutant p85, with greater disruption of inhibitory interactions observed for p110d (PMID 28167755). It is thus not a great stretch to imagine that as well as disinhibiting p110d more, the DEx11 p85a variant also destabilises the p85a-p110d complex more, potentially explaining its near disappearance in cells with low baseline p110d expression. Following on from the preceding question and response, however, there is an alternative explanation, based on the 3T3-L1 overexpression studies in this paper, wherein we were unable to demonstrate binding of p110a by DEx11 p85a. If, in any given cellular context, the mutant p85 could bind p110d but not p110a, then the destabilising effect would be observed only for p110d. So in summary, we believe the selective effect on p110d is explained by differences in binding kinetics and heterodimer stability for different DEx11 p85a-containing complexes. The net effect of these differences may vary among cell types depending on relative levels of subunit expression.

      This study would benefit from a more comprehensive biochemical analysis of the described p110α/p85α, p110β/p85α, and p110δ/p85α mutant protein complexes. A current limitation of this study is the use of a single endpoint assay to measure PI3K lipid kinase activity in the presence of a single regulatory input (i.e. RTK-derived pY peptide). A broader biochemical analysis of the mutant PI3K complexes across the canonical signaling landscape will be important for establishing how competition between wild-type and mutant regulatory subunits is regulated in different cell signaling pathways.

      We agree that a wider analysis of upstream inputs and downstream network would be of interest, though as noted above the ultimate functional consequences of mutants will be an amalgam of any differential signalling effects of complexes that are stable enough to function, and differential effects of mutant p85a on the kinetics of distinct heterodimer assembly and stability. In this paper we seek to suggest a paradigm worthy of further, deeper assessment. We note that the search space here is large indeed (A. different cell types with differing profiles of PI3K subunit expression, B. multiple upstream stimuli and C. multiple downstream outputs, with timecourse of responses an additional important factor to consider). These studies are realistically beyond the scope of the current work, but we hope that further studies, as suggested by the reviewer, will follow.

      Reviewer #2 (Public Review)

      Summary:

      Patsy R. Tomlinson et al. investigated the impact of different p85alpha variants associated with SHORT syndrome or APDS2 on insulin-mediated signaling in dermal fibroblasts and preadipocytes. They find no evidence of hyperactive PI3K signalling monitored by pAKT in APDS2 patient-derived dermal fibroblast cells. In these cells p110alpha protein levels were comparable to levels in control cells; however, the p110delta protein levels were strongly reduced. Remarkably, the truncated APDS2-causal p85alpha variant was less abundant in these cells than p85alpha wildtype. Afterwards, they studied the impact of ectopically expressed p85alpha variants on insulin-mediated PI3K signaling in 3T3-L1 preadipocytes. Interestingly, they found that the truncated APDS2-causal p85alpha variant impaired insulin-induced signaling. Using immunoprecipitation of p110alpha, they did not find the truncated APDS2-causal p85alpha variant in p110alpha precipitates. Furthermore, by immunoprecipitating IRS1 and IRS2, they observed that the truncated APDS2-causal p85alpha variant was very abundant in IRS1 and IRS2 precipitates, even in the absence of insulin stimulation. These important findings add, in an interesting way, a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.

      Strengths:

      Based on state-of-the-art functional investigation, the authors propose a loss-of-function activity of the APDS2-causing p85alpha variant in preadipocytes, providing a possible mechanistic explanation for the growing number of APDS2 patients described with features of SHORT syndrome.

      Weaknesses:

      Related to Figure 1: PIK3R1 expression was assessed only by Western blotting; quantification of the RNA transcripts, e.g. mutant and wildtype transcripts, was not performed. RNA expression analysis would further strengthen the suggested impaired stabilization/binding.

      It is not completely clear to us how further PIK3R1 mRNA analysis would enhance the points we seek to make. Perhaps the reviewer’s point is that changes in protein expression could be explained by reduced transcription rather than having anything to do with altered protein turnover? As shown in Figure 1 supplemental figure 1, sequencing cDNA from each of the primary cell lines studied indicates that both mutant and WT alleles are expressed at or close to 50% of the total mRNA for PIK3CA or PIK3R1 as relevant. While this is not strictly quantitative, allied to prior evidence that these are dominant alleles which require to be expressed to exert their effect, with no evidence for altered mRNA expression of these variants in prior studies, we don’t believe any further quantification of mRNA expression would add value.

      Related to Figure 2

      As mentioned by the authors in the manuscript the expression of p110delta but also p110beta in 3T3-L1 preadipocytes ectopically expressing p85alpha variants has not been analyzed.

      We agree that such determination would have been a useful addition to the study, but regretfully it was not undertaken in these modified 3T3-L1 cells at the time of study. However independent bulk RNAseq studies of the founder 3T3-L1 cells from which the stably transduced cells were generated, undertaken as part of an unrelated study, revealed the following relative levels of endogenous expression of PI3K subunit mRNA:

      Author response table 1.

      We have not determined endogenous protein expression, and so have left the text of the discussion unchanged, simply noting that we have not formally assessed protein expression of p110d/p110b. However these transcriptomic findings suggest that p110d protein is likely either undetectable, or else present at extremely low levels compared to endogenous p110a. p110b also appears to be expressed at a much lower level than p110a. In our studies overexpressing mutant PIK3R1 and assessing insulin action, we believe we are largely or perhaps entirely assaying the effect of the mutants on p110a, in keeping with the fact that genetic and pharmacological studies have firmly established that it is p110a that is responsible for mediating the metabolic actions of insulin in adipose tissue and preadipocytes including 3T3-L1 (e.g. PMID 16647110). Indeed, to quote from this study, in 3T3-L1 “… inhibitors of p110b (TGX-115 and TGX-286) and p110d (IC87114 and PIK-23) had no effect on the insulin-stimulated phosphorylation of any protein in the PI3-K pathway.”

      We have added the following sentence to the discussion:

      “The current study has limitations. We have studied primary cells from only a single APDS2 patient, and in the 3T3-L1 cell model, we did not determine whether p110d protein could be detected. If not, this could explain the lack of detectable AKT phosphorylation with induction of Pik3r1 DEx11. Indeed, previous pharmacological studies in 3T3-L1 adipocytes have shown that selective inhibition of p110d or p110b does not alter insulin-induced phosphorylation of any protein studied in the PI3-K pathway, attesting to the dominance of p110a in insulin action in this cell model (Knight et al, 2006).”

      Furthermore, a direct comparison of the truncated APDS2-causal p85alpha variant with SHORT syndrome-causal p85alpha variants in regard to pAKT level, and p85alpha expression level has not been performed.

      These investigations would further strengthen the data.

      The cell lines conditionally expressing SHORT syndrome variants have been reported already, as cited (PMID: 27766312). Remarkably, the degree of inhibition of insulin-stimulated signalling is actually less pronounced for the SHORT syndrome variants than for the overexpressed APDS2 variant, as seen in the excerpt from the prior paper below. In this prior paper the maximum insulin concentration used, 100nM, was the concentration used in the current study. While overexpression of the APDS2 p85a variant ablated the response to insulin entirely, it is still seen in the prior study, albeit at a clearly reduced level.

      Related to Figure 3

      The E489K and Y657X p85alpha variants should be also tested in combination with p110delta in the PI3K activity in vitro assay. This would help to further decipher the overall impact, especially of the E489K variant.

      We agree that this would make our data more complete, but for logistical reasons (primarily available personnel) we were compelled to constrain the number of p85-p110 combinations we studied. We elected to prioritise the PIK3R1 R649W variant as by far the most common causal SHORT syndrome variant, and as the variant showing the “cleanest” functional perturbation, namely severely impaired or absent ability to dock to phosphotyrosines in cognate proteins.  The paradox that we sought to explain in this paper, namely the phenotypic combination of gain-of-function APDS2 with loss-of-function SHORT syndrome features holds only for APDS2 PIK3R1 variants, and so while it is interesting to document that the canonical SHORT syndrome variant also inhibits PI3Kb and PI3Kd activation in vitro, this was not the main purpose of our study.

      Reviewer #1 (Recommendations For The Authors):

      Points of clarification and suggestions for improving the manuscript:

      (1) Explain whether there are any PIK3R1-independent genetic alterations in the APDS2 and PROS-derived cell lines. For example, are there differences in the karyotype of mutant cell lines compared to wild-type cells?

      Karyotypic abnormalities are not an established feature of either PROS or APDS2, and the patients from whom cells were derived were documented to be of normal karyotype. Karyotypic abnormalities acquired during cell culture would not be unprecedented, but confirming normal karyotypes in primary cell lines where there is no specific reason to suppose any alteration exceeds normal expectations for primary cell studies, and so this has not been undertaken.

      (2) When introducing the APDS2-associated PIK3R1 mutation (lines 126-128), the authors describe both the exon 11 skipping and in-frame deletions. I recommend rewording this sentence to say exon 11 skipping results in an in-frame deletion of PIK3R1. The current wording makes it seem like APDS2-derived cells contain two genetic perturbations: (1) exon 11 skipping and (2) in-frame deletion. Include a diagram in Figure 1 to help explain the location of the mutations being studied in relationship to the PIK3R1 gene sequence and domains (i.e. nSH2, iSH2, cSH2). The description of the exon 11 skipping and in-frame deletions (lines 126-128) would benefit from having a complementary figure that diagrams the location of these mutations in the PIK3R1 gene.

      On review we agree that clarity of description could be enhanced. We have now edited these lines as follows:

      “We began by assessing dermal fibroblasts cultured from a previously described woman with APDS2 due to the common causal PIK3R1 mutation. This affects a splice donor site and causes skipping of exon 11, leading to an in-frame deletion of 42 amino acids (434-475 inclusive) in the inter-SH2 domain, which is shared by all PIK3R1 isoforms (Patient A.1 in (Lucas et al., 2014b))(Figure 1 figure supplement 1).”

      We have moreover introduced a further figure element including a schematic of all PIK3R1 mutations reported in the current study (new Figure 1 figure supplement 1)

      (3) For Figure 2, I recommend including a cartoon that illustrates the experimental design showing the induced expression of PIK3R1 mutants, R649W and Y657X, in the background of the wild-type endogenous gene expression.

      Such a figure element has now been generated and included as Figure 2 figure supplement 1, duly called out in the results section where appropriate.

      (4) For the data plotted in Figure 1B-1C, please clarify whether the experiments represent a single patient or all 3-4 patients shown in Figure 1A.

      Each datapoint shown represents one of the patients in the immunoblots, with all patients included. Each point in turn is the mean from 3 independent experiments. We have added the following to the Figure legend:

      “(B)-(E) quantification of immunoblot bands from 3 independent experiments shown for phosphoAKT-S473, phosphoAKT-T308, p110d and p110a respectively. Each point represents data from one of the patient cell lines in the immunoblots. Paired datapoints +/- insulin are shown in (B) and (C), and dotted lines mark means.”

      (5) I recommend rewording the following sentence: "Given this evidence that APDS2-associated PIK3R1 delta Exon 11 potently inhibits PI3Kα when overexpressed in 3T3-L1 preadipocytes," to say "... potently inhibits PI3Kα signaling when overexpressed in 3T3-L1 preadipocytes." The data shown in Figures 1 and 2 do not support a direct biochemical inhibition of PI3Kα lipid kinase activity by p85α (delta Exon 11).

      This edit has been made.

      (6) Provide more discussion concerning the percentage of humans with APDS2 or SHORT syndrome that contain the mutations discussed in this paper. How strong is the genotype-phenotype link for these diseases? Are these diseases inherited or acquired through environmental stresses?

      Both APDS2 and SHORT syndrome are very well established, highly penetrant and stereotyped monogenic diseases. APDS2 is defined by the presence of activating PIK3R1 mutations such as the one studied here (by far the commonest causal mutation). SHORT syndrome clinically has some superficial resemblance to other human genetic syndromes that include short stature, but when careful attention is paid to characteristic features it is nearly universally attributable to loss-of-function PIK3R1 mutations, with the single exception of one case in which a putatively pathogenic PRKCE mutation was described (PMID: 28934384). Although both syndromes are monogenic, it is often not accurate to refer to them as inherited, as, particularly in SHORT syndrome, de novo mutations (i.e. not found in either parent) are common. Environmental modifiers of phenotypes have not been described. The introduction now includes the comment that both conditions are highly penetrant and monogenic.

      (7) The data presented in Figure 5 would benefit from additional discussion and citations that describe the molecular basis of the interaction between PI3K and Irs1/2. What studies have previously established that this is a direct protein-protein interaction? Are there PI3K mutants that don't interact with Irs1/2 that can be included as a negative control? Alternatively, the authors can simply reference other papers to support the mechanism of interaction.

      There is a voluminous literature dating back to the early 1990s documenting the mode of interaction of PI3K with Irs1/2. Relevant papers have now been cited as requested:

      p85-Irs1 binding: PMID 1332046 (White lab, PNAS 1992)

      p85-Irs2 binding: PMID 7675087 (White lab, Nature 1995)

      “This may be important, as p85a mediates recruitment of PI3K to activated tyrosine kinase receptors and their tyrosine phosphorylated substrates, including the insulin-receptor substrate proteins Irs1 (PMID 1332046) and Irs2 (PMID 7675087).”

      Regarding PI3K mutants that don't interact with Irs1/2, the SHORT syndrome mutant R649W which we include in this study is perhaps the best example of this, so it is both disease-causing and functions as such a negative control.

      (8) To see the effect of the dominant negative delta Exon 11, the truncated p85α needs to be super stoichiometric to the full-length p85α (Figure 2 - Supplemental Figure 2). This is distinct from the results in Figure 1 showing that the APDS2-derived dermal fibroblasts express 5-10x lower levels of p85α delta Exon 11 than full-length p85α (Figure 1A), yet pAKT S473 and T308 are still strongly inhibited (Figure 1B-1C). The manuscript would benefit from more discussion concerning the cell type specific differences in phenotypes. Alternatively, do the APDS2-derived dermal fibroblasts have other genetic perturbations that are not accounted for that potentially modulate cell signaling differently compared to 3T3-L1 preadipocytes?

      The reviewer is astute to point out this apparent contrast. First of all, we have no reason to suppose there is any specific, PI3K-modifying genetic perturbation present in the primary dermal fibroblasts studied, although of course the genetic background of these cells is very distinct from that of 3T3-L1 mouse embryo fibroblasts. Related to such background differences, however, substantial variability is usually apparent in insulin-responsiveness even of healthy control dermal fibroblasts. This means that caution should be exercised in extrapolating from studies of the primary cells of a single individual. To illustrate this, we point the reviewer to our 2016 study in which we extensively studied the dermal fibroblasts of a proband with SHORT syndrome due to PIK3R1 Y657X:

      From this study we conclude that A. WT controls show quite substantial variation in insulin-stimulated AKT phosphorylation and B. even the SHORT syndrome p85a Y657X variant, expressed at higher levels than WT p85a in dermal fibroblasts, does not produce an obvious decrease in insulin-stimulated AKT phosphorylation, despite extensive evidence from other human cell studies and knock-in mice that it does indeed impair insulin action in metabolic tissues. For both these reasons we are not convinced that the lower insulin-induced AKT phosphorylation we described in Figure 1 should be overinterpreted until reproduced in other studies with primary cells from further APDS2 patients. This is why we did not comment more extensively on this. We now add the following qualifier in results:

      “Despite this, no increase in basal or insulin-stimulated AKT phosphorylation was seen in APDS2 cells compared to cells from wild-type volunteers or from people with PROS and activating PIK3CA mutations H1047L or H1047R (Fig 1A-C, Fig 1 figure supplement 3A,B). Although insulin-induced AKT phosphorylation was lower in fibroblasts from the one APDS2 patient studied compared to controls, we have previously reported extensive variability in insulin-responsiveness of primary dermal fibroblasts from WT controls. Moreover even primary cells from a patient expressing high levels of the SHORT syndrome-associated p85a Y657X did not show attenuated insulin action, so we do not believe the reduced insulin action in APDS2 cells in the current study should be overinterpreted until reproduced in further APDS2 cells.”

      Nevertheless we remind the reviewer that the main purpose of our primary cell experiment was to determine if there were any INCREASE in basal PI3K activity, or any difference in p110a or p110d protein levels, and we regard our findings in these regards to be clear.

      (9) A. The manuscript would benefit from additional explanation concerning why the E489K, R649W, and Y657X mutants are equivalent substitutes for the characterization of p110α/p85α (delta Exon 11). Perhaps a more explicit description of these mutations in relationship to the location of the p85α (delta Exon 11) mutation would help. I recommend including a diagram in Figure 3 showing the position of the delta Exon 11, E489K, R649W, and Y657X mutations in the PIK3R1 coding sequence. B. Also, please clarify whether all three holoenzyme complexes were biochemically unstable (i.e. p110α/p85α, p110β/p85α, p110δ/p85α) when p85α (delta Exon 11) was expressed in insect cells.

      A. Whether or not E489K, R649W and Y657X are “equivalent” to the APDS2 mutant is not really a meaningful issue here. These mutants are being studied because they cause SHORT syndrome without immunodeficiency, while the APDS2 mutant causes APDS2 often with features of SHORT syndrome. That is, it is naturally occurring mutations and the associated genotype-phenotype correlation that we seek to understand. Of the 3 SHORT syndrome causal mutations chosen, R649W is by far the commonest, effectively preventing phosphotyrosine binding, Y657X has the interesting attribute that it can be discriminated from full length p85 on immunoblots due to its truncation, and is moreover a variant that we have studied in cells and mice before, while the rarer E489K is an interesting SHORT syndrome variant as it is positioned more proximally in the p85a protein than most SHORT syndrome causal variants. All variants studied are now illustrated in the new Figure 1 figure supplement 1. B. Regarding stability of PI3K heterodimers containing the APDS2 p85a variant, we tried extensively to purify p110a and p110d complexes without success despite several approaches to optimise production. We did not try to synthesise the p110b-containing complex.

      (10) I recommend presenting the results in Figure 4 before Figure 3 because it provides a good rationale for why it's difficult to purify the p110α/p85α (delta Exon 11) holoenzyme from insect cells.

      This would be true if p110d were studied in Figure 4, but it is not. Figure 4 looks instead at effects on p110a of heterologous overexpression of mutant p85 and is a natural lead-in to the ensuing Figures 5 and 6, so we do not agree it would add value or enhance flow to swap Figures 3 and 4.

      (11) The authors show that overexpression of the p85α (delta Exon 11) did not result in p110α/p85α (delta Exon 11) complex formation based on co-immunoprecipitation. Do the authors get the same result when they co-immunoprecipitate p110α/p85α and p110δ/p85α in the APDS2-derived dermal fibroblasts used in Figure 1A?

      This is an interesting question but not an experiment we have done. It is not unfeasible, but generating enough cells to undertake IP experiments of this nature in dermal fibroblasts is a significant undertaking, and with finite resources available and only one primary cell line to study we elected not to pursue this.

      Details in Methods section:

      (1) Include catalog numbers and vendors for reagents (e.g. lipids, PhosSTOP, G-Dynabeads, etc.). There is not enough information provided to reproduce this work.

      We have now added all vendors and catalogue numbers where relevant.

      (2) Concerning the stated lipid composition (5/10/15/45/20/5 %) in the liposome preparation protocol. Please clarify whether these numbers represent molar percentages or mg/mL percentages.

      We have now added that this is expressed as “(wt/vol)”

      (3) What is the amino acid sequence of the PDGFR (pY2) peptide used for the PI3K activity assay?

      This assay has been published and references with detailed methods are cited. For clarity, however we now say:

      “PI(3,4,5)P3 production was measured by modified PI3-Kinase activity fluorescence polarisation assay (Echelon Biosciences, Salt Lake City, UT, USA). 10μL reactions in 384-well black microtitre plates used 1mM liposomes containing 50μM PI(4,5)P2, optimised concentrations of purified PI3K proteins, 100μM ATP, 2mM MgCl2, with or without 1μM tyrosine bisphosphorylated 33-mer peptide derived from mouse PDGFRβ residues 735-767, including phosphotyrosine at positions 740 and 751 (“pY2”; 735-ESDGGYMDMSKDESIDYVPMLDMKGDIKYADIE-767;  Cambridge peptides).”

      (4) Include a Supplemental file containing a comprehensive description of the plasmids and coding sequencing used in this study.

      Such a supplemental file has been created and is included as Table 2
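
      As a small illustration for Methods point (3) above, the residue numbering quoted for the pY2 peptide can be checked mechanically. The sketch below copies the sequence and the first-residue number (735) from the quoted methods text; the function name is hypothetical and chosen only for this example. It lists every tyrosine in the peptide by protein position, so positions other than the two stated phosphotyrosines (740 and 751) may also appear.

```python
# Sequence and numbering copied from the quoted methods text (pY2 peptide).
PEPTIDE = "ESDGGYMDMSKDESIDYVPMLDMKGDIKYADIE"
FIRST_RESIDUE = 735  # protein position of the first peptide residue, as quoted


def residue_positions(sequence: str, first_residue: int, residue: str) -> list[int]:
    """Return protein-numbering positions of `residue` within `sequence`.

    Hypothetical helper for illustration only.
    """
    return [first_residue + i for i, aa in enumerate(sequence) if aa == residue]


# Position of the final residue, given the quoted start position.
last_residue = FIRST_RESIDUE + len(PEPTIDE) - 1

# All tyrosines in the peptide, in protein numbering.
tyrosines = residue_positions(PEPTIDE, FIRST_RESIDUE, "Y")

print(f"length: {len(PEPTIDE)}, span: {FIRST_RESIDUE}-{last_residue}")
print(f"tyrosine positions: {tyrosines}")
```

      Run as written, this reports a 33-residue peptide spanning 735-767, consistent with the quoted description, and the listed tyrosine positions include 740 and 751.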

      Minor points of clarification, citations, and typos:

      (1) Clarify why Activated PI3K Delta Syndrome 1 (APDS1) is thus named APDS2. See lines 71-72 of the introduction. Also see line 89: "...is common in APDS2, but not in APDS1." Briefly describe the difference between APDS1 and APDS2?

      This is described in the introduction, but we apologise if our wording was not sufficiently clear. We have tried now to remove any ambiguity:

      “Some PIK3R1 mutations reduce basal inhibition of catalytic subunits, usually due to disruption of the inhibitory inter-SH2 domain, and are found in cancers (Philp et al, 2001) and vascular malformations with overgrowth (Cottrell et al, 2021). In both diseases, hyperactivated PI3Ka, composed of heterodimers of PIK3R1 products and p110a, drives pathological growth. Distinct inter-SH2 domain PIK3R1 mutations, mostly causing skipping of exon 11 and deletion of residues 434-475, hyperactivate PI3Kd in immune cells, causing highly penetrant monogenic immunodeficiency (Deau et al, 2014; Lucas et al, 2014b). This phenocopies the immunodeficiency caused by genetic activation of p110d itself, which is named Activated PI3K Delta Syndrome 1 (APDS1) (Angulo et al, 2013; Lucas et al, 2014a). The PIK3R1-related syndrome, discovered shortly afterwards, is thus named APDS2.”

      (2) Figure legend 1. Clarify reference to "Figure EV2".

      (3) Figure legend 2. Clarify reference to "Figure EV3".

      (4) Figure legend 3. Clarify reference to "Figure EV5".

      Thank you for pointing out this oversight, arising from failure to update nomenclature fully between versions. “EV” figures actually are the figure supplements in the submission. All labels have now been updated.

      (5) For Figure 1 - supplemental figure 1C, indicate experimental conditions on the blot (e.g. -/+ insulin).

      This is now added

      (6) Figure 4B, y-axis. Clarify how data was quantified. Perhaps reword "(% WT without DOX)" for clarity.

      We have left the Y axis label as it is, but have added the following to the figure legend:

      “(B) Quantification of immunoblot bands from immunoprecipitates from 3 independent experiments, expressed as a percentage relative to the intensity of the band in WT cells without doxycycline exposure.”

      (7) In the results section (lines 117-124), please explicitly state whether the described mutations are homo- or heterozygous.

      All mutations are heterozygous, as now explicitly stated

      (8) I recommend spelling out the SHORT and APDS2 acronyms in the abstract to make this study more accessible.

      We respectfully disagree that such spelling out in the abstract would improve accessibility. Both acronyms are clunky and wordy and are more likely to obscure meaning by squeezing out other words in the abstract. APDS is already spelled out in the introduction, and we now add the following for SHORT syndrome:

      “More surprisingly, phenotypic overlap is reported between APDS2 and SHORT syndrome. SHORT syndrome, named for the characteristic developmental features (Short stature, Hyperextensibility, Hernia, Ocular depression, Rieger anomaly, and Teething delay) is caused by loss of PI3Ka function due to disruption of the phosphotyrosine-binding C-terminal SH2 domain (Chudasama et al, 2013; Dyment et al, 2013; Thauvin-Robinet et al, 2013).”

      (9) I recommend explaining in more detail or rewording the following jargon/terms to make the writing more accessible to a broad audience: "reduced linear growth" (line 83) and "larger series" (line 86). I assume "reduced linear growth" is height.

      Edited as follows:

      “It features short stature, insulin resistance, and dysmorphic features (Avila et al, 2016). In recent years, both individual case reports (Bravo Garcia-Morato et al, 2017; Petrovski et al, 2016; Ramirez et al, 2020; Szczawinska-Poplonyk et al, 2022) and larger case series (Elkaim et al, 2016; Jamee et al, 2020; Maccari et al, 2023; Nguyen et al, 2023; Olbrich et al, 2016; Petrovski et al., 2016) have established that many people with APDS2 have overt features of SHORT syndrome, while, more generally, linear growth impairment is common in APDS2, but not in APDS1.”

      (10) For clarity, reword lines 214-215 to read, "No increase in p110α levels was seen on conditional overexpression of wild-type or R649W p85α."

      Change made, thank you

      (11) Figure 6A - Western blot label says, "657X" instead of "Y657X."

      Now corrected

      (12) Lines 214-215: For clarity, reword the sentence to say, "No increase in p110α was seen on conditional overexpression...".

      REPEAT OF POINT 10 ABOVE

      (13) Clarify what interactions are being competed for in the following statement: "... delta Ex11 may exert its inhibitory action by competing with PI3K holoenzyme" (lines 237-238). Are you referring to the interaction between p110α and p85α or the interaction between p110α/p85α and another protein?

      We have endeavoured to clarify by editing as follows:

      “As APDS2 p85α ΔEx11 does not appear to displace wild-type p85α from p110α despite strong overexpression, it is likely that there are high levels of truncated p85α unbound to p110α in the cell. This may be important, as p85α mediates recruitment of PI3K to activated tyrosine kinase receptors and their tyrosine phosphorylated substrates, including the insulin-receptor substrate proteins Irs1 and Irs2. Excess free regulatory subunits compete with heterodimeric PI3K holoenzyme for binding to these phosphotyrosines (Ueki et al., 2002), raising the possibility that excess free, truncated APDS2 p85α ΔEx11 may exert its inhibitory action similarly by outcompeting PI3K holoenzyme for phosphotyrosine binding.”

      (14) Provide more information about the following statement and how it relates to the mutations in this study: "Homozygous truncating PIK3R1 mutations abolishing p85α expression while preserving p55α and p50α produce agammaglobulinaemia" (lines 271-272). The manuscript would benefit from a more explicit description of the nature of these mutations.

      This wording seems to us to be explicit; however, we agree that a schematic of PIK3R1 genotype-phenotype correlation, as requested elsewhere, would help readers. Such a schematic is now included as Figure 1 figure supplement 1.

      (15) Typo on line 299: "unclike".

      Corrected.

      (16) The data presented in this study support a model in which p85α (DExon 11) expression functions as a dominant negative. Please clarify why in the discussion section you explain that p85α (DExon 11) activates PI3K. For example, "...skipping of exon 11, were shown in 2014 to activate PI3K..." (lines 290-291), "...activate PI3Kδ on one hand..." (line 309); "...APDS2 mutations in PIK3R1 has mixed consequences, producing greater hyperactivation of p110δ than p110α" (lines 354-355).

      We do not entirely understand the reviewer’s question, and thus the request, here. p85α (ΔExon 11) activates PI3Kδ in immune cells and in vitro, and this is accepted, based on numerous reports, to be the mechanism underlying immunodeficiency. We do not challenge this, and cite evidence for any such claims in our report. The dominant negative activity we describe here towards PI3Kα activation is based not on inhibition of mutant-containing heterodimer, but rather on destabilisation of and/or competition with heterodimeric WT holoenzyme. This is the basis of the model we present; that is, a finely balanced competition between enzymic activation on the one hand, and mutant holoenzyme destabilisation and competition of mutant free p85α with WT holoenzyme on the other, whose net effect likely differs among cells and tissues, most likely based on the repertoire and proportions of PI3K subunit expression. If the reviewer has specific suggestions for us that will make this point clearer still, we should be happy to consider them.

      (17) Provide references for the statements in lines 349-353 of the discussion.

      This brief closing paragraph is a succinct recap and summary of the key points made throughout the manuscript and thoroughly referenced therein. We prefer to keep this section clean to maximise clarity, but are happy to copy references from the various other places in the manuscript to back up these assertions if this is preferred by the editorial team. Current text:

      “In summary, it is already established that: A. genetic activation of PIK3CD causes immunodeficiency without disordered growth, while B. inhibition of PIK3R1 recruitment to RTKs and their substrates impairs growth and insulin action, without immunodeficiency, despite all catalytic subunits being affected and C. loss of p85 alone causes immunodeficiency.”

      Reviewer #2 (Recommendations For The Authors):

      In the abstract, line 42, I would rather speak of SHORT syndrome-like features.

      Some patients do indeed meet the criteria for SHORT syndrome, but there is a spectrum. We have thus added this qualification and removed “short stature” to maintain the word count, as this is itself a SHORT syndrome-like feature.

      Line 74 It would be helpful for the reader to give the amino-acid exchange and affected position of this single case.

      We agree. Now added.

      Furthermore, an illustration indicating the location of the different PIK3R1 variants on the p85 alpha level would be helpful for the reader.

      As noted above, such a figure element is now included as Figure 1 figure supplement 1 and duly called out in the text.

      The sentence in lines 298-300 makes no sense to me. Do you mean, unlike APDS1 murine models?

      We agree, on review, that this paragraph is convoluted and makes a simple observation complex. We have rewritten now in what we hope is a more accessible style:

      “Thus, study of distinct PIK3R1-related syndromes shows that established loss-of-function PIK3R1 mutations produce phenotypes attributable selectively to PI3Kα hypofunction, while activating mutations produce phenotypes attributable to selectively increased PI3Kδ signalling. Indeed, not only do such activating mutations not produce phenotypes attributable to PI3Kα activation, but they surprisingly have features characteristic of impaired PI3Kα function.”

      Line 321 I propose including the notion of different cells: “The balance between expression and signalling in different cells may be a fine one ...”

      This change has been made

      Line 352: replace “C. loss” with “complete loss”.

      “C.” actually denotes the last in a list after “A.” and “B.”. We have now used bold to emphasise this, but we imagine house style may dictate how we approach this.

    1. eLife Assessment

      This important study provides insights into the physiological role of RIPK1 in liver physiology, particularly during short-term fasting. The discovery that RIPK1 deficiency sensitizes the liver to acute injury and hepatocyte apoptosis is based on convincing evidence, highlighting the importance of RIPK1 in maintaining liver homeostasis under metabolic stress. The work will be of relevance to anyone studying liver pathologies.

    2. Reviewer #1 (Public review):

      This study presents an investigation into the physiological functions of RIPK1 within the context of liver physiology, particularly during short-term fasting. Through the use of hepatocyte-specific Ripk1-deficient mice (Ripk1Δhep), the authors embarked on an examination of the consequences of Ripk1 deficiency in hepatocytes under fasting conditions. They discovered that the absence of RIPK1 sensitized the liver to acute injury and hepatocyte apoptosis during fasting, a finding of significant interest given the crucial role of the liver in metabolic adaptation. Employing a combination of transcriptomic profiling and single-cell RNA sequencing techniques, the authors uncovered intricate molecular mechanisms underlying the exacerbated proinflammatory response observed in Ripk1Δhep mice during fasting. While the investigation offers valuable insights into the consequences of Ripk1 deficiency in hepatocytes during fasting conditions, there appears to be a primarily descriptive nature to the study with a lack of clear connection between the experiments. Thus, a stronger focus is warranted, particularly on understanding the dialogue between hepatocytes and macrophages. Moreover, the data would benefit from reinforcement through additional experiments such as Western blotting, flow cytometry, and rescue experiments, which would offer a more quantitative aspect to the findings. By incorporating these enhancements, the study could achieve a more comprehensive understanding of the underlying mechanisms and ultimately strengthen the overall impact of the research.

      Comments on revision:

      The authors have addressed my comments accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      Zhang et al. analyzed the functional role of hepatocyte RIPK1 during metabolic stress, particularly its scaffold function rather than kinase function. They show that Ripk1 knockout sensitizes the liver to cell death and inflammation in response to short-term fasting, a condition that would not induce obvious abnormality in wild-type mice.

      Strengths:

      The findings are based on a knockout mouse model and supported by bulk RNA-seq and scRNA-seq. The work consolidates the complex role of RIPK1 in metabolic stress.

      Comments on revision:

      The authors have addressed my concerns. The added experiments consolidated the findings. I do not have further comments.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      The study presents valuable findings on the role of RIPK1 in maintaining liver homeostasis under metabolic stress. Strengths include the intriguing findings that RIPK1 deficiency sensitizes the liver to acute liver injury and apoptosis, but because the conclusions require additional experimental support, the evidence is incomplete.

      We are truly grateful, and wish to express our sincere thanks to the reviewers and the editor for the time and effort spent in reviewing our manuscript. We highly appreciate the thorough and constructive comments, which have greatly improved our manuscript. We have conducted new experiments to address the reviewers’ concerns. We also carefully checked and changed our manuscript according to the reviewers’ constructive suggestions. Hopefully we have adequately addressed all the concerns. In the revised manuscript version, changes are highlighted in yellow. Please find the detailed point-by-point responses below.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study presents an investigation into the physiological functions of RIPK1 within the context of liver physiology, particularly during short-term fasting. Through the use of hepatocyte-specific Ripk1-deficient mice (Ripk1Δhep), the authors embarked on an examination of the consequences of Ripk1 deficiency in hepatocytes under fasting conditions. They discovered that the absence of RIPK1 sensitized the liver to acute injury and hepatocyte apoptosis during fasting, a finding of significant interest given the crucial role of the liver in metabolic adaptation. Employing a combination of transcriptomic profiling and single-cell RNA sequencing techniques, the authors uncovered intricate molecular mechanisms underlying the exacerbated proinflammatory response observed in Ripk1Δhep mice during fasting. While the investigation offers valuable insights into the consequences of Ripk1 deficiency in hepatocytes during fasting conditions, there appears to be a primarily descriptive nature to the study with a lack of clear connection between the experiments. Thus, a stronger focus is warranted, particularly on understanding the dialogue between hepatocytes and macrophages. Moreover, the data would benefit from reinforcement through additional experiments such as Western blotting, flow cytometry, and rescue experiments, which would offer a more quantitative aspect to the findings. By incorporating these enhancements, the study could achieve a more comprehensive understanding of the underlying mechanisms and ultimately strengthen the overall impact of the research.

      We thank the reviewer for the encouraging comments and helpful suggestions. We agree with the reviewer that additional experiments could reinforce our findings. Therefore, we conducted additional experiments including flow cytometry, western blotting, and using kinase-dead mutant mice to further investigate the underlying mechanisms. We carefully addressed every comment by the reviewer as indicated below.

      Detailed major concerns:

      (1) Related to Figure 1.

      It is imperative to ensure consistency in the number of animals analyzed across the different graphs. The current resolution of the images appears to be low, resulting in unsharp visuals that hinder the interpretation of data beyond the presence of "white dots". To address this issue, it is recommended to enhance the resolution of the images and consider incorporating zoom-in features to facilitate a clearer visualization of the observed differences. Moreover, it would be beneficial to include a complete WB analysis for the cell death pathways analyzed. These adjustments will significantly improve the clarity and interpretability of Figure 1.

      Thanks very much for the constructive advice. We carefully checked the number of animals and made sure that the animal numbers were consistent across the different figures. We further updated Figure 1 to incorporate zoom-in features, and the resolution of the figures was greatly improved. Western blot analysis was also included in updated Supplementary Figure 1.

      (2) Related to Figure 2.

      It is essential to ensure consistency in the number of animals analyzed across the different graphs, as indicated by n=6 in the figure legend (similar to Figure 1). Additionally, it is crucial to distinguish between male and female subjects in the dot plots to assess any potential gender-based differences, which should be consistent throughout the paper. To achieve this, the dots plot should be harmonized to clearly differentiate between males and females and investigate if there are any disparities between the genders. Moreover, it is imperative to correlate hepatic inflammation with the activation of Kupffer cells, infiltrating monocytes, and/or hepatic stellate cells (HSCs). Therefore, conducting flow cytometry would be instrumental in achieving this correlation. Additionally, the staining for Ki67 appears to be non-specific, showing a granular pattern reminiscent of bile crystals rather than the expected nuclear staining of hepatocytes or immune cells. It is crucial to ensure specific staining for Ki67, and conducting in vitro experiments on primary hepatocytes could further elucidate the proliferation process. These experiments are relatively straightforward to implement and would provide valuable insights into the mechanisms underlying hepatic inflammation and proliferation.

      Thanks very much for the helpful advice. First, we corrected the number of animals analyzed in the different graphs and made sure that the number of animals listed in each figure legend was consistent with the graphs in all figures. Second, to distinguish between results from male and female mice, we use blue for male mice, pink for female mice, and green for RIPK1 kinase-inactivated mice. The majority of results were obtained from male mice, and our results indicated that there was no difference between male and female mice herein.

      The percentages of immune cell subpopulations isolated from mouse liver tissue were determined. The results were consistent with the single-cell analysis in that a greater number of macrophages were recruited into the liver tissue of Ripk1<sup>Δhep</sup> mice upon 12-hour fasting (updated Figure 4F&G).

      To confirm the Ki67 results, we first measured the transcriptional expression of Ki67 using real-time qPCR, and the results were consistent with the protein expression measured by immunohistochemical analysis. The percentage of Ki67<sup>+</sup> cells among liver cells was also determined, and there were significantly more Ki67<sup>+</sup> cells in Ripk1<sup>Δhep</sup> mouse liver than in WT control mouse liver upon 12-hour fasting. Taken together, our transcriptional analysis, immunohistochemical analysis, and flow cytometry data indicated that Ki67 expression was higher in Ripk1<sup>Δhep</sup> mice than in Ripk1<sup>fl/fl</sup> mice (updated Figure 2).

      (3) Related to Figure 3 & related to Figure 4.

      The immunofluorescence data presented are not entirely convincing and are insufficient to conclusively demonstrate the recruitment of monocytes. Previous suggestions for flow cytometry studies remain pertinent and are indeed necessary to bolster the robustness of the data and conclusions. Conducting flow cytometry analyses would provide more accurate and quantitative assessments of monocyte recruitment, ensuring the reliability of the findings and strengthening the overall conclusions of the study. Regarding the single-cell RNA sequencing analysis presented in the manuscript, it's worth questioning its relevance and depth of information provided. While it successfully identifies a quantitative difference in the cellular composition of the liver between control and knockout mice, it may fall short in elucidating the intricate interactions between different cell populations, which are crucial for understanding the underlying mechanisms of hepatic inflammation. Therefore, I propose considering alternative bioinformatic analyses, such as CellPhone-CellChat, which could potentially provide a more comprehensive understanding of the cellular dynamics and interactions within the liver microenvironment. By examining the dialogue between different cell clusters, these analyses could offer deeper insights into the functional consequences of Ripk1 deficiency in hepatocytes and its impact on hepatic inflammation during fasting.

      Thanks very much for the constructive suggestion. We agree with the reviewer that conducting flow cytometry analyses would provide accurate and quantitative assessments of monocyte recruitment, ensuring the reliability of the findings. Following the advice, both WT and Ripk1<sup>Δhep</sup> mice were fasted for 12 hours, and single hepatic cells were then isolated and analyzed by flow cytometry. As indicated in updated Figure 4F&G, the percentage of F4/80<sup>+</sup>CD11b<sup>+</sup> cells was significantly higher in Ripk1<sup>Δhep</sup> mice than in WT control mice, confirming that more monocytes were recruited into the liver.

      Additionally, we performed CellChat analysis on the single-cell transcriptomic data. As shown in updated Figures 4H-J, both the number of ligand-receptor pairs and the interaction strength among the eight cell types were significantly increased in Ripk1<sup>Δhep</sup> mice, particularly the interactions between macrophages and other cell types. Network analysis indicated that inflammation and proliferation signals were amplified in Ripk1<sup>Δhep</sup> mice. Consistent with the bulk RNA sequencing data, SAA signaling was upregulated in the hepatocytes of Ripk1<sup>Δhep</sup> mice (updated Figure 4K). SAA has been found to play a role in regulating immune responses and tumor development. Based on these findings, we speculate that fasting-induced liver injury in RIPK1 knockout mice may exacerbate the inflammatory response in liver tissue through enhanced SAA signaling. The above data analysis and interpretation are included in the updated Figures 4 and S4 and in lines 421-443.

      (4) Related to Figure 5.

      What additional insights do the data from Figure 5 provide compared to the study published in Nat Comms, which demonstrated that RIPK1 regulates starvation resistance by modulating aspartate catabolism (PMID: 34686667)?

      Thank you very much for your constructive suggestion. As noted by the reviewer, this study (PMID: 34686667) primarily focuses on metabolomic analyses of Ripk1<sup>-/-</sup> neonatal mouse brain tissue and Ripk1<sup>-/-</sup> MEF cells. The authors propose that Ripk1 regulates starvation resistance by modulating aspartate catabolism.

      In our study, the global metabolic changes induced by fasting were monitored. Fasting-induced lipolysis in peripheral adipose tissue leads to hepatic lipid accumulation, and excessive deposition of free fatty acids has been shown to induce endoplasmic reticulum (ER) stress in the liver. Data from Figure 5 demonstrate that administering the ER stress inhibitor 4-PBA effectively mitigated fasting-induced liver injury and inflammatory responses in Ripk1<sup>Δhep</sup> mice. Our findings suggest that ER stress plays a critical role in fasting-induced liver injury and inflammation in Ripk1<sup>Δhep</sup> mice.

      (5) Related to Figure 6.

      The data presented in Figure 7 are complementary and do not introduce new mechanistic insights.

      Thank you very much for your insightful suggestion. As you mentioned, the AAV-TBG-Cre-mediated liver-specific RIPK1 knockout mice offer complementary validation of the results obtained from Ripk1<sup>Δhep</sup> mice. Moreover, the TBG promoter is active exclusively in mature hepatocytes, while the ALB promoter is active not only in mature hepatocytes but also in precursor cells and cholangiocytes. Therefore, we think that the inclusion of AAV-TBG-Cre further strengthens our finding that RIPK1 in hepatocytes is responsible for fasting-induced liver injury and inflammatory responses.

      (6) Related to Figure 7.

      The data from Figure 7 suggest that RIPK1 in hepatocytes is responsible for the observed damage. However, it has been previously demonstrated that inhibition of RIPK1 activity in macrophages protects against the development of MASLD (PMID: 33208891). One possible explanation for these findings could be that the overreaction of macrophages to fasting, coupled with the absence of RIPK1 in hepatocytes (an indirect effect), contributes to the observed damage. Considering this, complementing hepatocytes with a kinase-dead version of RIPK1 could be a valuable approach to further refine the molecular aspect of the study. This would allow for a more precise investigation into the specific role of RIPK1's scaffolding or kinase function in response to starvation in hepatocytes. Such experiments could provide additional insights into the mechanisms underlying the observed effects and help delineate the contributions of RIPK1 in different cell types to metabolic stress responses.

      Thank you very much for the constructive suggestion. We fully agree with the reviewer that employing RIPK1 kinase-inactive mutant mice would allow precise investigation of the specific roles of RIPK1's scaffolding and kinase functions in hepatocyte responses to starvation. In accordance with this advice, we established a 12-hour fasting model using Ripk1<sup>WT/WT</sup> and Ripk1<sup>K45A/K45A</sup> mice, which were previously established and confirmed to lack RIPK1 kinase activity. As demonstrated in updated Supplementary Figure 2, these mice did not show significant liver damage or inflammatory responses after 12 hours of fasting. These findings suggest that the liver damage and inflammatory response induced by fasting in Ripk1<sup>Δhep</sup> mice may not depend on the kinase activity of RIPK1.

      Reviewer #2 (Public Review):

      Summary:

      Zhang et al. analyzed the functional role of hepatocyte RIPK1 during metabolic stress, particularly its scaffold function rather than kinase function. They show that Ripk1 knockout sensitizes the liver to cell death and inflammation in response to short-term fasting, a condition that would not induce obvious abnormality in wild-type mice.

      Strengths:

      The findings are based on a knockout mouse model and supported by bulk RNA-seq and scRNA-seq. The work consolidates the complex role of RIPK1 in metabolic stress.

      Weaknesses:

      However, the findings are not novel enough because the pro-survival role of RIPK1 scaffold is well-established and several similar pieces of research already exist. Moreover, the mechanism is not very clear and needs additional experiments.

      We thank the reviewer for the encouraging comments and helpful suggestions. Here we conducted additional experiments including flow cytometry, western blotting, and using kinase-dead mutant mice to further investigate the underlying mechanisms. We carefully addressed every comment by the reviewer as indicated below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (7) I recommend that the authors consider reassessing their results, particularly with regards to elucidating the dialogue between macrophages and hepatocytes, as this could further strengthen the study's conclusions.

      Thank you very much for your constructive suggestion. We conducted additional experiments, including flow cytometry and western blotting, to reassess our findings. Furthermore, to clarify the interactions between cells, we employed CellChat for a more in-depth analysis of the single-cell sequencing results. In the revised manuscript version, changes are highlighted in yellow. In this study, we demonstrated that the specific deletion of RIPK1 in hepatocytes exacerbated the liver's vulnerability to metabolic disturbances, such as short-term fasting and high-fat diet feeding, resulting in increased liver damage, apoptosis, inflammation, and compensatory proliferation. The data indicate that fasting-induced liver injury in mice with RIPK1 knockout in hepatic parenchymal cells may exacerbate the inflammatory response in liver tissue through enhanced SAA signaling. In summary, we revealed a novel physiological role of RIPK1 as a scaffold in maintaining liver homeostasis during fasting and other nutritional disturbances.

      (8) It would be beneficial for the authors to address the minor weaknesses identified in the study, such as ensuring consistency in the number of animals analyzed across different graphs and enhancing the resolution of images to improve data clarity.

      Thank you for the suggestion. In the revised manuscript, we have addressed these minor weaknesses, and we checked the consistency in the number of animals in different graphs, as well as enhanced the resolution of all images.

      (9) I encourage the authors to incorporate additional experiments, such as Western blotting and flow cytometry, to provide a more quantitative assessment of the observed effects and enhance the robustness of their conclusions.

      Thank you for your insightful suggestion. We completely agree with the reviewer that incorporating flow cytometry and western blotting would strengthen the robustness of our conclusions. We conducted flow cytometry analysis and western blotting, and the results are presented in updated Supplementary Figure 1, Figure 2, Figure 4 and Supplementary Figure 4.

      (10) Furthermore, the authors may consider conducting complementary experiments, such as rescue experiments involving complementing hepatocytes with a kinase-dead version of RIPK1, to further refine the molecular aspect of the study and elucidate the specific roles of RIPK1's scaffolding or kinase function in response to starvation.

      Thank you very much for your constructive suggestion. As shown in updated Supplementary Figure 2, we conducted fasting experiments using RIPK1 kinase-dead mice. These findings suggest that the liver damage and inflammatory response induced by fasting in Ripk1<sup>Δhep</sup> mice may not depend on the kinase activity of RIPK1.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (11) What is the upstream signal for RIPK1? The study investigated the change induced by short-term fasting, which is a metabolic stress. Although RIPK1 knockout promotes cell death and inflammation, how it is involved in this condition is unclear. RIPK1 has never been reported as a metabolic sensor, and its function is typically downstream of TNFR1 as well as other death receptors such as Fas, TRAIL-R1, and TRAIL-R2. Thus, it is probable that metabolic stress induces the expression and secretion of some ligand of the above receptors. Although TNFα expression is upregulated at both the mRNA and protein levels, it could not be concluded that TNFα is the upstream signal for RIPK1, because a difference in expression does not always imply a functional role. In addition, a recent study, which is also reference 33, reports that knockout of TNFR1/2 does not protect against 18 h liver ischemia, a condition that is similar to the present study. Therefore, the link between the metabolic fluctuation and RIPK1 function is elusive and should be addressed. The expression difference analysis should be extended to other relevant ligands. A functional study using neutralizing antibodies in RIPK1ΔHep mice is encouraged. At least, this should be discussed in the discussion section.

      Thank you very much for your insightful comments. The upstream signals of RIPK1 remain a significant area of scientific inquiry. Fasting, as one of the main causes of metabolic stress, is known to trigger a series of physiological changes, including but not limited to decreased blood glucose levels, hepatic glycogen depletion, increased production of hepatic glucose and ketone bodies, adipose tissue lipolysis, and the influx and accumulation of free fatty acids in the liver. It is well established that the elevated lipid influx and hepatic accumulation during fasting may cause lipotoxic stress in the liver. To investigate whether the elevated free fatty acid influx might act as the signal that induces cytotoxicity, we isolated primary hepatocytes but observed that a significant number of cells underwent spontaneous death during the isolation and perfusion processes. To address this question, we instead utilized CRISPR-Cas9 technology to generate Ripk1<sup>-/-</sup> AML12 cells, as illustrated in Author response image 1A.

      To mimic hepatic lipid accumulation induced by short-term fasting, we treated the cells with palmitic acid (PA) or oleic acid (OA) for 12 hours in vitro. Our results indicated a significant increase in cell death among Ripk1<sup>-/-</sup> AML12 cells after PA treatment compared to WT control cells (Author response image 1B). As shown in Author response image 1C, we also observed a marked increase in caspase-3 activity in Ripk1<sup>-/-</sup> AML12 cells following PA treatment.

      Collectively, our results highlight the crucial role of RIPK1 in hepatocytes in maintaining the liver's adaptive capacity to counteract lipotoxicity induced by metabolic stress. These in vitro results were not included in the manuscript; however, we addressed them in the discussion section (lines 593-597). If the reviewer so suggests, we would be glad to incorporate them into our manuscript.

      Author response image 1.

      (12) What is the exact relationship between ER stress and RIPK1? In Figure 5A and Figure 6B, Ripk1 knockout only slightly promotes the expression of ER stress markers. The evidence of RIPK1 leading to ER stress is limited in the literature and poorly supported in this study. Also in reference 33, the hypothesis is proposed that ER stress leads to death receptor upregulation and activation, which induces RIPK1 activation. Although the ER stress inhibitor showed good efficacy in rescue experiments, it could not determine whether RIPK1 deficiency leads to ER stress-associated phenotype or ER stress leads to death receptor activation and RIPK1 deficiency-associated phenotype. If RIPK1 deficiency leads to ER stress, the possible mechanism should be investigated.

      Thank you very much for your insightful comments. As the reviewer noted, the specific relationship between endoplasmic reticulum (ER) stress and RIPK1 remains unclear. However, our data, along with findings from other studies (Piccolis M et al., Mol Cell. 2019; Geng Y et al., Hepatol Int. 2021), suggest that fasting-induced lipolysis in peripheral adipose tissue leads to hepatic lipid accumulation. Additionally, excessive deposition of free fatty acids has been shown to induce ER stress in the liver. One possible explanation is that ER stress may trigger the upregulation and activation of death receptors, and the scaffold function of RIPK1 may play a protective and checkpoint role in this process. ER stress during fasting might therefore lie upstream of RIPK1. This could help explain why short-term fasting results in liver damage in Ripk1<sup>Δhep</sup> mice while control mice remain unaffected. Moreover, the inhibition of ER stress using 4-PBA can effectively alleviate this damage.

      Minor:  

      (13) The study starts directly from functional experiments. However, it should first be explored whether RIPK1 expression or activation is modulated in wild-type mice.

      Thank you very much for your insightful observation. Previous studies showed that RIPK1 deficiency in hepatocytes does not impact the growth and development of mice, indicating that RIPK1 is dispensable for proper liver development and homeostasis (Filliol A et al., Cell Death Dis. 2016). Furthermore, we did not observe any changes in RIPK1 levels in wild-type mice induced by fasting across different experimental batches. In our bulk transcriptomic analysis, the expression of RIPK1 was not changed before and after 12-hour fasting in Ripk1<sup>fl/fl</sup> mice. Therefore, we focused our attention on the function of RIPK1 and started our study directly with functional experiments.

      (14) Knockout of RIPK1 abolishes both its scaffold function and its kinase function. It is encouraged to explore whether blocking RIPK1 kinase activity influences the outcome of metabolic stress.

      Thank you for your insightful suggestion. To investigate the role of RIPK1 kinase activity in response to metabolic stress, we added fasting experiments using RIPK1 kinase-inactive mice in the updated Supplementary Figure 2. These experiments show that blocking RIPK1 kinase activity does not affect the outcome of metabolic stress.

      (15) In Figure 1, the number of TUNEL+ cells is about twice that of c-casp3+ cells. What is the possible reason?

      Thank you for your careful reading. Indeed, the number of TUNEL<sup>+</sup> cells in Figure 1 is twice that of cleaved-caspase-3<sup>+</sup> cells. There are two possible reasons. First, we speculate that this discrepancy may be attributed to the higher sensitivity of the TUNEL assay compared to the cleaved-caspase-3 assay. Second, the TUNEL assay detects DNA fragmentation, indicating that these cells are in a pre-apoptotic state or poised to undergo apoptosis. In contrast, cleaved-caspase-3 staining specifically identifies cells that have already committed to the apoptotic pathway. The TUNEL assay can detect all types of apoptosis, and the mechanisms of apoptosis may involve more than just cleaved-caspase-3.

      (16) Infiltrated innate immune cells could lead to hepatocyte death. Is the hepatocyte death in this study partially caused by immune cells?

      Many thanks for the advice. As outlined in the response to the 11th comment from the second reviewer, our findings indicate that metabolic stress induced by short-term fasting is the primary cause of hepatocyte death. Additionally, we demonstrate that infiltrated innate immune cells may also play a partial role in hepatocyte death through subsequent cascade reactions.

      (17) Could the in vivo results be consolidated by in vitro experiments on primary mouse hepatocytes? This would be helpful to answer question 4.

      Thank you for your helpful comments. As described in the response to the 11th comment by the second reviewer, we attempted to conduct in vitro experiments using primary hepatocytes. However, during the isolation and perfusion processes, we observed that a significant number of cells underwent spontaneous death. To address this issue, we utilized CRISPR-Cas9 technology to generate Ripk1<sup>-/-</sup> AML12 cells, which showed a significant increase in cell death after palmitic acid (PA) treatment compared to WT control cells. We also observed a marked increase in caspase-3 activity in Ripk1<sup>-/-</sup> AML12 cells following PA treatment.

      (18) RIPK1 scaffold function is associated with NF-kB signal. Is NF-kB signal transduction influenced by Ripk1 deficiency? If so, to what extent does it contribute to the observed phenotype? If not, what is the direct downstream effect of Ripk1 deficiency?

      Thank you very much for your insightful perspective. As reported by Clucas J et al., RIPK1 serves as a scaffold for downstream NF-κB signaling through the ubiquitin chains generated by its ubiquitination (Clucas J et al., Nat Rev Mol Cell Biol. 2023). The deficiency of RIPK1 in hepatic parenchymal cells can disrupt NF-κB signaling and impair its pro-survival functions, resulting in increased cell death in response to stress. Our current findings suggest that the RIPK1-NF-κB axis serves as a crucial scaffold platform essential for the liver's adaptation to metabolic fluctuations. Any inappropriate inactivation or deletion of components within this scaffold disrupts the delicate balance between cell death, inflammation, and normal function, making the liver susceptible to metabolic changes, ultimately leading to liver damage, hepatic inflammation, and compensatory proliferation.

      (19) In Figure 6B, the 'RIP' should be changed to 'RIPK1'.

      Thank you for your careful observation. We have corrected "RIP" to "RIPK1" in updated Figure 6B.

      (20) For Western blot results, the blot height should be at least the lane width to reveal additional signals and the molecular weight as well as unspecific signals should be denoted.

      Thank you for your valuable advice. We appreciate your suggestions regarding the western blot results. We went through the previous western blot results and did not find any additional nonspecific signals. We added the molecular weights in the updated Figure 5, Figure 6, and Supplementary Figure 1.

    1. eLife Assessment

      This important collection of over 800 new cell type-specific driver lines will be an invaluable resource for researchers studying associative learning in Drosophila. Thoroughly characterized and well documented, this collection will permit researchers to selectively target neurons that deliver information to, or receive it from, the memory center of the fly brain called the Mushroom Body. Given the wealth of new drivers and the genetic access they provide to over 300 cell types, this compelling work will be of interest not only to researchers studying the mechanisms of associative learning but more generally to those dissecting sensorimotor circuits in the fly nervous system.

    2. Reviewer #1 (Public Review):

      Summary:

      The emergence of Drosophila EM connectomes has revealed numerous neurons within the associative learning circuit. However, these neurons are inaccessible for functional assessment or genetic manipulation in the absence of cell-type-specific drivers. Addressing this knowledge gap, Shuai et al. have screened over 4000 split-GAL4 drivers and correlated them with identified neuron types from the "Hemibrain" EM connectome by matching light microscopy images to neuronal shapes defined by EM. They successfully generated over 800 split-GAL4 drivers and 22 split-LexA drivers covering a substantial number of neuron types across layers of the mushroom body associative learning circuit. They provide new labeling tools for olfactory and non-olfactory sensory inputs to the mushroom body; interneurons connected with dopaminergic neurons and/or mushroom body output neurons; potential reinforcement sensory neurons; and expanded coverage of intrinsic mushroom body neurons. Furthermore, the authors have optimized the GR64f-GAL4 driver into a sugar sensory neuron-specific split-GAL4 driver and functionally validated it as providing a robust optogenetic substitute for sugar reward. Additionally, a driver for putative nociceptive ascending neurons, potentially serving as optogenetic negative reinforcement, is characterized by optogenetic avoidance behavior. The authors also use their very large dataset of neuronal anatomies, covering many example neurons from many brains, to identify neuron instances with atypical morphology. They find many examples of mushroom body neurons with altered neuronal numbers or mistargeting of dendrites or axons and estimate that 1-3% of neurons in each brain may have anatomic peculiarities or malformations. Significantly, the study systematically assesses the individualized existence of MBON08 for the first time. 
      This neuron is a variant shape that sometimes occurs instead of one of two copies of MBON09, and this variation is more common than that in other neuronal classes: 75% of hemispheres have two MBON09's, and 25% have one MBON09 and one MBON08. These newly developed drivers not only expand the repertoire for genetic manipulation of mushroom body-related neurons but also empower researchers to investigate the functions of circuit motifs identified from the connectomes. The authors generously make these flies available to the public. In the foreseeable future, the tools generated in this study will allow important advances in the understanding of learning and memory in Drosophila.

      Strengths:

      (1) After decades of dedicated research on the mushroom body, a consensus has been established that the release of dopamine from DANs modulates the weights of connections between KCs and MBONs. This process updates the association between sensory information and behavioral responses. However, understanding how the unconditioned stimulus is conveyed from sensory neurons to DANs, and the interactions of MBON outputs with innate responses to sensory context remains less clear due to the developmental and anatomic diversity of MBONs and DANs. Additionally, the recurrent connections between MBONs and DANs are reported to be critical for learning. The characterization of split-GAL4 drivers for 30 major interneurons connected with DANs and/or MBONs in this study will significantly contribute to our understanding of recurrent connections in mushroom body function.

      (2) Optogenetic substitutes for real unconditioned stimuli (such as sugar taste or electric shock) are sometimes easier to implement in behavioral assays due to the spatial and temporal specificity with which optogenetic activation can be induced. GR64f-GAL4 has been widely used in the field to activate sugar sensory neurons and mimic sugar reward. However, the authors demonstrate that GR64f-GAL4 drives expression in other neurons not necessary for sugar reward, and the potential activation of these neurons could introduce confounds into training, impairing training efficiency. To address this issue, the authors have elaborated on a series of intersectional drivers with GR64f-GAL4 to dissect subsets of labeled neurons. This approach successfully identified a more specific sugar sensory neuron driver, SS87269, which consistently exhibited optimal training performance and triggered ethologically relevant local searching behaviors. This newly characterized line could serve as an optimized optogenetic tool for sugar reward in future studies.

      (3) MBON08 was first reported by Aso et al. 2014, exhibiting dendritic arborization into both ipsilateral and contralateral γ3 compartments. However, this neuron could not be identified in the previously published Drosophila brain connectomes. In the present study, the existence of MBON08 is confirmed, occurring in one hemisphere of 35% of imaged flies. In brains where MBON08 is present, its dendrite arborization disjointly shares contralateral γ3 compartments with MBON09. This remarkable phenotype potentially serves as a valuable resource for understanding the stochasticity of neurodevelopment and the molecular mechanisms underlying mushroom body lobe compartment formation.

    3. Reviewer #2 (Public Review):

      Summary:

      The article by Shuai et al. describes a comprehensive collection of over 800 split-GAL4 and split-LexA drivers, covering approximately 300 cell types in Drosophila, aimed at advancing the understanding of associative learning. The mushroom body (MB) in the insect brain is central to associative learning, with Kenyon cells (KCs) as primary intrinsic neurons and dopaminergic neurons (DANs) and MB output neurons (MBONs) forming compartmental zones for memory storage and behavior modulation. This study focuses on characterizing sensory input as well as direct upstream connections to the MB both anatomically and, to some extent, behaviorally. Genetic access to specific, sparsely expressed cell types is crucial for investigating the impact of single cells on computational and functional aspects within the circuitry. As such, this new and extensive collection significantly extends the range of targeted cell types related to the MB and will be an outstanding resource to elucidate MB-related processes in the future.

      Strengths:

      The work by Shuai et al. provides novel and essential resources to study MB-related processes and beyond. The resulting tools are publicly available and, together with the linked information, will be foundational for many future studies. The importance and impact of this tool development approach, along with previous ones, for the field cannot be overstated. One of many interesting aspects arises from the anatomical analysis of cell types that are less stereotypical across flies. These discoveries might open new avenues for future investigations into how such asymmetry and individuality arise from development and other factors, and how it impacts the computations performed by the circuitry that contains these elements.

    4. Reviewer #3 (Public Review):

      Summary:

      Previous research on the Drosophila mushroom body (MB) has made this structure the best-understood example of an associative memory center in the animal kingdom. This is in no small part due to the generation of cell-type specific driver lines that have allowed consistent and reproducible genetic access to many of the MB's component neurons. The manuscript by Shuai et al. now vastly extends the number of driver lines available to researchers interested in studying learning and memory circuits in the fly. This 800-plus collection of new cell-type specific drivers targets neurons that either provide input (direct or indirect) to MB neurons or that receive output from them. Many of the new drivers target neurons in sensory pathways that convey conditioned and unconditioned stimuli to the MB. Most drivers are exquisitely selective, and researchers will benefit from the fact that whenever possible, the authors have identified the targeted cell types within the Drosophila connectome. Driver expression patterns are beautifully documented and are publicly available through the Janelia Research Campus's Flylight database where full imaging results can be accessed. Overall, the manuscript significantly augments the number of cell type-specific driver lines available to the Drosophila research community for investigating the cellular mechanisms underlying learning and memory in the fly. Many of the lines will also be useful in dissecting the function of the neural circuits that mediate sensorimotor circuits.

      Strengths:

      The manuscript represents a huge amount of careful work and leverages numerous important developments from the last several years. These include the thousands of recently generated split-Gal4 lines at Janelia and the computational tools for pairing them to make exquisitely specific targeting reagents. In addition, the manuscript takes full advantage of the recently released Drosophila connectomes. Driver expression patterns are beautifully illustrated side-by-side with corresponding skeletonized neurons reconstructed by EM. A comprehensive table of the new lines, their split-Gal4 components, their neuronal targets, and other valuable information will make this collection eminently useful to end-users. In addition to the anatomical characterization, the manuscript also illustrates the functional utility of the new lines in optogenetic experiments. In one example, the authors identify a specific subset of sugar reward neurons that robustly promotes associative learning.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Comments on revised version: 

      Overall, I thought the authors addressed my comments well with the possible exception of what is actually new here. This was the most important thing that I thought should be included in the revision. Although the authors rewrote the paragraph describing the lines presented in the paper, I still can't tell exactly which ones haven't been previously published. Their revised paragraph says that 40 lines have been "previously used," but Supplemental Table 1 shows references for over 200 of the lines, which sounds more reasonable based on papers that have come out. 

      We have modified the text in lines 112-120 as below.

      “Supplementary File 1 lists 859 lines (including split-LexA) and their detailed information, such as genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type. A small subset of 47 lines from this collection have been previously used in studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023).”

      For 842 lines among the 859 lines listed in Supplementary File 1, this study is the primary citation for future papers for the following reason: 

      In December 2021, we deposited the confocal images of new split-GAL4 lines at the Janelia Flylight website (http://www.janelia.org/split-gal4) without a publication describing the annotation of expression patterns, and we had already started sharing the lines without restrictions. In September 2023, we released the preprint of this study on bioRxiv (doi: https://doi.org/10.1101/2023.09.15.557808). Up to this point, 47 lines have been used in other studies. In Supplementary File 1, 30 of them attribute the citation credit to both this study and other papers, because this 2023 preprint was cited as the primary citation in those papers. Similarly, the omnibus paper summarizing all the effort of generating split-GAL4 lines by the Janelia Flylight team (https://doi.org/10.7554/eLife.98405.1) cites many lines from this paper. However, since this summary paper did not provide additional information such as functional characterization by behavioral experiments, we did not include it in Supplementary File 1, to clarify that this study is the primary citation for these lines. The remaining 17 lines were published before 2021. We included them for the convenience of users, and we attributed the primary citation to the already published papers.

      Also, in the revised paragraph they state that "All transgenic lines newly generated in this study are listed in Supplementary File 2" but that table lists only the 36 LexA hemidriver lines! Confusingly, this comment cites the same 8 references as are cited for the 40 lines that they say were previously published. I am thus only more confused about how many previously uncharacterized lines are presented in this paper.

      We modified the text as below to clarify that “new lines” indicates LexA or DBD lines, not new combinations of already published AD and DBD lines. We removed the 8 citations, which were mistakenly placed in the previous manuscript.

      “The newly generated LexA, Gal4DBD and LexADBD lines are listed in Supplementary File 2.”

    1. eLife Assessment

      This paper presents useful results that extend our understanding of how the visual cortex encodes temporal structure, providing new information about sequence representations in the upper layers of the visual cortex. The evidence for prediction errors is solid; however, support for other claims regarding sparsification and simplification of activity following training is incomplete. The main concerns pertain to the confounds associated with restricted ordering within blocks, which does not allow separate plasticity mechanisms operating on different time scales to be distinguished.

    2. Reviewer #1 (Public review):

      Summary:

      Knudstrup et al. use two-photon calcium imaging to measure neural responses in the mouse primary visual cortex (V1) in response to image sequences. The authors presented mice with many repetitions of the same four-image sequence (ABCD) for four days. Then on the fifth day, they presented unexpected stimulus orderings where one stimulus was either omitted (ABBD) or substituted (ACBD). After analyzing trial-averaged responses of neurons pooled across multiple mice, they observed that stimulus omission (ABBD) caused a small, but significant, strengthening of neural responses but observed no significant change in the response to stimulus substitution (ACBD). Next, they performed population analyses of this dataset. They showed that there were changes in the correlation structure of activity and that many features about sequence ordering could be reliably decoded. This second set of analyses is interesting and exhibited larger effect sizes than the first results about predictive coding. However, concerns about the design of the experiment temper my enthusiasm.

      The most recent version of this manuscript makes a few helpful changes (entirely in supplemental figures--the main text figures are unchanged). It does not resolve any of the larger weaknesses of the experimental design, or even perform single-neuron tracking in the one case where it was possible (between similar FOVs shown in Supplemental Figure 1).

      Strengths:

      (1) The topic of predictive coding in the visual cortex is exciting, and this task builds on previous important work by the senior author (Gavornik and Bear 2014) where unexpectedly shuffling sequence order caused changes in LFPs recorded from visual cortex.

      (2) Deconvolved calcium responses were used appropriately here to look at the timing of the neural responses.

      (3) Neural decoding results showing that the context of the stimuli could be reliably decoded from trial-averaged responses were interesting. But I have concerns about how the data was formatted for performing these analyses.

      Weaknesses:

      (1) All analyses were performed on trial-averaged neural responses that were pooled across mice (except for Supplementary Figure 6, see below). Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform most analyses on individual datasets to assess how behavioral training might differently affect each animal.

      In the most recent draft, a single-mouse analysis was added for Figure 4C (Supplementary Figure 6). This effect of "representational drift" was not statistically quantified in either the single-mouse results or in the main text figure panel. Moreover, the apparent correlational drift could be accounted for by a reduction in SNR as a consequence of photobleaching.

      (2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

      In the most recent draft, this analysis was still not performed on single mice. I was referring to the "decorrelation of responses" analysis in Figure 3, not the "representational drift" analysis in Figure 4. See my comments on Supplementary Figure 6 above.

      (3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher order block structure of the task were disrupted.

      In the rebuttal letter for the most recent draft, the authors refer to recent work in press (Hosmane et al. 2024) suggesting that because sleep may be important for plastic changes between sessions, they do not expect much change to be apparent within a session. However, they admit that this current study is too underpowered to know for sure--and do not cite or mention this yet unpublished work in the manuscript itself.

      As a control, I would be interested to at least know how much variance in neural responses is observed between intermediate "training" sessions with identical stimuli, e.g. between Day 1 and Day 4, but this is not possible as imaging was not performed on these days.

      Despite being referred to as "similar" I do not think early and late responses are clearly shown--aside from the histograms comparing "early traces" to "all traces" which include early traces in Figure 5B and Figure 6A. Showing variance in single-cell responses would be helpful to add in Supplementary Figure 3 and Supplementary Figure 4.

      (4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished by using many software packages, including Suite2P (Pachitariu et al. 2017), which is what the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

      In the most recent draft, this concern has not been mitigated. Despite Supplementary Figure 1 showing similar FOVs, mostly different neurons were still extracted. In all other sessions, it is not reported how far apart the other recorded FOVs were from each other.

      The rebuttal comment that the PE statistic is computed on an individual cell within-session basis is reasonable. Moreover, the bootstrapped version of the PE analysis in Supplementary Figure 8 is an improvement of the main analysis in the paper. As a control, it would have been helpful to compute the stability of the PE ratio statistics between training days (e.g. between day 1 and day 4). How much change would have been observed when none is expected? Unfortunately, imaging was not performed on these training days so this analysis will not be readily possible to perform. Moreover, the PE statistic requires averaging across cells and trials and is therefore very likely to wash out many interesting effects. Even if it is the population response that is changing, why would it be the arithmetic mean that changes in particular vs. some other projection of the population activity? The experimental and analysis design of the paper here remains weak in my mind.

    3. Reviewer #2 (Public review):

      Knudstrup and colleagues investigate response to short and rapid sequences of stimuli in layer 2/3 of mouse visual cortex. To quote the authors themselves: "the work continues the recent tradition of providing ambiguous support for the idea that cortical dynamics are best described by predictive coding models". Unfortunately, the ambiguity here is largely a result of the choice of experimental design and analysis, and the data provide only incomplete support for the authors' conclusions.

      The authors have addressed some of the concerns of the first revision. However, many still remain.

      (1) From the first review: "There appears to be some confusion regarding the conceptual framing of predictive coding. Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD not just positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in 3rd. Likewise, with ACBD, there is negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation."

      The authors acknowledge in their response that this is a problem, but do not appear to discuss this in the manuscript. This should be fixed.

      (2) From the first review: "Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded."

      The authors respond by pointing out that what they mean by "drift" is within day changes. This has been clarified. However, the analyses in Figures 3 and 5 still are done across days. Figure 3: "Experience modifies activity in PCA space ..." and figure 5: "Stimulus responses shift with training". Both rely on comparisons of population activity across days. This concern remains unchanged here. It would probably be best to remove any analysis done across days - or use data where the same neurons were tracked. Performing chronic two-photon imaging experiments without tracking the same neurons is simply bad practice (assuming one intends to do any analysis across recording sessions).

      (3) From the first revision: "The block paradigm to test for prediction errors appears ill chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also the first few presentations after block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period. This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret."

      Again, the authors acknowledge the problem and state that "there is no indication that this is a learned effect". However, they provide no evidence for this and perform no analysis to mitigate the concern.

      (4) Some of the minor comments also appear unaddressed and uncommented. E.g. the response amplitudes are still shown in "a.u." instead of dF/F or z-score or spikes.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides insights into predictive coding models of visual cortex processing. These models predict that visual cortex neurons will show elevated responses when there are unexpected changes to learned sequential stimulus patterns. This model is currently controversial, with recent publications providing conflicting evidence. In this work, the authors test two types of unexpected pattern variations in layer 2/3 of the mouse visual cortex. They show that pattern omission evokes elevated responses, in favor of a predictive coding model, but find no evidence for prediction errors with substituted patterns, which conflicts with both prior results in L4, and with the expectations of a predictive coding model. They also report that with sequence training, responses sparsify and decorrelate, but surprisingly find no changes in the ability of an ideal observer to decode stimulus identity or timing.

      These results are an important contribution to the understanding of how temporal sequences and expectations are encoded in the primary visual cortex.

      Comments on revisions:

      In this revision, the authors address several of the concerns in the original manuscript. However, the primary issue, raised by all three reviewers, was the block design of the experiments. This design makes it extremely difficult to disentangle the effects of any rapid (within-block) plasticity from any longer-term (across-days) plasticity, which is nominally the subject of the paper.

      Although it may be the case that re-running the experiments with an interleaved design is beyond the scope of this paper, unfortunately, the revised manuscript still does not adequately discuss this potential confound. The authors note that stimulus A in ABCD, ABBD, and ACBD could be distinguished on day 0, indicating that within block changes do occur. In both the original and revised manuscript this finding is discussed in terms of representational drift, but the authors fail to discuss how such within block plasticity may impact their primary findings of prediction error effects.

      This remains a significant concern with the revised manuscript.

      Many of the other issues in the original manuscript have been addressed, and in these areas the revised manuscript is both clearer and more accurately reflects the presented data. The additional analyses and controls shown in the supplemental figures aid in the interpretation of the findings.

    5. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      (1) All analyses were performed on trial-averaged neural responses that were pooled across mice. Owing to differences between subjects in behavior, experimental preparation quality, and biological variability, it seems important to perform at least some analyses on individual animals to assess how behavioral training might differently affect each animal.

      In order to image at a relatively fast rate (30 Hz) appropriate to the experimental conditions, we restricted our imaging to a relatively small field of view (412 × 412 µm at 512 × 512 pixels). This entails a smaller number of ROIs per animal, which can lead to an unbalanced distribution of cells responsive to different stimuli for individual fields of view. We used the common approach of pooling across animals (Homann et al., 2021; Kim et al., 2019) to overcome limitations imposed by sampling a smaller number of cells per animal. In response to this comment, we included supplemental analyses (Supp. Fig. 6) showing that the representational drift analysis (which was not performed on trial-averaged data) looks substantially the same (albeit noisier) for individual animals as at the population level. Additional analyses (PE ratio, etc.) were difficult since the distribution of cells selective for individual stimuli is unbalanced between individual animals and few mice have multiple cells representing all of the different stimuli.

      (2) The correlation analyses presented in Figure 3 (labeled the second Figure 2 in the text) should be conducted on a single-animal basis. Studying population codes constructed by pooling across mice, particularly when there is no behavioral readout to assess whether learning has had similar effects on all animals, appears inappropriate to me. If the results in Figure 3 hold up on single animals, I think that is definitely an interesting result.

      We repeated the correlation analysis performed on mice individually and included them in the supplement (Supp. Fig. 6). The overall result generally mirrors the result found by pooling across animals.

      (3) On Day 0 and Day 5, the reordered stimuli are presented in trial blocks where each image sequence is shown 100 times. Why wasn't the trial ordering randomized as was done in previous studies (e.g. Gavornik and Bear 2014)? Given this lack of reordering, did neurons show reduced predictive responses because the unexpected sequence was shown so many times in quick succession? This might change the results seen in Figure 2, as well as the decoder results where there is a neural encoding of sequence order (Figure 4). It would be interesting if the Figure 4 decoder stopped working when the higher-order block structure of the task was disrupted.

      Our work builds primarily on previous studies (Gavornik & Bear, 2014; Price et al., 2023) that demonstrated clear changes in neural responses over days while employing a similar block structure. Notably, Price et al. found that trial number (within a block) was not a significant factor in the generation of prediction-error responses which strongly suggests short-term plasticity does not play a significant role in shaping responses within the block structure. This finding is consistent with our previous LFP recordings which have not revealed any significant plasticity occurring within a training session, a conclusion bolstered by a collaborative work currently in press (Hosmane et al. 2024, Sleep) revealing the requirement for sleep in sequence plasticity expression.

      It is possible that layer 2/3 adapts to sequences more rapidly than layer 4/5. While manual inspection does not reveal an obvious difference between early and late blocks in this dataset, the n for this subset is too small to draw firm conclusions. It is our view that the block structure provides the strongest comparison to previous work, but agree it would be interesting to randomize or fully interleave sequences in future studies to determine what effect, if any, short-term changes might have. 
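
      The early-versus-late comparison discussed here could be formalized with a simple permutation test. The sketch below is only an illustration under stated assumptions: the function name, the toy response values, and the choice to treat trials as exchangeable are all hypothetical and do not come from the manuscript. Calcium signals are autocorrelated, so a real analysis would need to respect that dependence.

```python
import random
from statistics import mean


def permutation_test(early, late, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means (late - early).

    early, late: per-trial response amplitudes from the first and last
    trials of one block (hypothetical inputs; trials assumed exchangeable).
    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(late) - mean(early)
    pooled = list(early) + list(late)
    n_early = len(early)
    count = 0
    for _ in range(n_perm):
        # Shuffle the trial labels and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_early:]) - mean(pooled[:n_early])
        if abs(diff) >= abs(observed):
            count += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (count + 1) / (n_perm + 1)


# Toy data only: responses that clearly decline from early to late trials.
early_trials = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9]
late_trials = [0.5, 0.6, 0.4, 0.7, 0.5, 0.6, 0.5, 0.4, 0.6, 0.5]
observed_diff, p_value = permutation_test(early_trials, late_trials)
```

      With few cells per subset, as noted above, such a test would have limited power, so a non-significant result would not by itself rule out within-block plasticity.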

      (4) A primary advantage of using two-photon calcium imaging over other techniques like extracellular electrophysiology is that the same neurons can be tracked over many days. This is a standard approach that can be accomplished by using many software packages, including Suite2P (Pachitariu et al. 2017), which is what the authors already used for the rest of their data preprocessing. The authors of this paper did not appear to do this. Instead, it appears that different neurons were imaged on Day 0 (baseline) and Day 5 (test). This is a significant weakness of the current dataset.

      The hypothesis being tested was whether expectation violations, as described in Keller & Mrsic-Flogel 2018, exist under a multi-day sequence learning paradigm. For this, tracking cells across days is not necessary as our PE metric compared responses of individual neurons to multiple stimuli within a single session. Given the speed/FOV tradeoff discussed above, we wanted to consider all cells irrespective of whether they were visible/active or trackable across days, especially since we would expect cells that learn to signal prediction errors to be inactive on day 0 and not selected by our segmentation algorithm. Though we did not compare the responses of single cells before/after training, we did analyze cells from the same field of view on days 0 and 5 (see Supp. Fig. 1) and not distinct populations.

      Reviewer #2:

      (1) There appears to be some confusion regarding the conceptual framing of predictive coding.

      Assuming the mouse learns to expect the sequence ABCD, then ABBD does not probe just for negative prediction errors, and ACBD is not just for positive prediction errors. With ABBD, there is a combination of a negative prediction error for the missing C in the 3rd position, and a positive prediction error for B in the 3rd. Likewise, with ACBD, there is a negative prediction error for the missing B at 2nd and missing C at 3rd, and a positive prediction error for the C in 2nd and B in 3rd. Thus, the authors' experimental design does not have the power to isolate either negative or positive prediction errors. Moreover, looking at the raw data in Figure 2C, this does not look like an "omission" response to C, but more like a stronger response to a longer B. The pitch of the paper as investigating prediction error responses is probably not warranted - we see no way to align the authors' results with this interpretation.

      The reviewer has identified a real problem with the framing of “positive” and “negative” prediction errors in context of sensory stimuli where substitution simultaneously introduces unexpected “positive” violation and “negative” omission. Simply put, even if there are separate mechanisms to represent positive and negative errors, there may be no way to isolate the positive response experimentally since an unexpected input always replaces the unseen expected input. For example, had a cell fired solely to ACBD (and not during either ABCD or ABCD), then whether it was signaling the unexpected occurrence of C or the unexpected absence of B would be inherently ambiguous. In either case, such a cell would have been labeled as C-responsive, and its activity would have been elevated compared with ABCD and would have been included in our substitution-type analysis of prediction errors. We accept that there is some ambiguity regarding the description in this particular case, but overall, this cell’s activity pattern would have informed the PE analysis for which the result was essentially null for the substitution-type violation ACBD.

      Omission, in which the sensory input does not change, may experimentally isolate the negative response, though this is only true if there is a temporal expectation of when the change should have occurred. If A is predicting B in an ordinal sense but there is no expectation of when B will occur with respect to A, changing the duration of A would not be expected to produce an error signal since at any point in time B might still be coming and the expectation is not broken until something other than B occurs. With respect specifically to ABBD in our experiments, it is correct that the learned error responses take the form of stronger, sustained responses to B during the time C was expected. This is still in contrast to day 0 in which activation decays after a transient response to ABBD. The data show that responses during an omitted element are altered with training and take the form of elevated responses to ABBD on day 5. As we say in our discussion, this is somewhat ambiguous evidence of prediction errors: it emerges only with training and is generally consistent with the hypothesis being tested, though it takes a different form than we expected.

      (2) Related to the interpretation of the findings, just because something can be described as a prediction error does not mean it is computed in (or even is relevant to) the visual cortex. To the best of our knowledge, it is still unclear where in the visual stream the responses described here are computed. It is possible that this type of computation happens before the signals reach the visual cortex, similar to mechanisms predicting moving stimuli already in the retina (https://pubmed.ncbi.nlm.nih.gov/10192333/). This would also be consistent with the authors' finding (in previous work) that single-cell recordings in V1 exhibit weaker sequence violation responses than the authors' earlier work using LFP recordings.

      Our work was aimed at testing the specific hypothesis that PE responses, at the very least, exist in L2/3—a hypothesis that is well-supported under different experimental paradigms (often multisensory mismatch). Our aim was to test this idea under a sequence learning paradigm and connect it with previously found PE responses in L4. We don’t claim that it is the only place in which prediction errors may be computed or useful, especially since (as you mentioned), there is evidence for such responses in layer 4. But it is fundamentally important to predictive processing that we determine whether PE responses can be found in layer 2/3 under this passive sequence learning paradigm, whether or not they reflect upstream processes, feedback from higher areas, or entirely local computations. Our aim was to establish some baseline evidence for or against predictive processing accounts of L2/3 activity during passive exposure to visual sequences.

      (3) Recording from the same neurons over the course of this paradigm is well within the technical standards of the field, and there is no reason not to do this. Given that the authors chose to record from different neurons, it is difficult to distinguish representational drift from drift in the population of neurons recorded.

      Our discussion of drift refers to changes occurring within a population of neurons over the course of a single imaging session. We have added clarifying language to the manuscript to make this clear. Changes to the population-level encoding of stimuli over days are treated separately and with different analytical tools. Regarding tracking single cells across days, please see the response to Reviewer #1, comment 4.

      (4) The block paradigm to test for prediction errors appears ill-chosen. Why not interleave oddball stimuli randomly in a sequence of normal stimuli? The concern is related to the question of how many repetitions it takes to learn a sequence. Can the mice not learn ACBD over 100x repetitions? The authors should definitely look at early vs. late responses in the oddball block. Also, the first few presentations after the block transition might be potentially interesting. The authors' analysis in the paper already strongly suggests that the mice learn rather rapidly. The authors conclude: "we expected ABCD would be more-or-less indistinguishable from ABBD and ACBD since A occurs first in each sequence and always preceded by a long (800 ms) gray period.

      This was not the case. Most often, the decoder correctly identified which sequence stimulus A came from." This would suggest that whatever learning/drift could happen within one block did indeed happen and responses to different sequences are harder to interpret.

      This work builds on previous studies that used a block structure to drive plasticity across days. We previously tested whether there are intra-block effects and found no indication of changes occurring within a block or within a session (please see the response to Reviewer #1, comment 3 for further discussion). Observed drift does complicate comparison between blocks. There is no indication in our data that this is a learned effect, though future experiments could test this directly.

      (5) Throughout the manuscript, many of the claims are not statistically tested, and where they are the tests do not appear to be hierarchical (https://pubmed.ncbi.nlm.nih.gov/24671065/), even though the data are likely nested.

      We have modified language throughout the manuscript to be more precise about our claims. We used pooled data between mice and common parametric statistics in line with published literature. The referenced paper offers a broad critique of this approach, arguing that it increases the possibility of type 1 errors, though it is not clear to us that our experimental design carries this risk, particularly since most of our results were negative. To address the specific concern, however, we performed a non-parametric hierarchical bootstrap analysis (https://pmc.ncbi.nlm.nih.gov/articles/PMC7906290/) that re-confirmed the statistical significance of our positive results; see Supplemental Figure 8.
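
      For readers unfamiliar with the approach, the general idea of a hierarchical bootstrap for nested data (cells within mice) can be sketched as follows. This is a minimal illustration, not the analysis behind Supplemental Figure 8: the data layout, function names, and toy values are assumptions made for the example.

```python
import random
from statistics import mean


def hierarchical_bootstrap(data_by_mouse, n_boot=2000, seed=0):
    """Bootstrap the grand mean, resampling mice first and then cells.

    data_by_mouse: dict mapping a mouse ID to a list of per-cell values
    (hypothetical layout, e.g. one response metric per cell).
    Returns the list of bootstrapped means.
    """
    rng = random.Random(seed)
    mice = list(data_by_mouse)
    boot_means = []
    for _ in range(n_boot):
        # Level 1: resample animals with replacement.
        sampled_mice = rng.choices(mice, k=len(mice))
        values = []
        for mouse in sampled_mice:
            cells = data_by_mouse[mouse]
            # Level 2: resample cells within each chosen animal.
            values.extend(rng.choices(cells, k=len(cells)))
        boot_means.append(mean(values))
    return boot_means


def fraction_above(boot_means, null_value=0.0):
    """Fraction of bootstrap means that exceed the null value."""
    return sum(b > null_value for b in boot_means) / len(boot_means)


# Toy data only: three hypothetical mice with unequal numbers of cells.
toy_data = {"m1": [0.8, 1.1, 0.9], "m2": [1.3, 1.0], "m3": [0.7, 1.2, 1.1, 0.9]}
boot = hierarchical_bootstrap(toy_data, n_boot=500)
p_above = fraction_above(boot, null_value=0.0)
```

      Resampling at the animal level first is what distinguishes this from a pooled bootstrap: variability between mice is carried into the resulting distribution instead of being averaged away.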

      (6) The manuscript would greatly benefit from thorough proofreading (not just in regard to figure references).

      We apologize for the errors in the manuscript. We caught the issue and passed on a corrected draft, but apparently the uncorrected draft was sent for review. The re-written manuscript addresses all identified issues.

      (7) With a sequence of stimuli that are 250ms in length each, the use of GCaMP6s appears like a very poor choice.

      We started our experiments using GCaMP6f but ultimately switched to GCaMP6s due to its improved sensitivity, brightness, and accuracy in spike detection (Huang et al., 2021). When combined with deconvolution (Pachitariu et al., 2018; Pnevmatikakis et al., 2016), we found GCaMP6s provides the most complete and accurate view of spiking within 40ms time bins. The inherent limitations of calcium imaging are more likely to be addressed using electrophysiology rather than a faster sensor in future studies.

      (8) The data shown are unnecessarily selective. E.g. it would probably be interesting to see how the average population response evolves with days. The relevant question for most prediction error interpretations would be whether there are subpopulations of neurons that selectively respond to any of the oddballs. E.g. while the authors state they "did" not identify a separate population of omission-responsive neurons, they provide no evidence for this. However, it is unclear whether the block structure of the experiments allows the authors to analyze this.

      We concluded that there is no clear dedicated subpopulation of omission-responding cells by inspecting cells with large PE responses (i.e., ABBD, see supplemental figure 3). Out of the 107 B-responsive cells on day 5, only one appeared to fire exclusively during the omitted stimulus. Average traces for all B-responsive cells are included in the supplement and we have updated the manuscript accordingly. Similarly, a single C-responsive cell was found with an apparently unique substitution error profile (ABCD and ACBD, supplemental figure 4).

      Our primary concern was to make sure that days 0 and 5 had the highest quality fields-of-view. In work leading up to this study, there were concerns that imaging on all intermediate days resulted in a degradation of quality due to photobleaching. We agree that an analysis of intermediate days would be interesting, but it was excluded due to these concerns. 

      Reviewer #3:

      (1) Experimental design using a block structure. The use of a block structure on test days (0 and 5) in which sequences were presented in 100 repetition blocks leads to several potential confounds. First, there is the potential for plasticity within blocks, which could alter the responses and induce learned expectations. The ability of the authors to clearly distinguish blocks 1 and 2 on Day 0 with a decoder suggests this change over time may be meaningful.

      Repeating the experiments with fully interleaved sequences on test days would alleviate this concern. With the existing data, the authors should compare responses from the first trials in a block to the last trials in a block.

      This block design likely also accounts for the ability of a decoder to readily distinguish stimulus A in ABCD from A in ABBD. As all ABCD sequences were run in a contiguous block separate from ABBD, the recent history of experience is different for A stimuli in ABCD versus ABBD. Running fully interleaved sequences would also address this point, and would also potentially mitigate the impact of drift over blocks (discussed below).

      As described in other responses, the block structure was chosen to align more closely with previous studies. We take the overall point though, and future studies will employ the suggested randomized or interleaved structure in addition to block structures to investigate the effects of short-term plasticity.

      (2) The computation of prediction error differs significantly for omission as opposed to substitutions, in meaningful ways the authors do not address. For omission errors, PE compares the responses of B1 and B2 within ABBD blocks. These responses are measured from the same trial, within tens of milliseconds of each other. In contrast, substitution PE is computed by comparing C in ABCD to C in ACBD. As noted above, the block structure means that these C responses were recorded in different blocks, when the state of the brain could be different. This may account for the authors' detection of prediction error for omission but not substitution. To address this, the authors should calculate PE for omission using B responses from ABCD.

      We performed the suggested analysis (i.e., ABBD vs ABCD) prior to submission but omitted it from the draft for brevity (the effect was the same as with ABBD vs ABBD). We have added the results of standardizing with ABCD as supplementary figure 3.

      (3) The behavior of responses to B and C within the trained sequence ABCD differs considerably, yet is not addressed. Responses to B in ABCD potentiate from d0-> d5, yet responses to C in the same sequence go down. This suggests there may be some difference in either the representation of B vs C or position 2 vs 3 in the sequence that may also be contributing to the appearance of prediction errors in ABBD but not ACBD. The authors do not appear to consider this point, which could potentially impact their results. Presenting different stimuli for A,B,C,D across mice would help (in the current paper B is 75 deg and C is 165 deg in all cases). Additionally, other omissions or substitutions at different sequence positions should be tested (eg ABCC or ABDC).

      We appreciate the suggestion. Ideally, we could test many different variants, but practical concerns regarding the duration of the imaging sessions prevented us from testing other interesting variations (such as ABCC) in the current study. We are uncertain as to how we should interpret the overall depressed response to element C seen on day 5, but since the effect is shared in both ABCD and ACBD, we don’t think it affected our PE calculations. 

      (4) The authors' interpretation of their PCA results is flawed. The authors write "Experience simplifies activity in principal component space". This is untrue based on their data. The variance explained by the first set of PCs does not change with training, indicating that the data is not residing in a lower dimensional ("simpler") space. Instead, the authors show that the first 5 PCs better align with their a priori expectations of the stimulus structure, but that does not mean these PCs necessarily represent more information about the stimulus (and the fact that the authors fail to see an improvement in decoding performance argues against this case). Addressing such a question would be highly interesting, but is lacking in the current manuscript. Without such analysis, referring to the PCs after training as "highly discretized" and "untangled" are largely meaningless descriptions that lack analytical support.

      We meant the terms “simpler”, “highly-discretized”, and “untangled” as qualitative descriptions of changes in covariance structure that occurred despite the maintenance of overall dimensionality. As the reviewer notes, the obvious changes in PC space appear to have had practically no effect on decodability or dimensionality, and we found this surprising and worth describing.

      (5) The authors report that activity sparsifies, yet provide only the fraction of stimulus-selective cells. Given that cell detection was automated in a manner that takes into account neural activity (using Suite2p), it is difficult to interpret these results as presented. If the authors wish to claim sparsification, they need to provide evidence that the total number of ROIs drawn on each day (the denominator for sparseness in their calculation) is unbiased. Including more (or fewer) ROIs can dramatically change the calculated sparseness.

      The authors mention sparsification as contributing to coding efficiency but do not test this. Training a decoder on variously sized subsets of their data on days 0 and 5 would test whether redundant information is being eliminated in the network over training.

      First, we provide evidence for sparseness using a visual responsiveness metric in addition to stimulus-selectivity. Second, it is true that Suite2p’s segmentation is informed by activity and therefore may possibly omit cells with very minimal activity. However, we detected a comparable number of cells on day 5 (n=1500) and day 0 (n=1368). We report that roughly half as many cells are stimulus-selective on day 5 compared with day 0. In order for that to have been a result of biased ROI segmentation, we would have needed to have detected closer to 2600 cells on day 5 rather than 1500. Therefore, we consider any bias in the segmentation to have had little effect on the main findings.

      (6) The authors claim their results show representational drift, but this isn't supported in the data. Rather they show that there is some information in the structure of activity that allows a decoder to learn block ID. But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc). To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5). In the absence of this or other tests of representational drift over blocks, the authors should remove the statement that "These findings suggest that there is a measurable amount of representational drift".

      “To actually assess representational drift, the authors should directly compare representations across blocks (one could train a decoder on block 1 and test on blocks 2-5)”: This is the exact analysis that was performed. Additionally, our analysis of pairwise correlations directly measures representational drift.

      “But this does not show whether the actual stimulus representations change, and could instead reflect an unrelated artifact that changes over time (responsivity, alertness, bleaching, etc)”: We have repeated the decoder analysis using normalized population vectors (Supplementary Figure 5) which we believe directly addresses whether the observed drift is due to photobleaching or alertness that would affect the overall magnitudes of response vectors.

      Our analysis of block decoding reflects decoders trained on individual stimulus elements, and we show the average over all such decodings (we have clarified this in the text). For example, we trained a decoder on ABCD presentations from block 1 and tested only against ABCD from other blocks, which we believe is the test being suggested by the reviewer. Furthermore, we do show that representational similarity for all stimulus elements reduces gradually and more-or-less monotonically as the time between presentations increases. We believe this is a fairly straightforward test of representational drift as has been reported and used elsewhere (Deitch et al., 2021).
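
      The similarity-versus-time measure described in this response can be illustrated with a small simulation. The sketch below is a toy model built on assumptions, not the authors' analysis: the random-walk drift, the per-block gain loss standing in for bleaching, and all function names are hypothetical. It also shows why a correlation-based measure is insensitive to a uniform change in response magnitude, since Pearson correlation is unchanged by a positive rescaling of either vector.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def similarity_by_lag(block_vectors):
    """Mean correlation between block-mean population vectors, by block lag."""
    by_lag = {}
    n_blocks = len(block_vectors)
    for i in range(n_blocks):
        for j in range(i + 1, n_blocks):
            r = pearson(block_vectors[i], block_vectors[j])
            by_lag.setdefault(j - i, []).append(r)
    return {lag: sum(v) / len(v) for lag, v in sorted(by_lag.items())}


def simulate_drifting_blocks(n_cells=300, n_blocks=5, step_sd=1.0,
                             gain_decay=0.9, seed=0):
    """Toy data: random-walk drift plus a global gain loss per block."""
    rng = random.Random(seed)
    current = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    blocks = []
    for b in range(n_blocks):
        gain = gain_decay ** b  # uniform scaling, e.g. bleaching
        blocks.append([gain * v for v in current])
        # Each cell's tuning takes an independent random step.
        current = [v + rng.gauss(0.0, step_sd) for v in current]
    return blocks


lag_similarity = similarity_by_lag(simulate_drifting_blocks())
```

      In this toy model the mean correlation falls as the lag between blocks grows, and the gain loss has no effect on it. A non-uniform artifact (for example, bleaching that differs across cells) would not be removed by this kind of normalization.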

      (7) The authors allude to "temporal echoes" in a subheading. This term is never defined, or substantiated with analysis, and should be removed.

      We hoped the term ‘temporal echo’ would be understood in the context of rebounding activity during gray periods as supported by analysis in figure 6a. We have eliminated the wording in the updated manuscript.

    1. eLife Assessment

      This paper addresses an important topic (normative trajectory modelling), seeking to provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model. The evidence provided for the new methods is solid.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.

      Strengths:

      The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors provide a method aiming to accurately reflect the individual deviation of longitudinal/temporal change compared to the normal temporal change characterized based on a pre-trained population normative model (i.e., a Bayesian linear regression normative model), which was built based on cross-sectional data. This manuscript aims at solving a recently identified problem of using normative models based on cross-sectional data to make inferences about longitudinal change.

      Strengths:

      The efforts of this work make a good contribution to addressing an important question of normative modeling. With the greater availability of cross-sectional studies for normative modeling than longitudinal studies, and the inappropriateness of making inferences about longitudinal subject-specific changes using these cross-sectional data-based normative models, it's meaningful to try to address this gap from the aspect of methodological development.

      In the 1st revision, the authors added a simulation study to show how the performance of the classification based on z-diff scores relatively changes with different disruptions (and autocorrelation). Unfortunately, in my view this is insufficient as it only shows how the performance of using z-diff score relatively changes in different scenarios. I would suggest adding the comparison of performance to using the naïve difference in two simple z-scores to first show its better performance, which should also further highlight the inappropriate use of simple z-scores in inferring within-subject longitudinal changes.

      Thank you for the suggestion for additional comparison, which we have now implemented in the simulated methods comparison, see Figure 2 and the extended text of Section 2.1.4 Simulation study.

      Specifically, we have revised the simulation section to not only illustrate the performance of our z-diff method under various scenarios but also to include a direct comparison with a naïve approach that subtracts two z-scores.

      The updated results demonstrate that, compared to the naïve method, the z-diff score consistently maintains a fixed false-positive rate, making it a more robust and controllable approach. Additionally, we show that under conditions of high autocorrelation, the z-diff method is significantly more sensitive in detecting smaller changes than the subtraction method. Importantly, our analysis of a sample from our dataset indicates that high autocorrelation is a prevalent characteristic in real-world data, further supporting the utility of the z-diff method.

      We believe that these findings strengthen the case for adopting the z-diff method and underscore the limitations of more intuitive approaches, which, while simple, lack mathematical rigour.

      Additionally, Figure 1 is hard to read and obtain the actual values of the performance measure. I would suggest reducing it to several 2-dimensional figures. For example, for several fixed values of rho, how the performance changes with different values of the true disruption (and also adding the comparison to the naïve method (difference in two z-scores)).

      We believe that the Reviewer meant Figure 2; indeed, the 3-dimensional visualization, while attractive to some, may have been difficult to read, so we have now replaced it with several 2-dimensional figures as requested.

      I would also suggest changing the title to reflect that the evaluation of "intra-subject" longitudinal change is the method's focus.

      Thanks for the suggestion. We have now implemented it by changing the title to "Using normative models pre-trained on cross-sectional data to evaluate intra-individual longitudinal changes in neuroimaging data".

      We hope the changes implemented fulfill the expectations of the Reviewer.

    1. eLife Assessment

      Yonk and colleagues provide a valuable, timely, and in-depth study showcasing the role of thalamostriatal inputs in learning and action selection. After characterizing the synaptic properties of these inputs onto different striatal cell types in vitro, they provide solid evidence that posterior medial thalamic nucleus (POm) terminals in striatum are activated during reward expectation and arousal. The overall function of this pathway and the degree to which results are confounded by viral contamination of surrounding nuclei and movements remain open questions.

    2. Reviewer #1 (Public review):

      Summary:

      This work aims at understanding the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow one to conclude whether more neurones are recruited during the learning process or whether the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main striatal targets of thalamic input is the cholinergic neurons, which weren't investigated here. Is there information that could be provided?

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      The authors used endoscopy to measure the POm axons in the DLS activity, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neuron or if new neurons are progressively recruited in the process.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figures 1 and 2 are very clean and consistent with POm injection.

      Comment after review: The weaknesses remain as concerns have not been addressed. The dataset is interesting but the interpretation, due partly to the lack of control (especially relative to VPM contamination), is difficult.

    3. Reviewer #2 (Public review):

      Summary:

      Yonk and colleagues show that the posterior medial thalamus (POm), which is interconnected with sensory and motor systems, projects directly to major categories of neurons in the striatum, including direct and indirect pathway MSNs, and PV interneurons. Activity in POm-striatal neurons during a sensory-based learning task indicates a relationship between reward expectation and arousal. Inhibition of these neurons slows reaction to stimuli and overall learning. This circuit is positioned to feed salient event activation to the striatum to set the stage for effective learning and action selection.

      Strengths:

      The results are well presented and offer interesting insight into an understudied thalamostriatal circuit. In general, this work is important as part of a general need for an increased understanding of thalamostriatal circuits in complex learning and action selection processes, which have generally received less attention than corticostriatal systems.

      Weaknesses:

      There could be a stronger connection between the connectivity part of the data - showing that POm neurons contact D1, D2, and PV neurons in striatum but with some different properties - and the functional side of the project. One wonders whether the POm neurons projecting to these subtypes of striatal neurons have unique signaling properties related to learning, or if there is a uniform, bulk signal sent to striatum. This is not a weakness per se, as it's reasonable for these questions to be answered in future papers.

      All the in vivo activity-related conclusions stem from data from just 5 mice, which is a relatively small sample set. Optogenetic groups are also on the small side.

      Comments on revisions:

      The revision has a lot of thoughtful discussion added. I think overall the paper is more thorough and will also be a nice set up for a number of future research questions.

    4. Reviewer #3 (Public review):

      Yonk and colleagues investigate the role of the thalamostriatal pathway. Specifically, they studied the interaction of the posterior thalamic nucleus (PO) and the dorsolateral striatum in the mouse. First, they characterize connectivity by recording DLS neurons in in vitro slices and optogenetically activating PO terminals. PO is observed to establish depressing synapses onto D1 and D2 spiny neurons as well as PV neurons. Second, PO axons are imaged by fiber photometry in mice trained to discriminate textures. Initially, no trial-locked activity is observed, but as the mice learn PO develops responses timed to the audio cue that marks the start of the trial and precedes touch. PO does not appear to encode the tactile stimulus type or outcome. Optogenetic suppression of PO terminals in striatum slows task acquisition. The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing.

      A great strength of this paper is its timeliness. Thalamostriatal processing has received almost no attention in the past, and the field has become very interested in the possible functions of PO. Additionally, the experiments exploit multiple cutting-edge techniques.

      There seem to be some technical/analytical weaknesses. The in vitro experiments appear to have some contamination of nearby thalamic nuclei by the virus delivering the opsin, which could change the interpretation. Some of the statistical analyses of these data also appear inappropriate. The correlative analysis of POm activity in vivo, licking, and pupil could be more convincingly done.

      The bigger weakness is conceptual - why should striatal circuitry need "priming" by thalamus in order to process sensory stimuli? Why would such circuitry even be necessary? Why is a sensory signal from cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? It is unclear from the experiments that the thalamostriatal pathway exists for priming sensory processing. In fact the optogenetic suppression of the thalamostriatal pathway seems to speak against that idea.

      Comments on revisions:

      The authors have only tweaked the Discussion and not necessarily in ways that addressed our previous comments. They could have fairly easily analyzed the effect of distance of recording from injection site and compared subsets of data depending on contamination beyond PO (my comments 1 and 2) or effects of movements (3 and 4). Minimally, they could have given caveats in the Results and Discussion about these, and I would strongly encourage them to be explicit about the caveats. The analyses would probably be better.

      The suggestion that the effects have something to do with priming (5) seems a grasp for a function of the circuit.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work aims to understand the role of thalamus POm in dorsal lateral striatum (DLS) projection in learning a sensorimotor associative task. The authors first confirm that POm forms "en passant" synapses with some of the DLS neuronal subtypes. They then perform a go/no-go associative task that consists of the mouse learning to discriminate between two different textures and to associate one of them with an action. During this task, they either record the activity of the POm to DLS axons using endoscopy or silence their activity. They report that POm axons in the DLS are activated around the sensory stimulus but that the activity is not modulated by the reward. Last, they showed that silencing the POm axons at the level of DLS slows down learning the task.

      The authors show convincing evidence of projections from POm to DLS and that POm inputs to DLS code for whisking whatever the outcome of the task is. However, their results do not allow us to conclude if more neurons are recruited during the learning process or if the already activated fibres get activated more strongly. Last, because POm fibres in the DLS are also projecting to S1, silencing the POm fibres in the DLS could have affected inputs in S1 as well and therefore, the slowdown in acquiring the task is not necessarily specific to the POm to DLS pathway.

      We thank the reviewer for these constructive comments. The points are addressed below.  

      Strengths:

      One of the main strengths of the paper is to go from slice electrophysiology to behaviour to get an in-depth characterization of one pathway. The authors did a comprehensive description of the POm projections to the DLS using transgenic mice to unambiguously identify the DLS neuronal population. They also used a carefully designed sensorimotor association task, and they exploited the results in depth.

      It is a very nice effort to have measured the activity of the axons in the DLS not only after the mice have learned the task but throughout the learning process. It shows the progressive increase of activity of POm axons in the DLS, which could imply that there is a progressive strengthening of the pathway. The results show convincingly that POm axons in the DLS are not activated by the outcome of the task but by the whisker activity, and that this activity on average increases with learning.

      Weaknesses:

      One of the main striatal targets of thalamic input is the cholinergic neurons, which weren't investigated here. Is there information that could be provided?

      This is true of the parafascicular (Pf) thalamic nucleus, which has been well studied in this context. However, there is much less known about the striatal projections of other thalamic nuclei, including POm, and their inputs to cholinergic neurons. Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here would be an important area of further study.

      It is interesting to know that the POm projects to all neuronal types in the DLS, but this information is not used further down the manuscript so the only take-home message of Figure 1 is that the axons that they image or silence in the DLS are indeed connected to DLS neurons and not just passing fibres. In this line, are these axons the same as the ones projecting to S1? If this is the case, why would we expect a different behaviour of the axon activity at the DLS level compared to S1?

      Tracing of single POm axons by Ohno et al. (2012) indicated that POm axons form a branched collateral that innervates striatum, while the main axon continues in the rostral-dorsal direction to innervate cortex. We think it is reasonable, based on the morphology, that our optogenetic suppression experiment restricted the suppression of glutamate release to this branch and avoided the other branches of the axon that project to cortex. However, testing this would require monitoring S1 activity during the POm-striatal axon suppression, which we did not do in this study.

      It is a very interesting question whether there could be different axon activity behavior in striatum versus S1. There is surprising evidence that POm synaptic terminals are different sizes in S1 and M1 and show different synaptic physiological properties depending on these cortical projection targets (Casas-Torremocha et al., 2022). Based on this, it is possible that POm-striatal synapses show distinct properties compared to cortex; however, this will need to be tested in future work.

      The authors used endoscopy to measure the POm axons in the DLS activity, which makes it impossible to know if the progressive increase of POm response is due to an increase of activity from each individual neuron or if new neurons are progressively recruited in the process.

      This is a good point. It would be necessary to perform chronic two-photon imaging of POm neurons (or chronic electrophysiological recordings) to determine whether the activity of individual neurons increased versus whether individual neuron activity levels remained similar but new neurons became active with learning. Even under baseline conditions, it is not known in detail what fraction of the population of POm neurons is active during sensory processing or behavior, highlighting how much is still to be discovered in this exciting area of neuroscience.

      The picture presented in Figure 4 of the stimulation site is slightly concerning as there are hardly any fibres in neocortical layer 1 while there seems to be quite a lot of them in layer 4, suggesting that the animal here was injected in the VB. This is especially striking as the implantation and projection sites presented in Figures 1 and 2 are very clean and consistent with POm injection.

      Although this image was selected to demonstrate the position of the POm injection site and optical fiber implant above striatal axons, the reviewer is correct that there appears to be mixed labeling of axons in L4 and L5a. In some cases, there was expression slightly outside the border of POm (see Fig. 1B, right), which might explain the cortical innervation pattern in this figure. While cortically bound VPM axons pass through the striatum, they do not form synaptic terminals until reaching the cortex (Hunnicutt et al., 2016). If, as may be the case, inhibitory opsins suppress release of neurotransmitter at synaptic terminals more effectively than action potential propagation in axons, it may be likely that optogenetic suppression of POm-striatal terminals is more effective than suppression of action potentials in off-target-labelled VPM axons of passage. Ideally, we could compare effects of suppression of POm-striatal synapses with POm-cortical synapses and VPM-cortical synapses, but this was outside the bandwidth of the present study.

      Reviewer #2 (Public Review):

      Summary:

      Yonk and colleagues show that the posterior medial thalamus (POm), which is interconnected with sensory and motor systems, projects directly to major categories of neurons in the striatum, including direct and indirect pathway MSNs, and PV interneurons. Activity in POm-striatal neurons during a sensory-based learning task indicates a relationship between reward expectation and arousal. Inhibition of these neurons slows reaction to stimuli and overall learning. This circuit is positioned to feed salient event activation to the striatum to set the stage for effective learning and action selection.

      Strengths:

      The results are well presented and offer interesting insight into an understudied thalamostriatal circuit. In general, this work is important as part of a general need for an increased understanding of thalamostriatal circuits in complex learning and action selection processes, which have generally received less attention than corticostriatal systems.

      Weaknesses:

      There could be a stronger connection between the connectivity part of the data - showing that POm neurons contact D1, D2, and PV neurons in the striatum but with some different properties - and the functional side of the project. One wonders whether the POm neurons projecting to these subtypes of striatal neurons have unique signaling properties related to learning, or if there is a uniform, bulk signal sent to the striatum. This is not a weakness per se, as it's reasonable for these questions to be answered in future papers.

      We are very interested in understanding the distinct learning-related synaptic and circuit changes that potentially occur at the POm synapses with D1- and D2-SPNs, PV interneurons, and other striatal cell types. We agree that this would be an important topic for further investigation.

      All the in vivo activity-related conclusions stem from data from just 5 mice, which is a relatively small sample set. Optogenetic groups are also on the small side.

      We appreciate this point and agree that higher N can be important for observing robust effects. A factor of our experiments that helped reduce the number of animals used was the longitudinal design, with repeated measures in the same subjects. This allowed for the internal control of comparing learning effects in the same subject from naïve to expert stages and therefore increased robustness. Even with relatively small group sizes, results were statistically significant, suggesting that the use of more mice was unnecessary, which we considered consistent with best practice in the use of animals in research. We also note that our group sizes were consistent with other studies in the field.  

      Reviewer #3 (Public Review):

      Yonk and colleagues investigate the role of the thalamostriatal pathway. Specifically, they studied the interaction of the posterior thalamic nucleus (PO) and the dorsolateral striatum in the mouse. First, they characterize connectivity by recording DLS neurons in in-vitro slices and optogenetically activating PO terminals. PO is observed to establish depressing synapses onto D1 and D2 spiny neurons as well as PV neurons. Second, PO axons are imaged by fiber photometry in mice trained to discriminate textures. Initially, no trial-locked activity is observed, but as the mice learn PO develops responses timed to the audio cue that marks the start of the trial and precedes touch. PO does not appear to encode the tactile stimulus type or outcome. Optogenetic suppression of PO terminals in striatum slows task acquisition. The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing.

      A great strength of this paper is its timeliness. Thalamostriatal processing has received almost no attention in the past, and the field has become very interested in the possible functions of PO. Additionally, the experiments exploit multiple cutting-edge techniques.

      There seem to be some technical/analytical weaknesses. The in vitro experiments appear to have some contamination of nearby thalamic nuclei by the virus delivering the opsin, which could change the interpretation. Some of the statistical analyses of these data also appear inappropriate. The correlative analysis of POm activity in vivo, licking, and pupil could be more convincingly done.

      The bigger weakness is conceptual - why should striatal circuitry need "priming" by the thalamus in order to process sensory stimuli? Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? It is unclear from the experiments that the thalamostriatal pathway exists for priming sensory processing. In fact, the optogenetic suppression of the thalamostriatal pathway seems to speak against that idea.

      We thank the reviewer for these constructive comments. The points are addressed below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Do POm neurons innervate CINs also? The connection between the PF thalamus and CINs is mentioned in a couple of places - one question is how unique are the input patterns for the POm versus adjacent sensorimotor thalamic regions, including the PF? This isn't a weakness per se but knowing the answer to that question would help in forming a more complete picture of how these different thalamostriatal circuits do or do not contribute uniquely to learning and action selection.

      Anatomical tracing evidence from Klug et al. (2018), which mapped brain-wide inputs to striatal cholinergic (ChAT) interneurons, suggests that Pf provides the majority of thalamic innervation of striatal ChAT neurons compared to other thalamic nuclei. Many other thalamic nuclei, including POm, showed very little or no labeling, suggesting weak innervation of ChAT interneurons. However, it is possible that these thalamic nuclei, including POm, do provide functional innervation of ChAT interneurons that is not sufficiently assessed by anatomical tracing.

      Another difference between Pf and other thalamic nuclei (likely including POm) comes from anatomical tracing evidence (Smith et al., 2014; PMID: 24523677) which indicates that Pf inputs form the majority of their synapses onto dendritic shafts of SPNs, while other thalamic nuclei form synapses onto dendritic spines. Understanding the innervation patterns of POm-striatal projections beyond the three cell types we have studied here, including ChAT neurons and subcellular localization, would be an important area of further study.

      It would be useful to know to what extent these POm-striatum neurons are activated generally during movement, versus this discrimination task specifically.

      We agree that distinguishing general movement-related activity from task-specific activity would be very useful. Earlier work (Petty et al., 2021) showed a close relationship between POm neuron activity, spontaneous (task-free) whisker movements, and pupil-indexed arousal in head-restrained mice. Oram et al. (2024; PMID: 39003286) recently recorded VPM and POm in freely moving mice during natural movements, finding that activity of both nuclei correlated with head and whisker movements. These studies indicate that POm is generally coactive with exploratory head and whisker movements.

      During task performance, the situation may change with training and attentional effects. For example, Petty and Bruno (2024) (https://elifesciences.org/reviewed-preprints/97188) showed that POm activity correlates more closely with task demands than tactile or visual stimulus modality. Our data indicate that POm axonal signals are increased at trial start during anticipation of tactile stimulus delivery and through the sensory discrimination period, then decrease to baseline levels during licking and water reward collection (Fig. 3). Results of Petty and Bruno (2024) together with ours suggest that POm is particularly active during the context of behaviorally relevant task performance. Thus, we think it is likely that, while pupil dilation indexes general movement and arousal, POm activity is more specific to movement and arousal associated with task engagement and behavioral performance. We have strengthened this point in the Discussion.

      Many of the data panels and text for legends/axes are quite small, and the stroke on line art is quite faint - overall figures could be improved from a readability standpoint.

      We thank the reviewer for their careful attention to the figures. 

      Reviewer #3 (Recommendations For The Authors):

      Major

      (1) Page 4, the Results regarding PSP and distance from injection site. The r-squared is the wrong thing to look at to test for a relationship. One should look at the p-value on the coefficient corresponding to the slope. The p-value is probably significant given the figures, in which case there may be a relationship contrary to what is stated. All the low r-squared value says is that, if there is a relationship, it does not explain a lot of the PSP variability.
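      The distinction drawn in point (1) can be sketched as follows. This is a toy example on simulated data, not the authors' PSP measurements. Testing the slope of a simple linear regression against zero is equivalent to testing the Pearson correlation against zero, and a permutation test is swapped in here for the usual t-test so that the sketch needs only the standard library. All names and parameter values are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=1000, seed=1):
    """Two-sided permutation p-value for the null of no linear relationship."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        # Shuffling y breaks any pairing with x, simulating the null.
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


def weak_but_real_relationship(n=500, slope=0.3, seed=0):
    """Simulate y = slope * x + noise and return (r_squared, p_value)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    r = pearson_r(x, y)
    return r * r, permutation_p_value(x, y)


if __name__ == "__main__":
    r_squared, p_value = weak_but_real_relationship()
    print(f"r^2 = {r_squared:.3f}, permutation p = {p_value:.4f}")
```

      With these settings the relationship explains only a small share of the variance, yet the test on the slope rejects the null, which is the situation the reviewer describes.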

      We thank the reviewer for alerting us to this oversight. We have included the p value (p = 0.0293) in the figure and legend, and indicated that the relationship is “small but significant”.

      (2) Figure 1B suggests that the virus injections extend beyond POm and into other thalamic structures. Do any of the results change if the injections contaminating other nuclei are excluded from the analysis? I am not suggesting the authors change the figures/analyses. I am simply suggesting they double-check.

      We selected for injections that were predominantly expressing in POm as determined by post-hoc histological analysis (see Fig. 1, right). As above, we think that axons of passage that do not form striatal synapses are less likely to be suppressed than axons with terminals; however, this would need to be determined in further experiments. Because the preponderance of expression is within POm, we think the results would be similar even with a stricter selection criterion. 

      (3) The authors conclude that POm and licking are not correlated (bottom of page 6 pertaining to Figures 3A-F). The danger of these analyses is that they assume that GCaMP8 is a perfect linear reporter of POm spikes. The reliability of GCaMP8 has been quantified in some cell types, but not thalamic neurons, which have relatively higher firing rates.

      The reviewer is correct that the relationship between GCaMP8 fluorescence changes and spiking has not been sufficiently characterized in thalamic neurons, and that this would be important to do.

      What if the indicator is simply saturated late into the trial (after the average reaction time)? It would look like there is no response and one would conclude no correlation, but there could be a very strong correlation.

      While saturation is worthy of concern, the signal dynamics here argue against this possibility. The signal increased in the early part of the trial and decreased by the end. If saturation were an issue, this would have been apparent during the initial increase. The decrease in amplitude at the end of the trial indicates that the signal is not saturated at that point, because it is returning from a point closer to its maximum (and so is becoming less saturated).

      Also, what happens between trials? Are the correlations the same, stronger, weaker? Ideally, the authors would analyze the data during and between trials.

      Between trials the signal did not show further changes in baseline beyond what was displayed at the start and end of behavioral trials. There were no consistent increases or decreases in signals between trials, except perhaps during strong whisking bouts. This is anecdotal because we did not analyze between-trial data. However, it is interesting and important to note that signals increased dramatically in amplitude from naïve, early learning to expert behavioral performance (Fig. 3), highlighting that POm-axonal signals relate to behavioral engagement and performance rather than spontaneous behaviors.  

      (4) Axonal activity could also appear more correlated with the pupil than licking because pupil dynamics are slow like the dynamics of calcium indicators. These kernels could artificially inflate the correlation. Ideally, the authors could consider these temporal effects. Perhaps they could deconvolve the temporal profiles of calcium and pupil before correlating? Or equivalently incorporate the profiles into their analysis?
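      The concern in point (4) can be illustrated with a small simulation. It is a sketch under stated assumptions: the two signals are independent white noise, and the slow indicator and pupil dynamics are each modelled as a causal exponential filter with a hypothetical time constant. It shows only how shared slow kernels inflate chance correlations; it does not implement the deconvolution the reviewer suggests, and it is not an analysis of the authors' data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def exp_smooth(signal, tau):
    """Causal exponential filter with time constant tau, in samples."""
    alpha = 1.0 - math.exp(-1.0 / tau)
    state = 0.0
    smoothed = []
    for sample in signal:
        state += alpha * (sample - state)
        smoothed.append(state)
    return smoothed


def mean_abs_correlation(tau, n_samples=500, n_repeats=200, seed=0):
    """Average |r| between two independent signals; tau=0 means no smoothing."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_repeats):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        if tau > 0:
            # Both signals pass through the same slow kernel.
            a = exp_smooth(a, tau)
            b = exp_smooth(b, tau)
        total += abs(pearson_r(a, b))
    return total / n_repeats


if __name__ == "__main__":
    print("mean |r|, unsmoothed:", round(mean_abs_correlation(tau=0), 3))
    print("mean |r|, tau = 50:  ", round(mean_abs_correlation(tau=50), 3))
```

      Because smoothing reduces the effective number of independent samples, the unrelated signals show larger chance correlations after filtering, so a correlation between two slow signals needs a correspondingly stricter null.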

      We analyzed the lick probability histograms, which had a temporal profile similar to the calcium signals (Fig. 3D,E), ruling out concerns about temporal effects on correlations. It is also worth noting that we observed changes in correlations between calcium signals and pupil with learning stage (Fig. 3I), even though the temporal profiles (signal dynamics) are not changing. Thus, temporal effects of the signals themselves are not the driver of correlations, but rather the changes in relative timing between calcium signals and pupil, as occur with learning.

      (5) The authors conclude that PO provides a "behaviorally relevant arousal-related signal" and that this signal "primes" striatal circuitry for sensory processing. The data here support the first part. It is not clear that the data support the second part, largely because it is vague what "priming" of sensory processing or "a key role in the initial stages of action selection" (p.9) even means here. Why would such circuitry even be necessary? Why is a sensory signal from the cortex insufficient? Why should the animal more slowly learn the task? How does this fit with existing ideas of striatal plasticity? Some conceptual proposals from the authors, even if speculative and not offered as a conclusion, would be helpful.

      We appreciate these good points and have added further consideration and revision of the concept of priming and potential roles in an extensively revised Discussion section.

      (6) The photometry shows that PO turns on about 2 seconds before the texture presentation. PO's activity seems locked to the auditory cue, not the texture (Figure 2). This means that the attempt to suppress the thalamostriatal pathway with JAWS (Figure 4) is rather late, isn't it? Some PO signals surely go through. This seems to contradict the idea of priming above. It would be good if the authors could factor this into their narrative. Perhaps labelling the time of the auditory cue in Figure 4C would also be helpful.

      The start of texture presentation (movement of the texture panel toward the mouse) and auditory cue occur at the same time. To clarify this, we added a label “start tone” in Figure 4C and also in Figure 2C.

      For optogenetic (JAWS) suppression, we intentionally chose a time window between start tone onset and texture presentation, because our photometry experiments showed that this was when the preponderance of the signal occurred. However, the reviewer is correct that our chosen optogenetic suppression (JAWS) onset occurs shortly after the photometry signal has already started, potentially leaving the early photometry signal un-suppressed. Our motivation for choosing a restricted time window surrounding the texture presentation time was 1) to minimize illumination and potential heating of brain tissue; 2) to target a time window that avoids the auditory cue but covers stimulus presentation. We did not want to extend the duration of the suppression to before the trial started, because this could produce task-non-specific effects, such as distraction or loss of attention before the start of the trial.

      Even if some signal were getting through before suppression, we don’t think this contradicts the possibility of ‘priming’, because the process underlying priming would still be disrupted even if not totally suppressed. This would alter the temporal relationship between POm-striatal inputs and further corticostriatal inputs (from S1 and M1 cortex, for example). We have included further consideration of these points and possible relation to the priming concept in the Discussion.

      Minor

      (1) Page 5, "the sensitivity metric is artificially increased". What do you mean "artificially"? The mice are discriminating better. It is true that either a change in HR or FAR can cause the sensitivity metric to change, but there is nothing artificial or misleading about this.

      We removed the word artificial and clarified our definition of behaviorally Expert in this context:

      “Mice were considered Expert once they had reached ≥ 0.80 Hit Rate and ≤ 0.30 FA Rate for two consecutive sessions in lieu of a strict sensitivity (d’) threshold; we found this definition more intuitive because d’ is enhanced as Hit Rate and FA Rate approach their extremes (0 or 1)”
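
      The point about d’ growing quickly near the extremes can be illustrated with a minimal sketch. It assumes the standard signal-detection definition d’ = z(Hit Rate) − z(FA Rate); the function names `d_prime` and `is_expert` are hypothetical and are not taken from the authors' analysis code. Only the 0.80 / 0.30 thresholds and the two-consecutive-session rule come from the quoted Methods text.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Assumes the standard signal-detection definition. Rates of exactly
    0 or 1 are not handled here (inv_cdf raises an error for them).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def is_expert(sessions: list[tuple[float, float]]) -> bool:
    """True if two consecutive sessions meet HR >= 0.80 and FAR <= 0.30.

    `sessions` is a chronological list of (hit_rate, fa_rate) pairs.
    """
    def meets(session: tuple[float, float]) -> bool:
        hit_rate, fa_rate = session
        return hit_rate >= 0.80 and fa_rate <= 0.30

    return any(meets(a) and meets(b) for a, b in zip(sessions, sessions[1:]))


if __name__ == "__main__":
    # The same 0.09 step in hit rate changes d' far more near the ceiling.
    mid_step = d_prime(0.59, 0.30) - d_prime(0.50, 0.30)
    high_step = d_prime(0.99, 0.30) - d_prime(0.90, 0.30)
    print(f"step near 0.5: {mid_step:.3f}, step near 1.0: {high_step:.3f}")
    print(is_expert([(0.75, 0.35), (0.82, 0.28), (0.85, 0.25)]))
```

      Because the z-transform is steep near 0 and 1, a fixed threshold on d’ can be crossed through a small change in either rate alone, which is one reading of why explicit Hit Rate and FA Rate criteria may be easier to interpret.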

      (2) Page 7, "Upon segmentation (Figure S4G-J)". Do you mean "segregation by trial outcome"?

      Corrected.

      (3) Page 9, "POm projections may have discrete target-specific functions, such that POm-striatal inputs may play a distinct role in sensorimotor behavior compared to POm-cortical inputs". Would POm-cortical inputs not also be sensorimotor? The somatosensory cortex contains a lot of corticostriatal cells. It also has various direct and indirect links to the motor cortex as well.

      We have clarified the wording here to convey the possibility that POm signals could be received and processed differently by striatal versus cortical circuitry, and have moved this statement to later in the discussion for better elaboration.

      (4) The Methods state that male and female mice were used. Why not say how many of each and whether or not there are any sex-specific differences?

      We added the following information to the Methods:

      The number of male and female mice were as follows, by experiment type: 6 male, 4 female (electrophysiology); 3 male, 2 female (fiber photometry); 4 male, 5 female (optogenetics). Data were not analyzed for sex differences.

    1. eLife Assessment

      Somatostatin-expressing neurons of the entopeduncular nucleus (EPNSst+) provide a limbic output of the basal ganglia and co-release GABA and Glutamate in their projection to the lateral habenula, a structure that is key for reward-based learning. Combining fiber photometry and computational modeling, the authors provide compelling evidence that EPNSst+ neural activity represents movement, choice direction and reward outcomes in a probabilistic switching task but, surprisingly, neither chronic genetic silencing of these neurons nor selectively eliminating glutamate release affected behavioral performance in well-trained animals. This valuable study shows that despite its representation of key task variables, EPNSst+ neurons are dispensable for ongoing performance in a task requiring outcome monitoring to optimize reward. This work will be of interest to those interested in neural circuits, learning, and/or decision making.

    2. Reviewer #1 (Public review):

      Summary:

      In this series of studies, Locantore et al. investigated the role of SST-expressing neurons in the entopeduncular nucleus (EPNSst+) in probabilistic switching tasks, a paradigm that requires continued learning to guide future actions. In prior work, this group had demonstrated EPNSst+ neurons co-release both glutamate and GABA and project to the lateral habenula (LHb), and LHb activity is also necessary for outcome evaluation necessary for performance in probabilistic decision-making tasks. Previous slice physiology works have shown that the balance of glutamate/GABA co-release is plastic, altering the net effect of EPN on downstream brain areas and neural circuit function. The authors used a combination of in vivo calcium monitoring with fiber photometry and computational modelling to demonstrate that EPNSst+ neural activity represents movement, choice direction and reward outcomes in their behavioral task. However, viral-genetic manipulations to synaptically silence these neurons or selectively eliminate glutamate release had no effect on behavioral performance in well-trained animals. The authors conclude that despite their representation of task variables, EPN Sst+ neuron synaptic output is dispensable for task performance.

      Strengths and Weaknesses:

      Overall, the manuscript is exceptionally scholarly, with a clear articulation of the scientific question and a discussion of the findings and their limitations. The analyses and interpretations are careful and rigorous. This review appreciates the thorough explanation of the behavioral modelling and GLM for deconvolving the photometry signal around behavioral events, and the transparency and thoroughness of the analyses in the supplemental figures. This extra care has the result of increasing the accessibility for non-experts, and bolsters confidence in the results. To bolster a reader's understanding of results, we suggest it would be interesting to see the same mouse represented across panels (i.e. Fig 1 F-J, Supp 1 F,K, etc.), e.g. via inclusion of faint hash lines connecting individual data points across variables. Additionally, Fig 3E demonstrates that eliminating the 'reward' and 'choice and reward' terms from the GLM significantly worsens model performance; to demonstrate the magnitude of this effect, it would be interesting to include a reconstruction of the photometry signal after holding out of both or one of these terms, alongside the 'original' and 'reconstructed' photometry traces in panel D. This would help give context for how the model performance degrades by exclusion of those key terms. Finally, the authors claimed calcium activity increased following ipsilateral movements. However, Figure 3C clearly shows that both SXcontra and SXipsi increase beta coefficients. Instead, the choice direction may be represented in these neurons, given that beta coefficients increase following CXipsi and before SEipsi, presumably when animals make executive decisions. Could the authors clarify their interpretation on this point? Also, it is not clear if there is a photometry response related to motor parameters (i.e. head direction or locomotion, licking), which could change the interpretation of the reward outcome if it is related to a motor response; could the authors show photometry signal from representative 'high licking' or 'low licking' reward trials, or from spontaneous periods of high vs. low locomotor speeds (if the sessions are recorded) to otherwise clarify this point?

      There are a few limitations with the design and timing of the synaptic manipulations that would improve the manuscript if discussed or clarified. The authors take care to validate the intersectional genetic strategies: Tetanus Toxin virus (which eliminates synaptic vesicle fusion) or CRISPR editing of Slc17a6, which prevents glutamate loading into synaptic vesicles. The magnitude of effect in the slice physiology results is striking. However, this relies on co-infection of a second AAV to express channelrhodopsin for the purposes of validation, and it is surely the case that there will not be 100% overlap between the proportion of cells infected. Alternative means of glutamate packaging (other VGluT isoforms, other transporters etc) could also compensate for the partial absence of VGluT2, which should be discussed. The authors do not perform a complementary experiment to delete GABA release (i.e. via VGAT editing), which is understandable, given the absence of an effect with the pan-synaptic manipulation. A more significant concern is the timing of these manipulations as the authors acknowledge. The manipulations are all done in well-trained animals, who continue to perform during the length of viral expression. Moreover, after carefully showing that mice use different strategies on the 70/30 version vs the 90/10 version of the task, only performance on the 90/10 version is assessed after the manipulation. Together, the observation that EPNsst activity does not alter performance on a well learned, 90/10 switching task decreases the impact of the findings, as this population may play a larger role during task acquisition or under more dynamic task conditions. Additional experiments could be done to strengthen the current evidence, although the limitation is transparently discussed by the authors.

      Finally, intersectional strategies target LHb-projecting neurons, although in the original characterization it is not entirely clear that the LHb is the only projection target of EPNsst neurons. A projection map would help clarify this point.

      Overall, the authors used a pertinent experimental paradigm and common cell-specific approaches to address a major gap in the field, which is the functional role of glutamate/GABA co-release from the major basal ganglia output nucleus in action selection and evaluation. The study is carefully conducted, their analyses are thorough, and the data are often convincing and thought-provoking. However, the limitations of their synaptic manipulations with respect to the behavioral assays reduce generalizability and to some extent the impact of their findings.

      Comments on the latest version:

      Specifically, they have included more thorough analyses to address several concerns related to interpreting activity patterns of EPSst+ neurons. The authors clearly point out that calcium activity increased during ipsilateral movements, and the increase was statistically larger during the choice phase (Figure 2 supplement 1F-G), indicating that these neurons may represent movement and additional factors (e.g. executive decision-making). Correspondingly, we appreciate the thorough explanation of using a GLM model to determine which behavioural variables contribute to observed physiological signals and adding the example reconstructed signal with direction and reward variables omitted in Figure 3 supplements 1 and 2.

      Although no new manipulation experiment is added to the manuscript, the authors respond to common critiques related to testing the behavioural effect after the manipulations in well-trained mice. The discussion related to technical limitations, possible compensatory mechanisms and alternative interpretations is thorough and overall satisfying. Based on the behaviour modeling results, the authors speculate that animals need to integrate more evidence from the past to guide choice in a more uncertain environment (70/30 version), instead of adopting a 'win-stay, lose-shift' strategy in the more deterministic 90/10 version. The authors expand the discussion, but the possibility that EPNSst+ neurons contribute to task performance in well-trained animals under uncertainty is not directly tested. Along with other alternative explanations discussed in the manuscript, we think the paper is valuable literature for future studies to understand the basal ganglia circuits in learning and decision-making.

    3. Reviewer #2 (Public review):

      Summary:

      This paper aimed to determine the role EP sst+ neurons play in a probabilistic switching task.

      Strengths:

      - The in vivo recording of the EP sst+ neurons activity in the task is one of the strongest parts of this paper. Previous work had recorded from the EP-LHb population in rodents and primates in head-fixed configurations; the recording of this population in a freely moving context is a valuable addition to these studies and has highlighted more clearly that these neurons respond both at the time of choice and outcome.

      - The use of a refined intersectional technique to record specifically the EP sst+ neurons is also an important strength of the paper. This is because previous work has shown that there are two genetically different types of glutamatergic EP neurons that project to the LHb. Previous work had not distinguished between these types in their recordings so the current results showing that the bidirectional value signaling is present in the EP sst+ population is valuable.

      Weaknesses:

      - One of the main weaknesses of the paper is to do with how the effect of the EP sst+ neurons on the behavior was assessed.

      o All the manipulations (blocking synaptic release and blocking glutamatergic transmission) are chronic and more importantly the mice are given weeks of training after the manipulation before the behavioral effect is assessed. This means that as the authors point out in their discussion the mice will have time to adjust to the behavioral manipulation and compensate for the manipulations. The results do show that mice can adapt to these chronic manipulations and that the EP sst+ are not required to perform the task. What is unclear is whether the mice have compensated for the loss of EP sst+ neurons and whether they play a role in the task under normal conditions. Acute manipulations or chronic manipulations without additional training would be needed to assess this.

      o Another weakness is that the effect of the manipulations was assessed in the 90/10 contingency version of the task. Under these contingencies, mice integrate past outcomes over fewer trials to determine their choice and animals act closer to a simple win-stay-lose switch strategy. Due to this it is unclear if the EP sst+ neurons would play a role in the task when they must integrate over a larger number of conditions in the less deterministic 70/30 version of the task. Indeed it is not clear that lesioning any other regions involved in evaluation of action outcomes such as VTA dopamine neurons, that encode reward prediction errors, would have any deficit when assessed in this way. Due to this, it's not clear if the mice have adapted to solve the task without evaluating action outcomes at all and are just acting in a more deterministic lose switch manner that would not presumably involve any of the circuitry in evaluating action outcomes.

      - The authors conclude that they do not see any evidence for bidirectional prediction errors. It is not possible to conclude this. First, they see a large response in the EP sst+ neurons to the omission of an expected reward. This is what would be expected of a negative reward prediction error. There are much more specific, well-controlled tests for this that are commonplace in head-fixed and freely moving paradigms and that could be used to probe this. The authors do look at the effect of previous trials on the response and do not see strong consistent results, but this is not a strong formal test of what would be expected of a prediction error, either positive or negative. The other way they assess this is by looking at the size of the responses in different recording sessions with different reward contingencies. They claim that the size of the reward expectation and prediction error should scale with the different reward probabilities. If all the reward probabilities were present in the same session, this should be true, as many others have shown for RPE. However, because these data were taken from different sessions, the responses are not expected to scale: reward prediction errors have been shown to adaptively scale to cover the range of values on offer (Tobler et al., Science 2005). A better test of positive prediction error would be to give a larger-than-expected reward on a subset of trials. Either way, there is already evidence in their data that responses reflect a negative prediction error, and more specific tests would be needed to formally rule in or out prediction error coding, especially as previous primate and rodent recordings have shown it is present.

      - There are a lot of variables in the GLM that occur extremely close in time such as the entry and exit of a port. If two variables occur closely in time and are always correlated it will be difficult if not impossible for a regression model to assign weights accurately to each event. This is not a large issue, but it is misleading to have regression kernels for port entry and exits unless the authors can show these are separable due to behavioral jitter or a lack of correlation under specific conditions, which does not seem to be the case.
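
      The separability concern in the last point can be illustrated with a toy sketch. It is not the authors' GLM: the bin counts, delay, jitter values, and function names below are all hypothetical. It builds a port-entry regressor shifted by the typical entry-to-exit delay and a port-exit regressor, then measures their correlation with and without trial-to-trial jitter in the delay.

```python
import random


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def regressor_correlation(jitter: int, n_events: int = 200,
                          spacing: int = 50, delay: int = 5,
                          seed: int = 0) -> float:
    """Correlation between a delayed 'entry' regressor and an 'exit' regressor.

    Each exit follows its entry by `delay` bins plus a uniform integer
    jitter in [-jitter, +jitter]. All values are illustrative.
    """
    rng = random.Random(seed)
    n_bins = n_events * spacing
    entry_shifted = [0] * n_bins  # entry event regressor at lag = delay
    exit_event = [0] * n_bins     # exit event regressor at lag = 0
    for i in range(n_events):
        t_entry = i * spacing + 10
        t_exit = t_entry + delay + rng.randint(-jitter, jitter)
        entry_shifted[t_entry + delay] = 1
        exit_event[t_exit] = 1
    return pearson(entry_shifted, exit_event)


if __name__ == "__main__":
    print("no jitter:  ", regressor_correlation(jitter=0))
    print("with jitter:", regressor_correlation(jitter=2))
```

      With no jitter the two columns are identical, so a regression cannot apportion weight between them; behavioral jitter in the entry-to-exit interval is what lowers the correlation and makes separate kernels estimable.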

    4. Reviewer #3 (Public review):

      Summary:

      The authors find that Sst-EPN neurons, which project to the lateral habenula, encode information about response directionality (left vs right) and outcome (rewarded vs unrewarded). Surprisingly, chronic impairment of vesicular signaling in these neurons onto their LHb targets did not impair probabilistic choice behavior.

      Strengths:

      Strengths of the current work include extremely detailed and thorough analysis of data at all levels, not only of the physiological data, but also an uncommonly thorough analysis of behavioral response patterns.

      Weaknesses:

      In this revised manuscript, the authors have addressed my earlier critiques.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this series of studies, Locantore et al. investigated the role of SST-expressing neurons in the entopeduncular nucleus (EPNSst+) in probabilistic switching tasks, a paradigm that requires continued learning to guide future actions. In prior work, this group had demonstrated EPNSst+ neurons co-release both glutamate and GABA and project to the lateral habenula (LHb), and LHb activity is also necessary for outcome evaluation necessary for performance in probabilistic decision-making tasks. Previous slice physiology works have shown that the balance of glutamate/GABA co-release is plastic, altering the net effect of EPN on downstream brain areas and neural circuit function. The authors used a combination of in vivo calcium monitoring with fiber photometry and computational modeling to demonstrate that EPNSst+ neural activity represents movement, choice direction, and reward outcomes in their behavioral task. However, viral-genetic manipulations to synaptically silence these neurons or selectively eliminate glutamate release had no effect on behavioral performance in well-trained animals. The authors conclude that despite their representation of task variables, EPN Sst+ neuron synaptic output is dispensable for task performance.

      Strengths and Weaknesses:

      Overall, the manuscript is exceptionally scholarly, with a clear articulation of the scientific question and a discussion of the findings and their limitations. The analyses and interpretations are careful and rigorous. This review appreciates the thorough explanation of the behavioral modeling and GLM for deconvolving the photometry signal around behavioral events, and the transparency and thoroughness of the analyses in the supplemental figures. This extra care has the result of increasing the accessibility for non-experts, and bolsters confidence in the results.

      (1) To bolster a reader's understanding of results, we suggest it would be interesting to see the same mouse represented across panels (i.e. Figures 1 F-J, Supplementary Figures 1 F, K, etc.), e.g. via the inclusion of faint hash lines connecting individual data points across variables.

      Thank you for the suggestion. The same mouse is now represented in Fig 1 and Fig 1—Figure Supplement 1 as a darkened circle so it can be followed across different panels. Photometry from this mouse was used as sample data in Figure 2b and Figure 2—figure supplement 1a-b.

      (2) Additionally, Figure 3E demonstrates that eliminating the 'reward' and 'choice and reward' terms from the GLM significantly worsens model performance; to demonstrate the magnitude of this effect, it would be interesting to include a reconstruction of the photometry signal after holding out of both or one of these terms, alongside the 'original' and 'reconstructed' photometry traces in panel D. This would help give context for how the model performance degrades by exclusion of those key terms.

      We have now added analyses and reconstructed photometry signals from GLMs excluding important predictors in Figure 3—figure supplement 1 and 2. We use the model where both “Direction and reward” were omitted as predictors for the GLM and showed photometry reconstructions aligned to behavioral events used for the full model (Figure 3—figure supplement 1) and partial model (Figure 3—figure supplement 2) to compare model performance.  

      (3) Finally, the authors claimed calcium activity increased following ipsilateral movements. However, Figure 3C clearly shows that both SXcontra and SXipsi increase beta coefficients. Instead, the choice direction may be represented in these neurons, given that beta coefficients increase following CXipsi and before SEipsi, presumably when animals make executive decisions. Could the authors clarify their interpretation on this point?

      We observe that calcium activity increases during ipsilateral choices as the animal moves toward the ipsilateral side port (e.g. CX<sub>ipsi</sub> to SE<sub>ipsi</sub>; Fig 2C and Fig 3C). The animal also makes other ipsiversive movements not during the “choice” phase of a trial such as when it is returning to the center port following a contralateral choice (e.g. SX<sub>Contra</sub> to CE; Fig 2—figure supplement 1F and Fig 3C). We also observe an increase in calcium activity during these ipsiversive movements (e.g. SX<sub>Contra</sub> to CE), but they are not as large as those observed during the choice phase (Fig 2—figure supplement 1G). Therefore, during the choice phase of a trial, activity contains signals related to ipsilateral movement and additional factors (e.g. executive decision making).    

      (4) Also, it is not clear if there is a photometry response related to motor parameters (i.e. head direction or locomotion, licking), which could change the interpretation of the reward outcome if it is related to a motor response; could the authors show photometry signal from representative 'high licking' or 'low licking' reward trials, or from spontaneous periods of high vs. low locomotor speeds (if the sessions are recorded) to otherwise clarify this point?

      Unfortunately, neither licks nor locomotion were recorded during the behavioral sessions when photometry was recorded. In Figure 2—figure supplement 1a we now show individual trials sorted by trial duration (time elapsed between CE and SE) to illustrate the dynamics of the photometry signal on fast vs slow trials within a session.  

      (5) There are a few limitations with the design and timing of the synaptic manipulations that would improve the manuscript if discussed or clarified. The authors take care to validate the intersectional genetic strategies: Tetanus Toxin virus (which eliminates synaptic vesicle fusion) or CRISPR editing of Slc17a6, which prevents glutamate loading into synaptic vesicles. The magnitude of effect in the slice physiology results is striking. However, this relies on the co-infection of a second AAV to express channelrhodopsin for the purposes of validation, and it is surely the case that there will not be 100% overlap between the proportion of cells infected.

      For the Tet-tox experiments in Figure 4 we estimate approximately 70±15% of EP<sup>Sst+</sup> neurons expressed Tet-tox based on our histological counts and published stereological counts in EP (Miyamoto and Fukuda, 2015). It is true that channelrhodopsin expression will not overlap 100% with cells infected by the other virus; indeed, our in vitro synaptic physiology shows small residual postsynaptic currents following optogenetic stimulation, arising either from incomplete blockade of synaptic release or from neurons that expressed channelrhodopsin but not Tettx (Figure 4—figure supplement 1J-K). The same is shown for CRISPR-mediated deletion of Slc17a6 (Figure 5—figure supplement 1J-K).

      (6) Alternative means of glutamate packaging (other VGluT isoforms, other transporters, etc) could also compensate for the partial absence of VGluT2, which should be discussed.

      While single cell sequencing (Wallace et al, 2017) has shown EP<sup>Sst+</sup> neurons do not express Slc17a7/8 (vGlut1 or vGlut3), it is possible that these genes could be upregulated following CRISPR-mediated deletion of Slc17a6. However, we do not see evidence of this in our in vitro synaptic physiology (EPSCs are significantly suppressed, Figure 5—figure supplement 1J-K) and therefore conclude it is highly unlikely to occur to a significant degree in our experiments. This is now included in the Discussion.

      (7) The authors do not perform a complementary experiment to delete GABA release (i.e. via VGAT editing), which is understandable, given the absence of an effect with the pan-synaptic manipulation. A more significant concern is the timing of these manipulations as the authors acknowledge. The manipulations are all done in well-trained animals, who continue to perform during the length of viral expression. Moreover, after carefully showing that mice use different strategies on the 70/30 version vs the 90/10 version of the task, only performance on the 90/10 version is assessed after the manipulation. Together, the observation that EPNsst activity does not alter performance on a well-learned, 90/10 switching task decreases the impact of the findings, as this population may play a larger role during task acquisition or under more dynamic task conditions. Additional experiments could be done to strengthen the current evidence, although the limitation is transparently discussed by the authors.

      As mentioned above, it is possible that a requirement for EP<sup>Sst+</sup> neurons could be revealed if the experiment was conducted with different parameters (either different reward probabilities, fluctuating reward probabilities within a session, or withholding additional training during viral expression). It is difficult to predict which version of the task, if any, would be most likely to reveal a requirement for EP<sup>Sst+</sup> neurons based on our results. We favor testing for EP<sup>Sst+</sup> function using a new behavioral paradigm that allows us to carefully examine task learning following EP manipulations in an independent study.

      (8) Finally, intersectional strategies target LHb-projecting neurons, although in the original characterization, it is not entirely clear that the LHb is the only projection target of EPNsst neurons. A projection map would help clarify this point.

      In a previous study we confirmed that EP<sup>Sst+</sup> neurons project exclusively to the LHb using cell-type specific rabies infection and examining all reported downstream regions for axon collaterals (Wallace et al 2017, Suppl. Fig 6F-G). When EP<sup>Sst+</sup> neurons were labeled we did not observe axon collaterals in known targets of EP such as ventro-antero lateral thalamus, red nucleus, parafascicular nucleus of the thalamus, or the pedunculopontine tegmental nucleus, only in the LHb. Additionally, using single cell tracing techniques, others have shown EP neurons that exclusively project to the LHb (Parent et al, 2001).

      Overall, the authors used a pertinent experimental paradigm and common cell-specific approaches to address a major gap in the field, which is the functional role of glutamate/GABA co-release from the major basal ganglia output nucleus in action selection and evaluation. The study is carefully conducted, their analyses are thorough, and the data are often convincing and thought-provoking. However, the limitations of their synaptic manipulations with respect to the behavioral assays reduce generalizability and to some extent the impact of their findings.

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to determine the role EP sst+ neurons play in a probabilistic switching task.

      Strengths:

      The in vivo recording of the EP sst+ neuron activity in the task is one of the strongest parts of this paper. Previous work had recorded from the EP-LHb population in rodents and primates in head-fixed configurations; the recording of this population in a freely moving context is a valuable addition to these studies and has highlighted more clearly that these neurons respond both at the time of choice and outcome.

      The use of a refined intersectional technique to record specifically the EP sst+ neurons is also an important strength of the paper. This is because previous work has shown that there are two genetically different types of glutamatergic EP neurons that project to the LHb. Previous work had not distinguished between these types in their recordings so the current results showing that the bidirectional value signaling is present in the EP sst+ population is valuable.

      Weaknesses:

      (1) One of the main weaknesses of the paper is to do with how the effect of the EP sst+ neurons on the behavior was assessed.

      (a) All the manipulations (blocking synaptic release and blocking glutamatergic transmission) are chronic and more importantly the mice are given weeks of training after the manipulation before the behavioral effect is assessed. This means that as the authors point out in their discussion the mice will have time to adjust to the behavioral manipulation and compensate for the manipulations. The results do show that mice can adapt to these chronic manipulations and that the EP sst+ are not required to perform the task. What is unclear is whether the mice have compensated for the loss of EP sst+ neurons and whether they play a role in the task under normal conditions. Acute manipulations or chronic manipulations without additional training would be needed to assess this.

      Unfortunately, when mice are given a three-week break from behavioral training (the time required to allow for adequate viral expression), behavioral performance on the task (p(highport), p(switch), trial number, trial time, etc.) is significantly degraded. Animals do eventually recover to previous performance levels, but this takes place during a 4-5 day “relearning” period. Here we sought to examine whether EP<sup>Sst+</sup> neurons are required for continued task performance. We chose to continue training the animals following viral injection to avoid the “relearning” period that follows an extended break from behavioral training, which could have made it difficult to distinguish changes in behavioral performance caused by the viral manipulation from those caused by relearning.

      Acute manipulations were not used because we planned to compare complete synaptic ablation (Tettx) and single neurotransmitter ablation (CRISPR Slc17a6) over similar time courses and we know of no acute manipulation that could achieve single neurotransmitter ablation. 

      (b) Another weakness is that the effect of the manipulations was assessed in the 90/10 contingency version of the task. Under these contingencies, mice integrate past outcomes over fewer trials to determine their choice and animals act closer to a simple win-stay-lose switch strategy. Due to this, it is unclear if the EP sst+ neurons would play a role in the task when they must integrate over a larger number of conditions in the less deterministic 70/30 version of the task.

      It is possible that a requirement for EP<sup>Sst+</sup> neurons could be revealed if the experiment was conducted with different parameters (either different reward probabilities, fluctuating reward probabilities within a session, or withholding additional training during viral expression). It is difficult to predict which version of the task, if any, would be most likely to reveal a requirement for EP<sup>Sst+</sup> neurons based on our results. We favor testing for EP<sup>Sst+</sup> function using a new behavioral paradigm that allows us to carefully examine task learning following EP manipulations in an independent study.

      The authors show an intriguing result that the EP sst+ neurons are excited when mice make an ipsilateral movement in the task either toward or away from the center port. This is referred to as a choice response, but it could be a movement response or related to the predicted value of a specific action. Recordings while mice perform movement outside the task or well-controlled value manipulations within the session would be needed to really refine what these responses are related to.

      If activity of EP<sup>Sst+</sup> neurons included a predicted value component, we would expect to see a change in activity during ipsilateral movements when the previous trial was rewarded vs unrewarded. This is examined in Fig 2—figure suppl. 2C, where we compare EP<sup>Sst+</sup> responses during ipsilateral trials when the previous trials were either rewarded (blue) or unrewarded (gray). We show that EP<sup>Sst+</sup> activity prior to side port entry (SE) is identical in these two trial types indicating that EP<sup>Sst+</sup> neurons do not show evidence of predicted value of an action in this context. Therefore, we conclude that increased EP<sup>Sst+</sup> activity during ipsilateral trials is primarily related to ipsilateral movement following CX (we call this the “choice” phase of the trial). We also show that other ipsiversive movements outside of the “choice” phase of a trial (such as the return to center port following a contralateral trial) show a smaller but significant increase in activity (Figure 2—figure supplement 1F-G). Therefore, whereas the activity observed during ipsilateral choice contains signals related to ipsilateral movement and additional factors, our data suggest that predicted value is not one of those factors. We will clarify this point and our definition of “choice” in the narrative.  

      (2) The authors conclude that they do not see any evidence for bidirectional prediction errors. It is not possible to conclude this. First, they see a large response in the EP sst+ neurons to the omission of an expected reward. This is what would be expected of a negative reward prediction error. There are much more specific well-controlled tests for this that are commonplace in head-fixed and freely moving paradigms that could be tested to probe this. The authors do look at the effect of previous trials on the response and do not see strong consistent results, but this is not a strong formal test of what would be expected of a prediction error, either a positive or negative. The other way they assess this is by looking at the size of the responses in different recording sessions with different reward contingencies. They claim that the size of the reward expectation and prediction error should scale with the different reward probabilities. If all the reward probabilities were present in the same session this should be true as lots of others have shown for RPE. Because however this data was taken from different sessions it is not expected that the responses should scale, this is because reward prediction errors have been shown to adaptively scale to cover the range of values on offer (Tobler et al., Science 2005). A better test of positive prediction error would be to give a larger-than-expected reward on a subset of trials. Either way, there is already evidence that responses reflect a negative prediction error in their data and more specific tests would be needed to formally rule in or out prediction error coding especially as previous recordings have shown it is present in previous primate and rodent recordings.

      We do not conclude that we see no evidence for RPE, and the reviewer is correct in stating that a large increase in EP<sup>Sst+</sup> activity following omission of an expected reward would be expected of a negative reward prediction error. However, this observation alone is not strong enough evidence that EP<sup>Sst+</sup> neurons signal RPE. When we looked for additional evidence of RPE within our experiments, we did not find consistent demonstrations of its existence in our data. When performing photometry measurements of dopamine release in the striatum, RPE signals are readily observed with a task identical to ours using trial history as a modifier of reward prediction (Chantranupong, et al 2023). Of course, there could be a weaker, more heterogeneous RPE signal in EP<sup>Sst+</sup> neurons that we cannot detect with our methods. As we state in the discussion, RPE signals may be present in a subset of individual neurons (as observed in Stephenson-Jones et al, 2016 and Hong and Hikosaka, 2008) which are below our detection threshold using fiber photometry. Additionally, Hong and Hikosaka, 2008 show that LHb-projecting GPi neurons show both positive and negative reward modulations, which may obscure observation of RPE signals with photometry recordings that arise from population activity of genetically defined neurons.

      (3) There are a lot of variables in the GLM that occur extremely close in time such as the entry and exit of a port. If two variables occur closely in time and are always correlated it will be difficult if not impossible for a regression model to assign weights accurately to each event. This is not a large issue, but it is misleading to have regression kernels for port entry and exits unless the authors can show these are separable due to behavioral jitter or a lack of correlation under specific conditions, which does not seem to be the case.

      It is true that two variables that are always correlated are redundant in a GLM. For example, center entry (CE) and center exit (CX) occur in quick succession in most trials and are highly correlated (Figure 1C). For this reason, when only one is removed as a predictor from the model but not the other, there is a very small change in the MSE of the fit (Figure 3E, -CE or -CX). However, when both are removed, model performance decreases further, indicating that center-port nose-pokes do contribute to model performance (Figure 3E, -CE/CX). Because reward is either present or absent following side port entry, there is substantial behavioral jitter (due to water consumption in rewarded trials), such that SE and SX are not always correlated. The model therefore performs worse when either is omitted alone, and worse still when both SE and SX are omitted together (Figure 3E, -SE/SX). We will update Figure 3 and the narrative to make this more explicit.
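
      The logic of this leave-out comparison can be illustrated with a toy simulation. The following is a minimal sketch and not the analysis code used in the study: the regressors, jitter levels, and weights are invented for illustration, and ordinary least squares stands in for the kernel-based GLM. One pair of predictors (labeled CE and CX here) is almost perfectly correlated, while the other pair (SE and SX) is decoupled by jitter.

```python
import numpy as np

def fit_mse(X, y):
    """Fit ordinary least squares (with intercept) and return the in-sample MSE."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return float(np.mean(resid ** 2))

rng = np.random.default_rng(0)
n = 5000

# Hypothetical regressors (assumed values, not taken from the study):
# CX tracks CE with very little jitter; SX tracks SE with large jitter.
ce = rng.normal(size=n)
cx = ce + 0.05 * rng.normal(size=n)
se = rng.normal(size=n)
sx = se + 1.0 * rng.normal(size=n)

# Simulated signal in which all four events contribute equally.
y = ce + cx + se + sx + 0.1 * rng.normal(size=n)

predictors = {"CE": ce, "CX": cx, "SE": se, "SX": sx}

def mse_without(omitted):
    """MSE of the model refit after dropping the named predictors."""
    cols = [v for k, v in predictors.items() if k not in omitted]
    return fit_mse(np.column_stack(cols), y)

results = {
    "None": mse_without(()),
    "-CE": mse_without(("CE",)),
    "-CX": mse_without(("CX",)),
    "-CE/CX": mse_without(("CE", "CX")),
    "-SE": mse_without(("SE",)),
    "-SX": mse_without(("SX",)),
    "-SE/SX": mse_without(("SE", "SX")),
}

for label, mse in results.items():
    print(f"{label:>7}: MSE = {mse:.3f}")
```

      In this toy setting, dropping one member of the tightly correlated pair barely changes the error because the other member absorbs its weight, whereas dropping both, or dropping either member of the jittered pair, produces a clear increase.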

      Reviewer #3 (Public Review):

      Summary:

      The authors find that Sst-EPN neurons, which project to the lateral habenula, encode information about response directionality (left vs right) and outcome (rewarded vs unrewarded). Surprisingly, impairment of vesicular signaling in these neurons onto their LHb targets did not impair probabilistic choice behavior.

      Strengths:

      Strengths of the current work include extremely detailed and thorough analysis of data at all levels, not only of the physiological data but also an uncommonly thorough analysis of behavioral response patterns.

      Weaknesses:

      Overall, I saw very few weaknesses, with only two issues, both of which should be possible to address without new experiments:

      (1) The authors note that the neural response difference between rewarded and unrewarded trials is not an RPE, as it is not affected by reward probability. However, the authors also show the neural difference is partly driven by the rapid motoric withdrawal from the port. Since there is also a response component that remains different apart from this motoric difference (Figure 2, Supplementary Figure 1E), it seems this is what needs to be analyzed with respect to reward probability, to truly determine whether there is no RPE component. Was this done?

      We thank the reviewer for this comment. We believe this is particularly important for unrewarded trials, as SE and SX occur in rapid succession. In Figure 2—figure supplement 2A-B we now show the photometry signal from Rewarded and Unrewarded ipsilateral trials aligned to SX for different reward probabilities. We quantify the signals for different reward probabilities during a 500 ms window immediately prior to SX but find no differences between groups.
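
      A window-based quantification of this kind might look like the following. This is a minimal sketch under stated assumptions, not the authors' code: the function name, the sampling rate, and the rule for skipping events near the start of the recording are all hypothetical.

```python
import numpy as np

def mean_pre_event(signal, event_idx, fs_hz, window_s=0.5):
    """Mean of `signal` over the `window_s` seconds immediately before each event.

    signal    : 1-D array of photometry samples
    event_idx : sample indices of the alignment event (e.g. side exit)
    fs_hz     : sampling rate in Hz (an assumed parameter)
    Events with fewer than a full window of preceding samples are skipped.
    """
    signal = np.asarray(signal, dtype=float)
    n = int(round(window_s * fs_hz))
    means = []
    for i in event_idx:
        if i - n < 0:
            continue  # not enough samples before this event
        means.append(signal[i - n:i].mean())
    return np.array(means)

# Toy example: a ramp sampled at 20 Hz, so a 500 ms window is 10 samples.
toy_signal = np.arange(100.0)
toy_means = mean_pre_event(toy_signal, event_idx=[5, 50], fs_hz=20)
print(toy_means)  # the event at index 5 is skipped; index 50 gives 44.5
```

      The per-trial means returned for each reward-probability group could then be compared with whatever statistical test the study design calls for.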

      (2) The current study reaches very different conclusions than a 2016 study by Stephenson-Jones and colleagues despite using a similar behavioral task to study the same Sst-EPN-LHb circuit. This is potentially very interesting, and the new findings likely shed important light on how this circuit really works. Hence, I would have liked to hear more of the authors' thoughts about possible explanations of the differences. I acknowledge that a full answer might not be possible, but in-depth elaboration would help the reader put the current findings in the context of the earlier work, and give a better sense of what work still needs to be done in the future to fully understand this circuit.

      For example, the authors suggest that the Sst-EPN-LHb circuit might be involved in initial learning, but play less of a role in well-trained animals, thereby explaining the lack of observed behavioral effect. However, it is my understanding that the probabilistic switching task forces animals to continually update learned contingencies, rendering this explanation somewhat less persuasive, at least not without further elaboration (e.g. maybe the authors think it plays a role before the animals learn to switch?).

      Also, as I understand it, the 2016 study used manipulations that likely impaired phasic activity patterns, e.g. precisely timed optogenetic activation/inhibition, and/or deletion of GABA/glutamate receptors. In contrast, the current study's manipulations - blockade of vesicle release using tetanus toxin or deletion of VGlut2, would likely have blocked both phasic and tonic activity patterns. Do the authors think this factor, or any others they are aware of, could be relevant?

      We have added further discussion of the Stephenson-Jones, et al 2016 study as well as the Lazaridis, et al 2019 study which shows no effect of phasic stimulation of EP when specifically manipulating EP<sup>Sst+</sup> (vGat+/vGlut2+) neurons rather than vGlut2+ neurons as in the Stephenson-Jones study.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In some places, there seems to be a mismatch between referenced figures and texts. For example:

      (1) The authors described that 'This increase in activity was seen for all three reward probabilities tested (90/10, 80/20, and 70/30) and occurred while the animal was engaged in ipsiversive movements as similar increases were observed following side exit (SX) on contralateral trials as the animal was moving from the contralateral side port back to the center port (Figure 2-Figure Supplement 1c)', but supplement 1c is not about calcium dynamics around the SX event. I presume they mean Figure 2-Figure Supplement 1d.

      Yes, this will be corrected in the revised manuscript.

      (2) The authors explained that increased EPSst+ neuronal activity following an unrewarded outcome was partially due to the rapid withdrawal of the animal's snout following an unrewarded outcome however, differences in rewarded and unrewarded trials were still distinguishable when signals were aligned to side port exit indicating that these increases in EPSst+ neuronal activity on unrewarded trials were a combination of outcome evaluation (unrewarded) and side port withdrawal occurring in quick succession (SX, Figure 2 - Figure Supplement 1d). I presume that they mean Figure 2 - Figure Supplement 1e.

      Yes, this will be corrected in the revised manuscript.

      Minor suggestions related to specific figure presentation are below:

      Figure 2 and supplement figures:

      (1) Figure 2B: the authors may consider presenting outcome-related signals recorded from all trials, including both ipsilateral and contralateral events, and align signals to SE when reward consumption presumably begins, rather than aligning to CE.

      We have added sample recordings from ipsilateral and contralateral trials and sorted them by trial duration to allow for clearer presentation of activity following CE and SE (Figure 2—figure supplement 1a-b).

      (2) The authors described that 'This increase in activity was seen for all three reward probabilities tested (90/10, 80/20, and 70/30) and occurred while the animal was engaged in ipsiversive movements as similar increases were observed following side exit (SX) on contralateral trials as the animal was moving from the contralateral side port back to the center port (Figure 2-Figure Supplement 1c)', but supplement 1c is not about calcium dynamics around the SX event. I presume they mean Figure 2-Figure Supplement 1d.

      Yes, this will be corrected in the revised manuscript.

      (3) The authors explained that increased EPSst+ neuronal activity following an unrewarded outcome was partially due to the rapid withdrawal of the animal's snout following an unrewarded outcome however, differences in rewarded and unrewarded trials were still distinguishable when signals were aligned to side port exit indicating that these increases in EPSst+ neuronal activity on unrewarded trials were a combination of outcome evaluation (unrewarded) and side port withdrawal occurring in quick succession (SX, Figure 2 -Figure Supplement 1d). I presume that they mean Figure 2 -Figure Supplement 1e.

      Yes, this will be corrected in the revised manuscript.

      Figure 3 and supplement figures:

      (1) Figure 3C-F: it is hard to compare the amplitude of calcium signals between different behaviour events without a uniform y-axis.

      The scale for the y-axis on Figure 3C-D is uniform for all panels. Figure 3E is also uniform for all boxplots. The reviewer may be referring to Figure 2C-F, but the y-axis for all of the photometry data is uniform for all panels and the horizontal line represents zero. The y-axis for the quantification on the right of each panel is scaled to the max/min for each comparison.

      (2) Figure 3E is difficult to follow. The authors explained that the 'SE' variable is generated by collapsing the ipsilateral and contralateral port entries, and hence the variable has no choice of direction information. I assumed that the 'SX', 'CE', and 'CX' variables are generated similarly. It is not clear if this is the case for the 'side', 'centre' and 'choice' variables. The authors explained that 'omitting center port entry/exit together or individually also resulted in decreased GLM performance but to a smaller degree than the omission of choice direction (Figure 3e, "-Center")'. My understanding is that they created the Centre variable by collapsing ipsilateral and contralateral centre port entry/exit together. The Centre variable should have no choice of direction information. How is the Center variable generated differently from omitting centre port entry/exit together? I would ask the authors to explain the model and different variables a bit more thoroughly in the text.

      We apologize for the confusion. All ten variables used to train the full GLM are listed in Fig. 3C. In Figure 3E variable(s) were omitted to test how they contributed to GLM performance (data labeled “None” is the full model with all variables). Omitted variables are now defined as follows: -Rew = Rew+Unrew removed, -Direction = Ipsi/Contra designation removed and collapsed into CE, CX, SE, SX, -Direction & Rew = Ipsi/Contra info removed from all variables + Rew/Unrew removed, -CE/CX = Ipsi/Contra CE and CX removed, -CE = Ipsi/contra CE removed, -CX = Ipsi/contra CX removed, -SE/SX = Ipsi/Contra SE and SX removed, -SE = Ipsi/contra SE removed, -SX = Ipsi/contra SX removed. This clarification has also been added to the Generalized Linear Model section of Materials and Methods.

      Figure 5 and supplement figures:

      There are no representative or summary figures showing the specificity and efficiency of oChief-tdTomato or Tetx-GFP expression. Body weight changes following virus injection are not well described.

      A representative image of Tettx-GFP expression is shown in Fig. 4A, and the percentage of infected EP<sup>Sst+</sup> neurons is described in the text (70±15.1% (mean±SD), 1070±230 neurons/animal, n=6 mice). Most oChief-tdTom animals were used for post-hoc electrophysiology experiments, and careful quantification of viral expression was not possible. However, Slc17a6 deletion was confirmed in these animals (Fig. 5 – Fig supplement 1J-K), showing that the manipulation was effective in the experimental group. A representative image of oChief-tdTom expression is shown in Fig. 5A.

      We now mention the body weight changes observed following Tettx injection in the narrative.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the RFLR section you state that "this variable decays...", a variable can't decay only the value of a variable can change. Also, it is not mentioned what variable is being discussed. There are lots of variables in the model so this should be made clear.

      We now state, “This variable (β) changes over trials and is updated with new evidence from each new trial’s choice and outcome with an additional bias towards or away from its most recent choice (Figure 1-figure supplement 2A-C).”

      (2) I couldn't find in the results section, or the methods section the details for the Tet tx experiments, were mice trained and tested on 90/10 only? Were they trained while the virus was expressing etc? This should be added.

      In the methods section we state, “For experiments where we manipulated synaptic release in EP<sup>Sst+</sup> neurons (Figures 4-5) we trained mice (reward probabilities 90/10, no transparent barrier present) to the following criteria for the 5 days prior to virus injection: 1) p(highport) per session was greater than or equal to 0.80 with a variance less than 0.003, 2) p(switch) per session was less than or equal to 0.15 with a variance less than 0.001, 3) the p(left port) was between 0.45-0.55 with a variance less than 0.005, and 4) the animal performed at least 200 trials in a session. The mean and variance for these measurements were calculated across the five sessions immediately preceding surgery. The criteria were determined by comparing performance profiles in separate animals and chosen based on when animals first showed stable and plateaued behavioral performance. Following surgery, mice were allowed to recover for 3 days and then continued to train for 3 weeks during viral expression. Data collected during the 5 day pre-surgery period were then compared to data collected for 10 sessions following the 3 weeks allotted for viral expression (i.e. days 22-31 post-surgery).”
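
      The four criteria quoted above can be expressed as a simple check over the five pre-surgery sessions. This is a minimal sketch, not the authors' code: the function and argument names are hypothetical, the use of the sample variance (ddof=1) is an assumption because the text does not say which estimator was used, and the trial-count criterion is assumed to apply to every one of the five sessions.

```python
import numpy as np

def meets_training_criteria(p_highport, p_switch, p_left, n_trials):
    """Check the four pre-surgery criteria over five sessions.

    Each argument is a sequence with one value per session.
    Returns a dict of per-criterion booleans plus an overall "all" flag.
    """
    p_highport = np.asarray(p_highport, dtype=float)
    p_switch = np.asarray(p_switch, dtype=float)
    p_left = np.asarray(p_left, dtype=float)
    n_trials = np.asarray(n_trials)

    def var(x):
        return x.var(ddof=1)  # assumption: sample variance

    checks = {
        # 1) mean p(highport) >= 0.80 with variance < 0.003
        "highport": bool(p_highport.mean() >= 0.80 and var(p_highport) < 0.003),
        # 2) mean p(switch) <= 0.15 with variance < 0.001
        "switch": bool(p_switch.mean() <= 0.15 and var(p_switch) < 0.001),
        # 3) mean p(left port) within 0.45-0.55 with variance < 0.005
        "left": bool(0.45 <= p_left.mean() <= 0.55 and var(p_left) < 0.005),
        # 4) at least 200 trials (assumed: in each session)
        "trials": bool((n_trials >= 200).all()),
    }
    checks["all"] = all(checks.values())
    return checks

# Illustrative values only; these are not data from the study.
good = meets_training_criteria(
    p_highport=[0.85, 0.86, 0.84, 0.85, 0.87],
    p_switch=[0.10, 0.11, 0.09, 0.10, 0.12],
    p_left=[0.50, 0.48, 0.52, 0.51, 0.49],
    n_trials=[250, 300, 220, 260, 210],
)
print(good)
```

      Returning the per-criterion flags rather than a single boolean makes it easy to see which criterion an animal has not yet met.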

      Reviewer #3 (Recommendations For The Authors):

      (1) The kernel in Figure 3C shows an activation prior to CE on "contra" trials that is not apparent in Figure 2C which shows no activation prior to CE on either contra or ipsi trials. Given that movement directionality prior to CE is dictated by the choice on the PREVIOUS trial, is the "contra" condition in 3C actually based on the previous trial? If so, this should be clarified.

      On most “contra” trials the animal is making an ipsiversive movement just prior to CE as it returns to the center from the contralateral side-port (as most trials are not “switch” trials). Therefore, an increase in activity is expected and shown most clearly following SX for contralateral trials in Fig 2 –Fig suppl 1F. A significant increase in activity prior to CE on contra trials compared to ipsi trials can also be seen in Fig 2C; it is just not as large a change as the increase observed following CE for ipsi trials. The comparison between activity observed during the two types of ipsiversive movements is now shown directly in Figure 2—figure supplement 1G.

      (2) Paragraph 7 of the discussion uses a phrase "by-in-large", which probably should be "by and large".

      Thank you for the correction.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      Readers would also benefit from coding individual data points by sex and noting N/sex.

      Sex breakdown has been added to the figure legends for each experiment, and full statistical reporting is now also included in the figure legends.

    1. eLife Assessment

      Cav2 voltage-gated calcium channels play key roles in regulating synaptic strength and plasticity. In contrast to mammals, invertebrates like Drosophila encode a single Cav2 channel, raising questions on how diversity in Cav2 is achieved from a single gene. Here, the authors present solid evidence that two alternatively spliced Cac isoforms enable important changes in Cav2 expression, localization, and function in synaptic transmission and plasticity at the Drosophila neuromuscular junction. How the isoforms affect synaptic calcium channel levels remains less clear. This study provides insights into the roles of voltage-gated calcium channel splice isoforms in synaptic transmission.

    2. Reviewer #2 (Public review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones. Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There remain some questions regarding the mechanisms underlying the changes in Cac localization to somatodendritic compartments. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

    3. Reviewer #3 (Public review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of ~50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      Comments on revisions:

      The authors addressed most points. However, from my point of view, the new data (somatodendritic cac currents in adult motoneurons of IS4B mutants without the pre-pulse, and localization of IS4A channels in the larval brain) do not strongly support that the IS4B exon is required for cacophony localization. According to their definition of localization, IS4B is required for cacophony channels to enter motoneuron boutons and to localize to active zones. In case of a true localization defect (without degradation, as they claim), IS4A channels should mislocalize to the soma, axon, or dendrite. However, they do not find them in motoneurons of IS4B mutants. Furthermore, I find the interpretation of the voltage clamp data in flight motoneurons rather difficult. On the one hand, sustained HVA cac currents are strongly attenuated/absent in IS4B mutants. On the other hand, total cac currents (without the -50 mV pre-pulse, not shown in the original submission) are less affected in IS4B mutants. Based on these data, they conclude that IS4B is required for sustained HVA cac currents and that IS4A channel isoforms are expressed and functional. How does this relate to a localization defect at the NMJ? Finally, detecting IS4A channels in other cell types and regions is not a strong argument for a localization defect at the NMJ. I, therefore, suggest toning down the conclusions regarding a localization defect in IS4B mutants/a role for the IS4B exon in cac localization. It should be also discussed how a splice isoform in S4 may result in no detectable cac channels at the NMJ or regulate subcellular channel localization.

      I have a few additional points, mainly related to the responses to my previous points:

      (1) The authors state "active zones at the NMJ contain only cac isoforms with the IS4B exon". Nevertheless, the small representative EPSC remaining in IS4B mutants suggests that there is synchronous release in the absence of IS4B (Fig. 3B). Are the small EPSCs in dIS4B (Fig. 3B) indeed different from noise/indicative of evoked release? If yes, which cac channels may drive these EPSCs? IS4A channels?

      (2) (Related to previous point 4) The authors argue that EPSC amplitudes are not statistically different between Canton S and IS4A mutants (Fig. 2F). However, the Canton S group appears undersampled, thus precluding conclusions based on statistics. Moreover, the effect size Canton S vs. dIS4A looks similar to the one of Canton S vs. dIS4A/dIS4B.

      (3) (Related to previous point 11): Can they cite a paper relating calcium channel inactivation to EPSC half width/decay kinetics to support their speculation that "decreased EPSC half width could be caused by significantly faster channel inactivation kinetics" (p. 42, l.42). In addition, many papers have demonstrated that mini decay kinetics provide valuable insights into GluR subunit composition at the Drosophila NMJ (e.g., Schmid et al., 2008 https://doi.org/10.1038/nn.2122). Thus, the statement "Mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells" is not valid and should be revised.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Bell et al. describes an analysis of the effects of removing one of two mutually exclusive splice exons at two distinct sites in the Drosophila CaV2 calcium channel Cacophony (Cac). The authors perform imaging and electrophysiology, along with some behavioral analysis of larval locomotion, to determine whether these alternatively spliced variants have the potential to diversify Cac function in presynaptic output at larval neuromuscular junctions. The authors provide valuable insights into how alternative splicing at two sites in the calcium channel alters its function.

      Strengths:

      The authors find that both of the second alternatively spliced exons (I-IIA and I-IIB) that are found in the intracellular loop between the first and second set of transmembrane domains can support Cac function. However, loss of the I-IIB isoform (predicted to alter potential beta subunit interactions) results in 50% fewer channels at active zones and a decrease in neurotransmitter release and the ability to support presynaptic homeostatic potentiation. Overall, the study provides new insights into Cac diversity at two alternatively spliced sites within the protein, adding to our understanding of how presynaptic calcium channel function can be regulated by splicing.

      Weaknesses:

      The authors find that one splice isoform (IS4B) in the first S4 voltage sensor is essential for the protein's function in promoting neurotransmitter release, while the other isoform (IS4A) is dispensable. The authors conclude that IS4B is required to localize Cac channels to active zones. However, I find it more likely that IS4B is required for channel stability and leads to the protein being degraded, rather than any effect on active zone localization. More analysis would be required to establish that as the mechanism for the unique requirement for IS4B.

      (1) We thank the reviewer for this important point. In fact, all three reviewers raised the same question, and the reviewing editor pointed out that caution or additional experiments were required to distinguish between IS4 splicing being important for cac channel localization versus channel stability/degradation. We provide multiple sets of experiments as well as text and figure revisions to strengthen our claim that the IS4B exon is required for cacophony channels to enter motoneuron presynaptic boutons and localize to active zones.

      a. If IS4B was indeed required for cac channel stability (and not for localization to active zones), IS4A channels should be unstable wherever they are. This is not the case because we have recorded somatodendritic cacophony currents from IS4A expressing adult motoneurons that were devoid of cac channels with the IS4B exon. Therefore, IS4A cac channels are not unstable but underlie somatodendritic voltage dependent calcium currents in these motoneurons. These new data are now shown in the revised figure 3C and referred to in the text on page 7, line 42 to page 8, line 9.

      b. Similarly, if IS4B was required for channel stability, IS4A channels should not be present anywhere in the nervous system. We tested this by immunohistochemistry for GFP tagged IS4A channels in the larval CNS. Although IS4A channels are sparsely expressed, which is consistent with low expression levels seen in the Western blots (Fig. 1E), there are always defined and reproducible patterns of IS4A label in the larval brain lobes as well as in the anterior part of the VNC. This again shows that the absence of IS4A from presynaptic active zones is not caused by channel instability, because the channel is expressed in other parts of the nervous system. These data are shown in the new supplementary figure 1 and referred to in the text on page 15, lines 3 to 8.

      c. As suggested in a similar context by reviewers 1 and 2, we now show enlargements of the presence of IS4B channels in presynaptic active zones as well as enlargements of the absence of IS4A channels in presynaptic active zones in the revised figures 2A-C and 3A. In these images, no IS4A label is detectable in active zones or anywhere else throughout the axon terminals, thus indicating that IS4B is required for expressing cac channels in the axon terminal boutons and localizing them to active zones. Text and figure legends have been adjusted accordingly.

      d. Related to this, reviewer 1 also recommended quantifying the IS4A and IS4B channel intensity and co-localization with the active zone marker brp (recommendation for authors). After following the reviewers’ suggestion to adjust the background values in IS4A and IS4B immunolabels to identical levels (revised Figs. 2A-C), it becomes obvious that IS4A channels are not detectable above background in presynaptic terminals or active zones, thus intensity is close to zero. We still calculated the Pearson’s co-localization coefficient for both IS4 variants with the active zone marker brp. For IS4B channels the Pearson’s correlation coefficient is control-like, just above 0.6, whereas for IS4A channels we do not find colocalization with brp (Pearson’s below 0.25). These new analyses are now shown in the revised figure 2D and referred to on page 6, lines 33 to 38.

      e. Consistent with our finding that IS4B is required for cac channel localization to presynaptic active zones, upon removal of IS4B we find no evoked synaptic transmission (Fig. 2 in initial submission, now Fig. 3B).

      Together these data are in line with a unique requirement of IS4B at presynaptic active zones (not excluding additional functions of IS4B), whereas IS4A containing cac isoforms are not found in presynaptic active zones and mediate different functions.

      Reviewer #2 (Public Review):

      This study by Bell et al. focuses on understanding the roles of two alternatively spliced exons in the single Drosophila Cav2 gene cac. The authors generate a series of cac alleles in which one or the other mutually exclusive exons are deleted to determine the functional consequences at the neuromuscular junction. They find alternative splicing at one exon encoding part of the voltage sensor impacts the activation voltage as well as localization to the active zone. In contrast, splicing at the second exon pair does not impact Cav2 channel localization, but it appears to determine the abundance of the channel at active zones.

      Together, the authors propose that alternative splicing at the Cac locus enables diversity in Cav2 function generated through isoform diversity generated at the single Cav2 alpha subunit gene encoded in Drosophila.

      Overall this is an excellent, rigorously validated study that defines unanticipated functions for alternative splicing in Cav2 channels. The authors have generated an important toolkit of mutually exclusive Cac splice isoforms that will be of broad utility for the field, and show convincing evidence for distinct consequences of alternative splicing of this single Cav2 channel at synapses. Importantly, the authors use electrophysiology and quantitative live sptPALM imaging to determine the impacts of Cac alternative splicing on synaptic function. There are some outstanding questions regarding the mechanisms underlying the changes in Cac localization and function, and some additional suggestions are listed below for the authors to consider in strengthening this study. Nonetheless, this is a compelling investigation of alternative splicing in Cav2 channels that should be of interest to many researchers.

      (2) We believe that the additional data on cac IS4A isoform localization and function as detailed above (response to public review 1) has strengthened the manuscript and answered some of the remaining questions the reviewer refers to. We are also grateful for the specific additional reviewer suggestions which we have addressed point-by-point and refer to below (section recommendations for authors).

      Reviewer #3 (Public Review):

      Summary:

      Bell and colleagues studied how different splice isoforms of voltage-gated CaV2 calcium channels affect channel expression, localization, function, synaptic transmission, and locomotor behavior at the larval Drosophila neuromuscular junction. They reveal that one mutually exclusive exon located in the fourth transmembrane domain encoding the voltage sensor is essential for calcium channel expression, function, active zone localization, and synaptic transmission. Furthermore, a second mutually exclusive exon residing in an intracellular loop containing the binding sites for Caβ and G-protein βγ subunits promotes the expression and synaptic localization of around ~50% of CaV2 channels, thereby contributing to ~50% of synaptic transmission. This isoform enhances release probability, as evident from increased short-term depression, is vital for homeostatic potentiation of neurotransmitter release induced by glutamate receptor impairment, and promotes locomotion. The roles of the two other tested isoforms remain less clear.

      Strengths:

      The study is based on solid data that was obtained with a diverse set of approaches. Moreover, it generated valuable transgenic flies that will facilitate future research on the role of calcium channel splice isoforms in neural function.

      Weaknesses:

      (1) Based on the data shown in Figures 2A-C, and 2H, it is difficult to judge the localization of the cac isoforms. Could they analyze cac localization with regard to Brp localization (similar to Figure 3; the term "co-localization" should be avoided for confocal data), as well as cac and Brp fluorescence intensity in the different genotypes for the experiments shown in Figure 2 and 3 (Brp intensity appears lower in the dI-IIA example shown in Figure 3G)? Furthermore, heterozygous dIS4B imaging data (Figure 2C) should be quantified and compared to heterozygous cacsfGFP/+.

      Following the reviewer’s suggestion, we have quantified cac localization relative to brp localization by computing the Pearson’s correlation coefficient for controls and IS4A as well as IS4B animals. These new data are shown in the revised Fig. 2D and referred to on page 6, lines 33-38. Furthermore, we now confirm control-like Pearson’s correlation coefficients for all exon out variants except ΔIS4B and show Pearson’s correlation coefficients for all genotypes side-by-side in the revised Fig. 4D (legend has been adjusted accordingly). In addition, in response to the recommendations to authors, we now provide selective enlargements for the co-labeling of Brp and each exon out variant in the revised figures 2-4. We have also adjusted the background in Fig. 2C (ΔIS4B) to match that in Figs. 2A and B (control and ΔIS4A). This allows a fair comparison of cac intensities following excision of IS4B versus excision of IS4A and control (see also Fig 3). Together, this demonstrates the absence of IS4A label in presynaptic active zones much more clearly. As suggested, we have also quantified brp puncta intensity on m6/7 across homozygous exon excision mutants and found no differences (this is now stated for IS4A/IS4B in the results text on page 6, lines 37/38 and for I-IIA/I-IIB on page 8, lines 42-44). We did not quantify the intensity of cacophony puncta upon excision of IS4B because the label revealed no significant difference from background (which can be seen much better in the images now), but the brp intensities remained control-like even upon excision of IS4B.
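
      As an illustration of the co-localization measure referred to above, the following is a minimal sketch of a pixel-wise Pearson's correlation coefficient between two imaging channels. The intensity values and the names `pearson`, `brp`, `cac_overlapping`, and `cac_background` are hypothetical and chosen only for illustration; this is not the analysis pipeline used in the study, which is not described in this response.

```python
import math


def pearson(channel_a, channel_b):
    """Pearson's correlation coefficient between two equally sized
    lists of pixel intensities (one list per imaging channel)."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (>= 2 pixels)")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy data: eight pixels along a bouton (arbitrary units).
brp = [5, 80, 90, 10, 6, 85, 7, 4]               # active zone marker with three puncta
cac_overlapping = [4, 70, 95, 12, 5, 80, 9, 3]   # label that follows the brp puncta
cac_background = [5, 7, 6, 6, 7, 5, 6, 5]        # label indistinguishable from background

r_overlapping = pearson(brp, cac_overlapping)
r_background = pearson(brp, cac_background)
print(f"overlapping label: r = {r_overlapping:.2f}")
print(f"background-only label: r = {r_background:.2f}")
```

      In this toy case the overlapping label yields a coefficient close to 1, whereas the background-only label yields a coefficient near 0, which mirrors the qualitative distinction drawn between IS4B and IS4A label.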

      (2) They conclude that I-II splicing is not required for cac localization (p. 13). However, cac channel number is reduced in dI-IIB. Could the channels be mis-localized (e.g., in the soma/axon)? What is their definition of localization? Could cac be also mis-localized in dIS4B? Furthermore, the Western Blots indicate a prominent decrease in cac levels in dIS4B/+ and dI-IIB (Figure 1D). How do the decreased protein levels seen in both genotypes fit to a "localization" defect? Could decreased cac expression levels explain the phenotypes alone?

      We have now precisely defined what we mean by cac localization, namely the selective label of cac channels in presynaptic active zones that are defined as brp puncta, but no cac label elsewhere in the presynaptic bouton (page 6, lines 18 to 20). On the level of CLSM microscopy this corresponds to overlapping cac puncta and brp puncta, but no cac label elsewhere in the bouton. Based on the additional analysis and data sets outlined in our response 1 (see above) we conclude that excision of IS4B does not cause channel mislocalization because we find reproducible expression patterns elsewhere in the nervous system as well as somatodendritic cac current in ΔIS4B (for detail see above). Therefore, the isoforms containing the mutually exclusive IS4A exon are expressed and mediate other functions, but cannot substitute IS4B containing isoforms at the presynaptic AZ. In fact, our Western blots are in line with reduced cac expression if all isoforms that mediate evoked release are missing, again indicating that the presynapse specific cac isoforms cannot be replaced by other cac isoforms. This is also in line with the sparse expression of IS4A throughout the CNS as seen in the new supplementary figure 1 (for detail see above).

      (3) Cac-IS4B is required for Cav2 expression, active zone localization, and synaptic transmission. Similarly, loss of cac-I-IIB reduces calcium channel expression and number. Hence, the major phenotype of the tested splice isoforms is the loss of/a reduction in Cav2 channel number. What is the physiological role of these isoforms? Is the idea that channel numbers can be regulated by splicing? Is there any data from other systems relating channel number regulation to splicing (vs. transcription or post-transcriptional regulation)?

      Our data are not consistent with the idea that splicing regulates channel numbers. Rather, splicing can be used to generate channels with specific properties that match the demand at the site of expression. For the IS4 exon pair we find differences in activation voltage between IS4A and IS4B channels (revised Fig. 3C), with IS4B being required for sustained HVA current. IS4A does not localize to presynaptic active zones at the NMJ and is only sparsely expressed elsewhere in the NS (new supplementary Fig. 1). By contrast, IS4B is abundantly expressed in many neuropils. Therefore, taking out IS4B takes out the more abundant IS4 isoform. This is consistent with different expression levels for IS4 isoforms that have different functions, but we do not find evidence for splicing regulating expression levels per se.

      Similarly, the I-II mutually exclusive exon pair differs markedly in the presence or absence of G-protein βγ binding sites that play a role in acute channel regulation as well as in the conservation of the sequence for β-subunit binding (see page 5, lines 9-17). Channel number reduction in active zones occurs specifically if expression of the cac channels with the Gβγ-binding site as well as the more conserved β-subunit binding is prohibited by excision of the I-IIB exon (see Fig. 5F). Vice versa, excision of I-IIA does not result in reduced channel numbers. This scenario is consistent with the hypothesis that conserved β-subunit binding affects channel number in the active zone (see page 17, lines 3 to 6 and lines 33-36), but we have no evidence that I-II splicing per se affects channel number.

      (4) Although not supported by statistics, and as appreciated by the authors (p. 14), there is a slight increase in PSC amplitude in dIS4A mutants (Figure 2). Similarly, PSC amplitudes appear slightly larger (Figure 3J), and cac fluorescence intensity is slightly higher (Figure 3H) in dI-IIA mutants. Furthermore, cac intensity and PSC amplitude distributions appear larger in dI-IIA mutants (Figures 3H, J), suggesting a correlation between cac levels and release. Can they exclude that IS4A and/or I-IIA negatively regulate release? I suggest increasing the sample size for Canton S to assess whether dIS4A mutant PSCs differ from controls (Figure 2E). Experiments at lower extracellular calcium may help reveal potential increases in PSC amplitude in the two genotypes (but are not required). A potential increase in PSC amplitude in either isoform would be very interesting because it would suggest that cac splicing could negatively regulate release.

      There are several possibilities to explain this, but as none of the effects is statistically significant, we prefer to not investigate this in further depth. However, given that we cannot find IS4A in presynaptic active zones (revised figures 2C and 3A plus the new enlargements 2Ci and 3Ai, revised text page 6, lines 22 to 24 and 29 to 31, and page 7, second paragraph, same as public response 1D), IS4A channels cannot have a direct negative effect on release probability. Nonetheless, given that IS4A containing cac isoforms mediate functions in other neuronal compartments (see revised Fig. 3C), they may regulate release indirectly by affecting e.g. action potential shape. Moreover, in response to the more detailed suggestions to authors we provide new data that give additional insight.

      (5) They provide compelling evidence that IS4A is required for the amplitude of somatic sustained HVA calcium currents. However, the evidence for effects on biophysical properties and activation voltage (p. 13) is less convincing. Is the phenotype confined to the sustained phase, or are other aspects of the current also affected (Figure 2J)? Could they also show the quantification of further parameters, such as CaV2 peak current density, charge density, as well as inactivation kinetics for the two genotypes? I also suggest plotting peak-normalized HVA current density and conductance (G/Gmax) as a function of Vm. Could a decrease in current density due to decreased channel expression be the only phenotype? How would changes in the sustained phase translate into altered synaptic transmission in response to AP stimulation?

      Most importantly, sustained HVA current is abolished upon excision of IS4B (not IS4A, we think the reviewer accidentally mixed up the genotype) and presynaptic active zones at the NMJ contain only cac isoforms with the IS4B exon. This indicates that the cac isoforms that mediate evoked release encode HVA channels. The somatodendritic currents shown in the revised figure 3C (previously 2J) that remain upon excision of IS4B are mediated by IS4A containing cac isoforms. Please note that these never localize to the presynaptic active zone, and thus do not contribute to evoked release. Therefore, the interpretation is that specifically sustained HVA current encoded by IS4B cac isoforms is required for synaptic transmission. Reduced cac current density due to decreased channel expression is not the cause for impaired evoked release upon IS4B excision, but instead, the cause is the absence of any cac channels in active zones. IS4B-containing cac isoforms encode sustained HVA current, and we speculate that this might be a well-suited current to minimize cacophony channel inactivation in the presynaptic active zone. Given that HVA current shows fast voltage dependent activation and fast inactivation upon repolarization, it is useful at high intraburst firing frequencies as observed during crawling (Kadas et al., 2017) without excessive cac inactivation (see page 15, lines 16 to 20).

      However, we agree with the reviewer that a deeper electrophysiological analysis of splice isoform specific cac currents will be instructive. We have now added traces of control and ΔIS4B from a holding potential of -90 mV (revised Fig. 3C, bottom traces and revised text on page 7, line 43 to page 8, line 10), and these are also consistent with IS4B mediating sustained HVA cac current. However, further analysis of activation and inactivation voltages and kinetics suffers from space clamp issues in recordings from the somata of such complex neurons (DLM motoneurons of the adult fly contain roughly 6000 µm of dendrites with over 4000 branches, Ryglewski et al., 2017, Neuron 93(3):632-645). Therefore, we will analyze the currents in a heterologous expression system and present these data to the scientific community as a separate study at a later time point.

      (6) Why was the STED data analysis confined to the same optical section, and not to max. intensity z-projections? How many and which optical sections were considered for each active zone? What were the criteria for choosing the optical sections? Was synapse orientation considered for the nearest neighbor Cac - Brp cluster distance analysis? How do the nearest-neighbor distances compare between "planar" and "side-view" Brp puncta?

      Maximum intensity z-projections would be imprecise because they can artificially suggest close proximity of label that is close by in x and y but far away in z. Therefore, the analysis was executed in xy-direction of various planes of entire 3D image stacks. We considered active zones of different orientations (Figs. 5C, D) to account for all planes. In fact, we searched the entire z-stacks until we found active zones of all orientations within the same boutons, as shown in figures 5C1-C6. The same active zone orientations were analyzed for all exon-out mutants with cac localization in active zones. The distance between cac and brp did not change if viewed from the side or any other orientation. We now explain this more clearly in the results text on page 9, lines 23/24.
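
      The projection artifact described above can be made concrete with a small sketch. It compares nearest-neighbor distances computed from full (x, y, z) coordinates with those computed after discarding z, as a maximum intensity projection effectively does. The centroid coordinates and the function name `nearest_neighbor_distances` are hypothetical and serve only to illustrate the argument; they are not taken from the STED data.

```python
import math


def nearest_neighbor_distances(cac_points, brp_points, dims=3):
    """For each cac centroid, return the distance to the nearest brp centroid.

    dims=3 uses (x, y, z); dims=2 ignores z and therefore mimics what a
    z-projection would report."""
    return [
        min(math.dist(c[:dims], b[:dims]) for b in brp_points)
        for c in cac_points
    ]


# Hypothetical centroids in micrometers (x, y, z).
cac = [(1.00, 1.00, 0.20), (3.00, 2.00, 1.50)]
brp = [(1.05, 1.00, 0.25), (3.02, 2.01, 0.30)]

dist_3d = nearest_neighbor_distances(cac, brp, dims=3)
dist_projected = nearest_neighbor_distances(cac, brp, dims=2)

for i, (d3, d2) in enumerate(zip(dist_3d, dist_projected)):
    print(f"cac cluster {i}: 3D distance {d3:.3f} um, projected distance {d2:.3f} um")
```

      The second cluster lies more than a micrometer from the nearest brp centroid in z, yet appears to be only a few tens of nanometers away once z is ignored.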

      (7) Cac clusters localize to the Brp center (e.g., Liu et al., 2011). They conclude that Cav2 localization within Brp is not affected in the cac variants (p. 8). However, their analysis is not informative regarding a potential offset between the central cac cluster and the Brp "ring". Did they/could they analyze cac localization with regard to Brp ring center localization of planar synapses, as well as Brp-ring dimensions?

      In the top views (planar) we did not find any clear offset in cac orientation to brp between genotypes. In such planar synapses (top views, Fig. 5D, left row) we did not find any difference in Brp ring dimensions. We did not quantify brp ring dimensions rigorously, because this study focusses on cac splice isoform-specific localization and function. Possible effects of different cac isoforms on brp-ring dimensions or other aspects of scaffold structure are not central to our study, in particular given that brp puncta are clearly present even if cac is absent from the synapse (Fig. 3A), indicating that cac is not instructive for the formation of the brp scaffold.

      (8) Given the accelerated PSC decay/ decreased half width in dI-IIA (Fig. 5Q), I recommend reporting PSC charge in Figure 3, and PPR charge in Figures 5A-D. The charge-based PPRs of dI-IIA mutants likely resemble WT more closely than the amplitude-based PPR. In addition, miniature PSC decay kinetics should be reported, as they may contribute to altered decay kinetics. How could faster cac inactivation kinetics in response to single AP stimulation result in a decreased PSC half-width? Is there any evidence for an effect of calcium current inactivation on PSC kinetics? On a similar note, is there any evidence that AP waveform changes accelerate PSC kinetics? PSC decay kinetics are mainly determined by GluR decay kinetics/desensitization. The arguments supporting the role of cac splice isoforms in PSC kinetics outlined in the discussion section are not convincing and should be revised.

      We agree that reporting charge in figure 3 is informative and do so in the revised text. Since the result (no significant difference in the PSCs between CS, cac<sup>GFP</sup>, <sup>ΔI-IIA</sup>, and transheterozygous I-IIA/I-IIB, but significantly smaller values in ΔI-IIB) remained unchanged no matter whether charge or amplitude were analyzed, we decided to leave the figure as is and report the additional analysis in the text (page 8, lines 40 to 42). This way, both types of analysis are reported. Please note that EPSC amplitude is slightly but not significantly increased upon excision of I-IIA (Fig. 4J), whereas EPSC half amplitude width is significantly smaller (Fig. 5Q, now revised Fig 6R). Together, a tendency of increased EPSC amplitudes and smaller half amplitude width result in statistically insignificant changes in EPSC in ∆I-IIA (now discussed on page 15, lines 37 to 40). We also understand the reviewer’s concern attributing altered EPSC kinetics to presynaptic cac channel properties. We have toned down our interpretation in the discussion and list possible alterations in presynaptic AP shape or cac channel kinetics as alternative explanations (not conclusions; see revised discussion on page 15, line 40 to page 16, line 2). Moreover, we have quantified postsynaptic GluRIIA abundance to test whether altered PSC kinetics are caused by altered GluRIIA expression. In our opinion, the latter is more instructive than mini decay kinetic analysis because this depends strongly on the distance of the recording electrode to the actual site of transmission in these large muscle cells. Although we find no difference in GluRIIA expression levels we now clearly state that we cannot exclude other changes in GluR receptor fields, which of course, could also explain altered PSC kinetics. We have updated the discussion on page 16, lines 2/3 accordingly.

      (9) Paired-pulse ratios (PPRs): On how many sweeps are the PPRs based? In which sequence were the intervals applied? Are PPR values based on the average of the second over the first PSC amplitudes of all sweeps, or on the PPRs of each sweep and then averaged? The latter calculation may result in spurious facilitation, and thus to the large PPRs seen in dI-IIB mutants (Kim & Alger, 2001; doi: 10.1523/JNEUROSCI.21-24-09608.2001).

      We agree that the PP protocol and analyses had to be described more precisely in the methods and have done so on page 23, lines 31 to 37 in the methods. Mean PPR values are based on the PPRs of each sweep and then averaged. We are aware of the study of Kim and Alger 2001 and have re-analyzed the PP data in both ways outlined by the reviewer. We get identical results with either analysis method. Spurious facilitation is thus not an issue in our data. We now explain this in the methods section along with the PPR protocol. The large spread seen in dI-IIB is indeed caused by reduced calcium influx into active zones with fewer channels, as anticipated by the reviewer (see next point).
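
      The two ways of computing a mean PPR discussed above can be sketched as follows. The amplitudes are hypothetical and deliberately chosen so that the first PSC fluctuates strongly between sweeps while the second does not; the function names are likewise illustrative assumptions and do not represent the analysis code of the study.

```python
def ppr_mean_of_ratios(first_amplitudes, second_amplitudes):
    """Compute the PPR for every sweep, then average the per-sweep ratios."""
    if len(first_amplitudes) != len(second_amplitudes) or not first_amplitudes:
        raise ValueError("need the same non-zero number of first and second PSCs")
    ratios = [s / f for f, s in zip(first_amplitudes, second_amplitudes)]
    return sum(ratios) / len(ratios)


def ppr_ratio_of_means(first_amplitudes, second_amplitudes):
    """Average first and second PSC amplitudes across sweeps, then divide."""
    if len(first_amplitudes) != len(second_amplitudes) or not first_amplitudes:
        raise ValueError("need the same non-zero number of first and second PSCs")
    mean_first = sum(first_amplitudes) / len(first_amplitudes)
    mean_second = sum(second_amplitudes) / len(second_amplitudes)
    return mean_second / mean_first


# Hypothetical PSC amplitudes (nA) from four sweeps of one cell.
first = [10.0, 2.0, 6.0, 1.0]    # highly variable first response
second = [5.0, 5.0, 5.0, 5.0]    # stable second response

print("mean of per-sweep ratios:", round(ppr_mean_of_ratios(first, second), 3))
print("ratio of mean amplitudes:", round(ppr_ratio_of_means(first, second), 3))
```

      With a variable first response, sweeps with a small first PSC dominate the per-sweep average and inflate the PPR, which is the spurious facilitation the reviewer refers to. When the first response varies little, the two estimates converge.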

      (10) Could the dI-IIB phenotype be simply explained by a decrease in channel number/ release probability? To test this, I propose investigating PPRs and short-term dynamics during train stimulation at lower extracellular Ca2+ concentration in WT. The Ca2+ concentration could be titrated such that the first PSC amplitude is similar between WT and dI-IIB mutants. This experiment would test if the increased PPR/depression variability is a secondary consequence of a decrease in Ca2+ influx, or specific to the splice isoform.

      In fact, the interpretation that decreased PSC amplitude upon I-IIB excision is caused mainly by reduced channel number is precisely our interpretation (see discussion page 14, last paragraph to page 15, first paragraph in the original submission, now page 16, second paragraph). In addition, we are grateful for the reviewer’s suggestion to titrate the external calcium such that the first PSC amplitude matches in ∆I-IIB and control. This experiment tests whether altered short term plasticity is solely a function of altered channel number or whether additional causes, such as altered channel properties, also play into this. We titrated the first pulse amplitude in ∆I-IIB to match control and find that paired pulse ratio and the variance thereof are not different anymore. Therefore, the differences observed in identical external calcium can be fully explained by altered channel numbers. This additional dataset is shown in the revised figures 6D and E and referred to in the results section on page 10, lines 14 to 25 and the discussion on page 16, lines 36 to 38.

      (11) How were the depression kinetics analyzed? How many trains were used for each cell, and how do the tau values depend on the first PSC amplitude? Time constants in the range of a few (5-10) milliseconds are not informative for train stimulations with a frequency of 1 or 10 Hz (the unit is missing in Figure 5H). Also, the data shown in Figures 5E-K suggest slower time constants than 5-10 ms. Together, are the data indeed consistent with the idea that dI-IIB does not only affect cac channel number, but also PPR/depression variability (p. 9)?

      For each animal the amplitudes of all subsequent PSCs in each train were plotted over time and fitted with a single exponential. For depression at 1 and 10 Hz, we used one train per animal, and 5-6 animals per genotype (as reflected in the data points in Figs. 6I, M). This is now explained in more detail in the revised methods section (page 23, lines 39 to 41). The tau values are not affected by the amplitude of the first PSC. First, we carefully re-fitted new and previously presented depression data and find that the taus for depression at low stimulation frequencies (1 and 10 Hz) are not affected by exon excisions at the I-II site. We thank the reviewer for detecting our error in units and tau values in the previous figure panels 5H and L (this has now been corrected in the revised figure panels 6I and M). Given that PSC amplitude upon I-IIB excision is significantly smaller than in controls and following I-IIA excision, we suspected that the time course of depression at low stimulation frequency is not significantly affected by the amount of calcium influx during the first PSC. To further test this, we followed the reviewer’s suggestion and re-measured depression at 1 and 10 Hz for cac-GFP controls and for delta I-IIB in a higher external calcium concentration (1.8 mM), so that the first PSC was increased in amplitude in both genotypes (1.8 mM external calcium titrates the PSC amplitude in delta I-IIB to match that of controls measured in 0.5 mM external calcium, see revised Figs. 6H, L). Neither in control, nor in delta I-IIB did this affect the time course of synaptic depression (see revised Figs. 6I, M). This indicates that at low stimulation frequencies (1 and 10 Hz) the time course of depression is not affected by mean quantal content. This is consistent with the paired pulse ratio at 100 ms interpulse interval shown in figures 6A-D. However, for synaptic depression at 1 Hz stimulation the variability of the data is higher for delta I-IIB (independent of external calcium concentration, see rev. Fig. 6I), which might also be due to reduced channel number in this genotype. Taken together, the data are in line with the idea that altered cac channel numbers in active zones are sufficient to explain all effects that we observe upon I-IIB excision on PPRs and synaptic depression at low stimulation frequencies. This is now clarified in the revised text on page 12, lines 3 to 7.
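
      For readers unfamiliar with how a depression time constant is extracted from a stimulus train, the following is a minimal sketch of a single-exponential fit. It assumes a model with a steady-state plateau, amplitude(t) = plateau + span * exp(-t / tau), and scans a list of candidate tau values, solving for plateau and span by ordinary least squares at each one. The model form, the function name `fit_single_exponential`, the candidate grid, and the synthetic train are all assumptions made for illustration; the fitting routine actually used in the study is not specified here.

```python
import math


def fit_single_exponential(times, amplitudes, candidate_taus):
    """Fit amplitude(t) = plateau + span * exp(-t / tau).

    For a fixed tau the model is linear in (plateau, span), so each
    candidate tau is solved by simple linear regression and the tau
    with the smallest sum of squared errors is returned."""
    n = len(times)
    if n != len(amplitudes) or n < 3:
        raise ValueError("need at least three (time, amplitude) pairs")
    mean_y = sum(amplitudes) / n
    best = None
    for tau in candidate_taus:
        decay = [math.exp(-t / tau) for t in times]
        mean_x = sum(decay) / n
        sxx = sum((x - mean_x) ** 2 for x in decay)
        if sxx == 0:
            continue  # regressor is constant, slope undefined
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(decay, amplitudes))
        span = sxy / sxx
        plateau = mean_y - span * mean_x
        sse = sum((plateau + span * x - y) ** 2 for x, y in zip(decay, amplitudes))
        if best is None or sse < best["sse"]:
            best = {"tau": tau, "plateau": plateau, "span": span, "sse": sse}
    if best is None:
        raise ValueError("no candidate tau produced a valid fit")
    return best


# Synthetic 10 Hz train (hypothetical): 20 PSCs, 0.1 s apart, true tau = 0.5 s,
# amplitudes in percent of the first PSC decaying from 100 to a plateau of 40.
times = [i * 0.1 for i in range(20)]
amplitudes = [40.0 + 60.0 * math.exp(-t / 0.5) for t in times]
candidate_taus = [0.05 * k for k in range(1, 61)]  # 0.05 s to 3.0 s

fit = fit_single_exponential(times, amplitudes, candidate_taus)
print(f"tau = {fit['tau']:.2f} s, plateau = {fit['plateau']:.1f}, span = {fit['span']:.1f}")
```

      Because the fit normalizes neither the starting amplitude nor the plateau, a change in the first PSC amplitude alters span but need not alter tau, which is the distinction at issue in the reviewer's question.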

      (12) The GFP-tagged I-IIA and mEOS4b-tagged I-IIB cac puncta shown in Figure 6N appear larger than the Brp puncta. Endogenously tagged cac puncta are typically smaller than Brp puncta (Gratz et al., 2019). Also, the I-IIA and I-IIB fluorescence sometimes appear to be partially non-overlapping. First, I suggest adding panels that show all three channels merged. Second, could they analyze the area and area overlap of I-IIA and I-IIB with regard to each other and to Brp, and compare it to cac-GFP? Any speculation as to how the different tags could affect localization? Finally, I recommend moving the dI-IIA and dI-IIB localization data shown in Figure 6N to an earlier figure (Figure 1 or Figure 3).

      We now show panels with the two I-II cac isoforms merged in the revised figure 7H (previously 6N). We also tested merging all three labels as suggested, but found this not instructive for the reader. We thank the reviewer for pointing out that the Brp puncta appeared smaller than the cac puncta in some panels. We carefully went through the data and found that the Brp puncta are not systematically smaller than the cac puncta. Please note that punctum size can appear quite different, depending on different staining qualities as well as different laser intensities and different point spread in different imaging channels. The purpose of this figure was not to analyze punctum size and labeling intensity, but instead, to demonstrate that I-IIA and I-IIB are both present in most active zones, but some active zones show only I-IIB labeling, as quantified in figure 7I. We did not follow the suggestion to conduct additional co-localization analyses and compare them with cac-GFP controls, because Pearson co-localization coefficients for cac-GFP and all exon-out variants analyzed, including delta I-IIA and delta I-IIB, are presented in the revised figure 4D. Moreover, delta I-IIA and delta I-IIB show similar Manders 1 and 2 co-localization coefficients with Brp (see Figs. 4E, F). We do not want to speculate whether the different tags have any effect on localization precision. Artificial differences in localization precision can also be suggested by different antibodies, but we know from our STED analyses with identical tags and antibodies for all isoforms that I-IIA and I-IIB co-localize identically with Brp (see Figs. 5A-E). Finally, we prefer to not move the figure because we believe it is informative to show our finding that active zones usually contain both I-II splice variants together with the finding that only I-IIB is required for PHP.

      Recommendations for the authors:

      Reviewing Editor Comments:

      We thank you for your submission. All three reviewers urge caution in interpreting the S4 splice variant playing a role specifically in Cac localization, as opposed to just leading to instability and degradation. There are other issues with the electrophysiological experiments, a need for improved imaging and analyses, and some areas of interpretation detailed in the reviews.

      We agree that additional data was required to conclude that IS4 splicing plays a specific role in cac channel localization and is not just leading to channel instability and degradation. As outlined in detail in our response to reviewer 1, comment 1, we conducted several sets of experiments to support our interpretation. First, electrophysiological experiments show that upon removal of IS4B, which eliminates synaptic transmission at the larval NMJ and cac positive label in presynaptic active zones, somatodendritic cac current is reliably recorded (new data in revised figure 3C). This is not in line with a channel instability or degradation effect, but instead with IS4B-containing isoforms being required and sufficient for evoked release from NMJ motor terminals, whereas IS4A isoforms are not sufficient for evoked release from axon terminals, but IS4A isoforms alone can mediate a distinct component of somatodendritic calcium current. Second, immunohistochemical analyses reveal that IS4A, which is not present in NMJ presynaptic active zones, is expressed sparsely, but in reproducible patterns in the larval brain lobes and in specific regions of the anterior VNC parts (new supplementary figure 1). Again, the absence of an IS4A-containing cac isoform from presynaptic active zones but its simultaneous presence in other parts of the nervous system is in accord with isoform-specific localization, but not with general channel isoform instability. Third, enlargements of NMJ boutons with brp positive presynaptic active zones confirm the absence of IS4A and the presence of IS4B in active zones (these enlargements are now shown in the revised figures 2A-C, 3A, and 4A-C). Fourth, as suggested we have quantified the Pearson co-localization of IS4 isoforms with Brp in presynaptic active zones (revised Fig. 2D). This confirms quantitatively similar co-localization of IS4B and control with Brp, but no co-localization of IS4A with Brp. 
In fact, the labeling intensity of IS4A in presynaptic active zones is not significantly different from background, and no IS4A label is detected anywhere in the axon terminals at the NMJ, but we find IS4A label in the CNS. Together, these data strongly support our interpretation that the IS4 splice site plays a distinct role in cac channel localization. Figure legends as well as results and discussion section have been modified accordingly (the respective page and line numbers are listed in our point-by-point responses).

      In addition, we have carefully addressed all other public comments as well as all other recommendations for authors by providing multiple new data sets, new image analyses, and revising text. Addressing the insightful comments of all three reviewers and the reviewing editor has greatly helped to make the manuscript better.

      Reviewer #1 (Recommendations For The Authors):

      The conclusion that the IS4B exon controls Cac localization to active zones versus simply being required for channel abundance is not well supported. The authors need to either mention both possibilities or provide stronger support for the active zone localization model if they want to emphasize this point.

      We agree and have included several additional data sets as outlined in our response to point 1 of reviewer 1 and to the reviewing editor (see above). These new data strongly support our interpretation that the IS4B exon controls Cac localization to active zones and is not simply required for channel abundance. The additions to the figures and accompanying text (including the respective figure panel, page, and line numbers) are listed in the point-by-point responses to the reviewers’ public suggestions.

      Figure 2C staining for Cac localization in the delta 4B line is difficult to compare to the others, as the background staining is so high (muscles are green for example). As such, it is hard to determine whether the arrows in C are just background.

      We had over-emphasized the green label to show that there really is no cacophony label in active zones. However, we agree that this hampered image interpretation. Thus, we have adjusted brightness such that it matches the other genotypes (see new figure panel 2C, and figure 3A, bottom). Revising the figure as suggested by the reviewer shows much more clearly that IS4B puncta are detected exclusively in presynaptic active zones, whereas IS4A channels are not detectable in active zones or anywhere else in the axon terminal boutons. Quantification of IS4A label in brp positive active zones confirms that labeling intensity is not significantly above background (page 6, lines 29 to 31 and page 7, lines 19 to 21). Therefore, IS4A is not detectable in active zones at the NMJ.

      It seems more likely that the removal of the 4B exon simply destabilizes the protein and causes it to be degraded (as suggested by the Western), rather than mislocalizing it away from active zones. It's hard to imagine how some residue changes in the S4 voltage sensor would control active zone localization to begin with. The authors should note that the alternative explanation is that the protein is just degraded when the 4B exon is removed.

      Based on additional data and analyses, we disagree with the interpretation that removal of IS4B disrupts protein integrity and present multiple lines of evidence that support sparse expression of IS4A channels (ΔIS4B). As outlined in our response to reviewer 1 and to the reviewing editor, we show (1) in new immunohistochemical stainings (new supplementary figure 1) that upon removal of IS4B, sparse label is detectable in the VNC and the brain lobes (for detail see above). (2) In our new figure 3C, we show cacophony-mediated somatodendritic calcium currents recorded from adult flight motoneurons in a control situation and upon removal of IS4B that leaves only IS4A channels. This clearly demonstrates that IS4A underlies a substantial component of the HVA somatodendritic calcium current, although it is absent from axon terminals. This is in line with isoform-specific functions at different locations, but not with IS4A instability/degradation. (3) We do not agree with the reviewer’s interpretation of the Western Blot data in figure 1E (formerly figure 1D). Together with our immunohistochemical data that show sparse cacophony IS4A expression, we think that the faint band upon removal of IS4B in a heterozygous background (that reduces labeled channels even further) reflects the sparseness of IS4A expression. This sparseness is not due to channel instability, but to IS4A functions that are less abundant than the ubiquitously expressed cac<sup>IS4B</sup> channels at presynaptic active zones of fast chemical synapses (see page 15, lines 24 to 29).

      If they really want to claim the 4B exon governs active zone localization, much higher quality imaging is required (with enlarged views of individual boutons and their AZs, rather than the low-quality full NMJ imaging provided). Similarly, higher resolution imaging of Cac localization at Muscle 12 (Figure 2H) boutons would be very useful, as the current images are blurry and hard to interpret. Figure 6N shows beautiful high-resolution Cac and Brp imaging in single boutons for the I-II exon manipulations - the authors should do the same for the 4B line. For all immuno in Figure 2, it is important to quantify Cac intensity as well. There is no quantification provided, just a sample image. The authors should provide quantification as they do for the delta I-II exons in Figure 3.

      We did as suggested and added figure panels to figure 2A-C and to new figures 3A (formerly part of figure 2) and 4A-C (formerly figure 3) showing magnified label at the NMJ AZs to allow better assessment of cacophony expression after exon excision. These data are now referred to in the results section on page 6, lines 22 to 24, page 7, lines 18 to 21 and page 8, lines 17/18.

      As suggested, we now also provide quantification of co-localization with brp puncta as Pearson’s correlation coefficient for control, IS4B, and IS4A in the new figure panel 2D (text on page 6, lines 34 to 38). This further underscores control-like active zone localization of IS4B but no significant active zone localization of IS4A. As suggested, we have now also quantified the intensity of IS4B label in active zones, and it was not different from control (see revised figure 4H and text on page 8, lines 38/39). We did not quantify the intensity of IS4A label, because it was not above background (text, page 6, lines 30/31).
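
      As a purely illustrative aside, the Pearson co-localization coefficient referred to above can be sketched as follows. This is not the authors' image-analysis code; it assumes that the two channels have already been reduced to paired pixel intensities within a region of interest, and the function name and example values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two paired intensity lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variance terms (the common 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired pixel intensities (cac channel vs. Brp channel).
cac = [1.0, 2.0, 3.0, 4.0]
brp = [2.0, 4.0, 6.0, 8.0]
r = pearson(cac, brp)
```

      Values near 1 indicate that the two labels rise and fall together across pixels, whereas values near 0 indicate no linear relationship, as would be expected for a label that does not exceed background.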

      Reviewer #2 (Recommendations For The Authors):

      (1a) Questions about the engineered Cac splice isoform alleles:

      The authors use CRISPR gene editing to selectively remove the entire alternatively spliced exons of interest. Do the authors know what happens to the cac transcript with the deleted exon? Is the deleted exon just skipped and spliced to the next exon? Or does the transcript instead undergo nonsense-mediated decay?

      We do not believe that there is nonsense-mediated mRNA decay, because for all exon excisions the respective mRNA and protein are made. Protein has been detected on the level of Western blotting and immunocytochemistry. Therefore, we are certain that the mRNA is viable for each exon excision (and we have confirmed this for low abundance cac protein isoforms by rt-PCR), but only subsets of cac isoforms can be made from mRNAs that are lacking specific exons. However, we cannot make any statements as to whether the lack of specific protein isoforms exerts feedback on mRNA stability, the rate of transcription and translation, or other unknown effects.

      (1b) While it is clear that the IS4 exons encode part of the voltage sensor in the first repeat, are there studies in Drosophila to support the putative Ca-beta and G-protein beta-gamma binding sites in the I-II loop? Or are these inferred from Mammalian studies?

      To the best of our knowledge, there are no studies in Drosophila that unambiguously show Caβ and Gβγ binding sites in the I-II loop of cacophony. However, sequence analysis strongly suggests that I-IIB contains both a Caβ and a Gβγ binding site (AID: α-interacting domain) because the binding motif QXXER is present. In mouse Ca<sub>v</sub>2.1 and Ca<sub>v</sub>2.2 channels the sequence is QQIER, while in Drosophila cacophony I-IIB it is QQLER. In the alternative I-IIA, this motif is not present, strongly suggesting that G<sub>βγ</sub> subunits cannot interact at the AID. However, as already suggested by Smith et al. (1998), based on sequence analysis, Ca<sub>β</sub> should still be able to bind, although possibly with a lower affinity. We agree that this information should be given to the reader and have revised the text accordingly on page 5, lines 9 to 17.
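
      To make the motif argument concrete, the QXXER pattern described above can be checked with a simple regular expression. The following is only a minimal sketch: the two matching pentapeptides are the ones quoted in the response, while the non-matching string is a made-up placeholder and not the actual I-IIA sequence.

```python
import re

# QXXER in one-letter amino acid code: Q, any two residues, then E and R.
QXXER = re.compile(r"Q[A-Z]{2}ER")


def has_qxxer(sequence: str) -> bool:
    """Return True if the protein sequence contains a QXXER motif."""
    return QXXER.search(sequence.upper()) is not None


# Pentapeptides quoted in the text above.
mouse_cav2 = "QQIER"       # mouse Cav2.1 / Cav2.2
fly_cac_i_iib = "QQLER"    # Drosophila cacophony I-IIB

# Hypothetical placeholder lacking the motif (NOT the real I-IIA sequence).
placeholder_without_motif = "QQLEK"

results = {
    "mouse Cav2": has_qxxer(mouse_cav2),
    "cac I-IIB": has_qxxer(fly_cac_i_iib),
    "placeholder": has_qxxer(placeholder_without_motif),
}
```

      A pattern match of this kind only indicates that the consensus residues are present; it does not by itself demonstrate binding, which is why the response describes the sites as suggested by sequence analysis.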

      (1c) The authors assert that splicing of Cav2/cac in flies is a means to encode diversity, as mammals obviously have 4 Cav2 genes vs 1 in flies. However, as the authors likely know, mammalian Cav2 channels also have various splice isoforms encoded in each of the 4 Cav2 genes. The authors should discuss in more detail what is known about the splicing of individual mammalian Cav2 channels and whether there are any homologous properties in mammalian channels controlled by alternative splicing.

      We agree and now provide a more comprehensive discussion of vertebrate Ca<sub>v</sub>2 splicing and its impact on channel function. In line with what we report in Drosophila, properties like G<sub>βγ</sub> binding and activation voltage can also be affected by alternative splicing in vertebrate Ca<sub>v</sub>2 channels, though the exon patterns are quite different from Drosophila. We integrated this part (page 14, first paragraph) in the revised discussion. The respective text is below for the reviewer’s convenience:

      “However, alternative splicing increases functional diversity also in mammalian Ca<sub>v</sub>2 channels. Although the mutually exclusive splice site in the S4 segment of the first homologous repeat (IS4) is not present in vertebrate Cav channels, alternative splicing in the extracellular linker region between S3 and S4 is at a position to potentially change voltage sensor properties (Bezanilla 2002). Alternative splice sites in rat Ca<sub>v</sub>2.1 exon 24 (homologous repeat III) and in exon 31 (homologous repeat IV) within the S3-S4 loop modulate channel pharmacology, such as differences in the sensitivity of Ca<sub>v</sub>2.1 to Agatoxin. Alternative splicing is thus a potential cause for the different pharmacological profiles of P- and Q-channels (both Ca<sub>v</sub>2.1; Bourinet et al. 1999). Moreover, the intracellular loop connecting homologous repeats I and II is encoded by 3-5 exons and provides strong interaction with G<sub>βγ</sub>-subunits (Herlitze et al. 1996). In Ca<sub>v</sub>2.1 channels, binding to G<sub>βγ</sub> subunits is potentially modulated by alternative splicing of exon 10 (Bourinet et al. 1999). Moreover, whole cell currents of splice forms α1A-a (no Valine at position 421) and α1A-b (with Valine) represent alternative variants for the I-II intracellular loop in rat Ca<sub>v</sub>2.1 and Ca<sub>v</sub>2.2 channels. While α1A-a exhibits fast inactivation and more negative activation, α1A-b has delayed inactivation and a positive shift in the IV-curve (Bourinet et al. 1999). This is phenotypically similar to what we find for the mutually exclusive exons at the IS4 site, in which IS4B mediates high voltage activated cacophony currents while IS4A channels activate at more negative potentials and show transient current (Fig. 3; see also Ryglewski et al. 2012). Furthermore, altered Ca<sub>β</sub> interaction have been shown for splice isoforms in loop III (Bourinet et al. 1999), similar to what we suspect for the I-II site in cacophony. 
Finally, in mammalian VGCCs, the C-terminus presents a large splicing hub affecting channel function as well as coupling distance to other proteins. Taken together, Ca<sub>v</sub>2  channel diversity is greatly enhanced by alternative splicing also in vertebrates, but the specific two mutually exclusive exon pairs investigated here are not present in vertebrate Ca<sub>v</sub>2 genes.”

      (1d) In Figure 1, it would be helpful to see the entire cac genomic locus with all introns/exons and the 4 specific exons targeted for deletion.

      We agree and have changed figure 1 accordingly.

      (2a) Cav2.IS4B deletion alleles:

      More work is necessary to explain the localization of Cac controlled by the IS4B exon. First, can the authors determine whether actual Cac channels are present at NMJ boutons? The authors seem to indicate that in the IS4B deletion mutants, some Cac (GFP) signal remains in a diffuse pattern across NMJ boutons. However, from the imaging of wild-type Cac-GFP (and previous studies), there is no Cac signal outside of active zones defined by the BRP signal. It would benefit the study to a) take additional, higher resolution images of the remaining Cac signal at NMJs in IS4B deletion mutants, and b) comment on whether the apparent remaining signal in these mutants is only observed in the absence of IS4B-containing Cac channels, or if the IS4A-positive channels are normally observed (but perhaps mis-localized?).

      We have conducted additional analyses to show convincingly that IS4A channels (that remain upon IS4B deletion) are absent from presynaptic active zones. Please see also responses to reviewers 1 and 3. By adjusting the background values of CLSM images to identical values in control, delta IS4A, and delta IS4B, as well as by providing selective enlargements as suggested, the figure panels 2C, Ci and 3A now show much more clearly that upon deletion of IS4B no cac label remains in active zones or anywhere else in the axon terminal boutons (see text on page 6, lines 22 to 24). This is further confirmed by quantification showing that in IS4B mutants cac labeling intensity in active zones is not above background (see text on page 6, lines 27 to 31). We never intended to indicate that there was cac signal outside of active zones defined by the brp signal, and we have now carefully gone through the text to not indicate this possibility unintentionally anywhere in the manuscript.

      (2b) Do the authors know whether any presynaptic Ca2+ influx is contributed by IS4A-positive Cac channels at boutons, given the potential diffuse localization? There are various approaches for doing presynaptic Ca2+ imaging that could provide insight into this question.

      We agree that this is an interesting question. However, based on the revisions made, we now show with more clarity that IS4A channels are absent from the presynaptic terminal at the NMJ. IS4A labeling intensities within active zones and anywhere else in the axon terminals are not different from background (see text on page 6, lines 27 to 31 and revised Figs. 2C, Ci, and 3A with new selective enlargements in response to comments of both other reviewers). This is in line with our finding that evoked synaptic transmission from NMJ axon terminals to muscle cells is mostly absent upon excision of IS4B (see Fig. 3B). The very small amplitude EPSC (below 5 % of the normal amplitude of evoked EPSCs) that can still be recorded in the absence of IS4B is similar to what is observed in cac null mutant junctions and is mediated by calcium influx through another voltage-gated calcium channel, a Ca<sub>v</sub>1 homolog named Dmca1D, as we have previously published (Krick et al., 2021, PNAS 118(28):e2106621118). Gathering additional support for the absence of IS4A from presynaptic terminals by calcium imaging experiments would suffer significantly from the presence of additional types of VGCCs in presynaptic terminals (certainly Dmca1D (Krick et al., 2021) and potentially also the Ca<sub>v</sub>3 homolog DmαG or Dm-α1T). Such experiments would require mosaic null mutants for cac and DmαG channels in a mosaic IS4B excision mutant, which, if feasible at all, would be very hard and time consuming to generate. In the light of the additional clarification that IS4A is not located in NMJ axon terminal boutons, as shown by additional labeling intensity analysis, revised figures with selective enlargement, and revised text, we feel confident to state that IS4A is not sufficient for evoked SV release.

      (2c) Mechanistically, how are amino acid changes in one of the voltage sensing domains in Cac related to trafficking/stabilization/localization of Cac to AZs?

      This is an exciting question that has occupied our discussions a lot. Some sorting mechanism must exist that recognizes the correct protein isoforms, just as sorting and transport mechanisms exist that transport other synaptic proteins to the synapse. We do not think that the few amino acid changes in the voltage sensor are directly involved in protein targeting. We rather believe that the cacophony variants that happen to contain this specific voltage sensor are selected for transport out to the synapse. There are cell biological possibilities to achieve this, but we have not further addressed potential mechanisms because we do not want to enter the realms of speculation.

      (3) How are auxiliary subunits impacted in the Cac isoform mutants?

      Recent work by Kate O'Connor-Giles has shown that both Stj and Ca-Beta subunits localize to active zones along with Cac at the Drosophila NMJ. Endogenously tagged Stj and Ca-Beta alleles are now available, so it would be of interest to determine if Stj and particular Ca-beta levels or localization change in the various Cac isoform alleles. This would be particularly interesting given the putative binding site for Ca-beta encoded in the I-II linker.

      We agree that the synthesis of the work of Kate O'Connor-Giles group and our study open up new avenues to explore exciting hypotheses about differential coupling of specific cacophony splice isoforms with distinct accessory proteins such as Caβ and α<sub>2</sub>δ subunits. However, this requires numerous full sets of additional experiments and is beyond the scope of this study.

      (4a) Interpretation of short-term plasticity in the I-IIB exon deletion:

      The changes in short-term plasticity presented in Figure 5 are interpreted as an additional phenotype due to the loss of the I-IIB exon, but it seems this might be entirely explained simply due to the reduced Cac levels. Reduced Cac levels at active zones will obviously reduce Ca2+ influx and neurotransmitter release. This may be really the only phenotype/function of the I-IIB exon. Hence, to determine whether loss of the I-IIB exon encodes any functions in short-term plasticity, separate from reduced Cac levels, the authors should compare short-term plasticity in I-IIB loss alleles to wild type when starting EPSC amplitudes are equal (for example by reducing extracellular Ca2+ levels in wild type to achieve the same levels as in Cac I-IIB exon deleted alleles). Reduced release probability, simply by reduced Ca2+ influx (either by reduced Cac abundance or extracellular Ca2+) should result in more variability in transmission, so I am not sure there is any particular function of the I-IIB exon in maintaining transmission variability beyond controlling Cac abundance at active zones.

      For two reasons we are particularly grateful for this comment. First, it shows us that we needed to explain much more clearly that our interpretation is that changes in paired pulse ratios (PPRs) and in depression at low stimulation frequencies are a causal consequence of lower channel numbers upon I-IIB exon deletion, precisely as pointed out by the reviewer. We have carefully revised the text accordingly on page 10, lines 14-25, page 11, lines 3-7 and 22-28; page 16, lines 36-38. Second, the experiment suggested by the reviewer is superb to provide additional evidence that the cause of altered PPRs is in fact reduced channel number, but not altered channel properties. Accordingly, we have conducted additional TEVC recordings in elevated external calcium (1.8 mM) so that the single PSC amplitudes in I-IIB excision animals match those of controls in 0.5 mM extracellular calcium. This makes the amplitudes and the variance of PPR for all interpulse intervals tested control-like (see revised Figs. 6D, E). This strongly indicates that differences observed in PPRs as well as the variance thereof were caused by the amount of calcium influx during the first EPSC, and thus by different channel numbers in active zones.
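
      For orientation, the paired pulse ratio and its variance discussed above can be sketched as follows. This is a minimal illustration and not the authors' analysis code; it assumes the common definition of the PPR as the second response amplitude divided by the first, and the function names and amplitude values are hypothetical.

```python
from statistics import mean, variance


def paired_pulse_ratio(first_amplitude, second_amplitude):
    """PPR, assumed here to be second amplitude divided by first amplitude."""
    if first_amplitude == 0:
        raise ValueError("first amplitude must be non-zero")
    return second_amplitude / first_amplitude


def ppr_summary(pulse_pairs):
    """Mean and sample variance of the PPR across (first, second) pairs."""
    ratios = [paired_pulse_ratio(a, b) for a, b in pulse_pairs]
    return mean(ratios), variance(ratios)


# Hypothetical amplitude pairs (arbitrary units) at one interpulse interval.
pairs = [(10.0, 5.0), (10.0, 15.0)]
ppr_mean, ppr_var = ppr_summary(pairs)
```

      Because the ratio divides by the first amplitude, small first responses inflate the scatter of the PPR, which is one reason why matching the first amplitude between genotypes helps to separate effects of channel number from effects of channel properties.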

      (4b) Another point about the data in Figure 5: If "behaviorally relevant" motor neuron stimulation and recordings are the goal, the authors should also record under physiological Ca2+ conditions (1.8 mM), rather than the highly reduced Ca2+ levels (0.5 mM) they are using in their protocols.

      Although we doubt that the effective extracellular calcium concentration that determines the electromotive force for calcium to enter the ensheathed motoneuron terminals in vivo during crawling is known, we partly followed the reviewer’s suggestion and have repeated the high frequency stimulation trains for ΔI-IIB in 1.8 mM calcium. As for short-term plasticity, this brings the charge conducted to values as observed in control and in ΔI-IIA in 0.5 mM calcium. Therefore, all differences observed in previous figure 5 (now revised figure 6) can be attributed to different channel numbers in presynaptic active zones. This is now explained on page 11, lines 19-28. For controls, recordings at high frequency stimulation in higher external calcium (e.g. 2 mM) have previously been published and show significant synaptic depression (e.g. Krick et al., 2021, PNAS). Given that in the exon out variants we do not expect any differences except those caused by different channel numbers, we did not repeat these experiments for control and ΔI-IIA.

      (5a) Mechanism of Cac's role in PHP :

      As the authors likely know, mutations in Cac were previously reported to disrupt PHP expression (see Frank et al., 2006 Neuron). Inexplicably, this finding and publication were not cited anywhere in this manuscript (this paper should also be cited when introducing PhTx, as it was the first to characterize PhTx as a means of acutely inducing PHP). In the Frank et al. paper (and in several subsequent studies), PHP was shown to be blocked in mutations in Cac, namely the CacS allele. This allele, like the I-IIB excision allele, reduces baseline transmission presumably due to reduced Ca2+ influx through Cac. The authors should at a minimum discuss these previous findings and how they relate to what they find in Figure 6 regarding the block in PHP in the Cac I-IIB excision allele.

      We thank the reviewer for pointing this out and apologize for this oversight. We agree that it is imperative to cite the 2006 paper by Frank et al. when introducing PhTx mediated PHP as well as when discussing the effects of cac mutants on PHP together with other published work. We have revised the text accordingly on page 12, lines 9-11 and 21-23 and on page 17, lines 29-33.

      In terms of data presentation in Fig. 6, as is typical in the field, the authors should normalize their mEPSC/QC data as a percentage of baseline (+PhTx/-PhTx). This makes it easier to see the reduction in mEPSC values (the "homeostatic pressure" on the system) and then the homeostatic enhancement in QC. Similarly, in Fig. 6M, the authors should show both mEPSC and QC as a percentage of baseline (wild type or non-GluRIIA mutant background).

      We agree and have changed figure presentation accordingly. Figure 7 (formerly figure 6) was updated as was the accompanying results text on page 12, lines 23-40.

      (6) Cac I-IIA and I-IIB excision allele colocalization at AZs:

      These are very nice and important experiments shown in Figures 6N and O, which I suggest the authors consider analyzing in further detail. Most significantly:

      (6i) The authors nicely show that most AZs have a mix of both Cac IIA and IIB isoforms. Using simple intensity analysis, can the authors say anything about whether there is a consistent stoichiometric ratio of IIA vs IIB at single AZs? It is difficult to extract actual numbers of IIA vs IIB at individual AZs without having both isoforms labeled mEOS4b, but as a rough estimate can the authors say whether the immunofluorescence intensity of IIA:IIB is similar across each AZ? Or is there broad heterogeneity, with some AZs having low vs high ratios of each isoform (as the authors suggest across proximal to distal NMJ AZs)?

      We agree and have conducted experiments and analyses to provide these data. We measured the cac puncta fluorescence intensities for heterozygous cac<sup>sfGFP</sup>/cac, cacI-IIA<sup>sfGFP</sup>/cacI-IIB, and cacI-IIB<sup>sfGFP</sup>/cacI-IIA animals. We preferred this strategy, because intensity was always measured from cac puncta with the same GFP tag. Next, we normalized all values to the intensities obtained in active zones from heterozygous cac<sup>sfGFP</sup>/cac controls and then plotted the intensities of I-IIA versus I-IIB containing active zones side by side. Across junctions and animals, we find a consistent ratio of 2:1 in the relative intensities of I-IIB and I-IIA, thus indicating on average roughly twice as many I-IIB as compared to I-IIA channels across active zones. This is consistent with the counts in our STED analysis (see Fig. 5F). These new data are shown in the new figure panel 7J and referred to on page 13, lines 10-16 in the revised text.
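
      The normalization and ratio described above can be illustrated with a short sketch. This is not the authors' analysis code, and all intensity values and function names are hypothetical; it only assumes that each genotype yields a list of punctum intensities and that the heterozygous tagged control serves as the reference.

```python
from statistics import mean


def normalized_mean(intensities, control_intensities):
    """Mean punctum intensity expressed relative to the control mean."""
    control_mean = mean(control_intensities)
    if control_mean == 0:
        raise ValueError("control mean intensity must be non-zero")
    return mean(intensities) / control_mean


def isoform_ratio(i_iib, i_iia, control):
    """Ratio of normalized I-IIB to normalized I-IIA punctum intensity."""
    return normalized_mean(i_iib, control) / normalized_mean(i_iia, control)


# Hypothetical punctum intensities (arbitrary units), all with the same tag.
control = [100.0, 100.0]
i_iib_tagged = [64.0, 68.0]
i_iia_tagged = [32.0, 34.0]

ratio = isoform_ratio(i_iib_tagged, i_iia_tagged, control)  # about 2
```

      Measuring both isoforms with the same fluorescent tag, as stated in the response, is what makes such a ratio interpretable as a rough estimate of relative channel abundance.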

      (6ii) Intensity analysis of Cac IIA vs IIB after PHP: Previous studies have shown Cac abundance increases at NMJ AZs after PHP. Can the authors determine whether both Cac IIA vs IIB isoforms increase after PHP or whether just one isoform is targeted for this enhancement?

      We already show that PHP is not possible in the absence of I-IIB channels (see figure 7). However, we agree that it is an interesting question to test whether I-IIA channels are added in the presence of I-IIB channels during PHP, but we consider this a detail beyond the scope of this study.

      Minor points:

      (1) Including line numbers in the manuscript would help to make reviewing easier.

      We agree and now provide line numbers.

      (2) Several typos (abstract "The By contrast", etc).

      We carefully double checked for typos.

      (3) Throughout the manuscript, the authors refer to Cac alleles and channels as "Cav2", which is unconventional in the field. Unless there is a compelling reason to deviate, I suggest the authors stick to referring to "Cac" (i.e. cacdIS4B, etc) rather than Cav2. The authors make clear in the introduction that Cac is the sole fly Cav2 channel, so there shouldn't be a need to constantly reinforce that cac=Cav2.

      We agree and have changed all fly Ca<sub>v</sub>2 references to cac.

      (4) In some figures/text the authors use "PSC" to refer to "postsynaptic current", while in others (i.e. Figure 6) they switch to the more conventional terms of mEPSC or EPSC. I suggest the authors stick to a common convention (mEPSC and EPSC).

      We have changed PSC to EPSC throughout.

      Reviewer #3 (Recommendations For The Authors):

      (1) The abstract could focus more on the results at the expense of the background.

      We agree and have deleted the second introductory background sentence and added information on PPRs and depression during low frequency stimulation.

      (2) What does "strict" active zone localization refer to? Could they please define the term strict?

      Strict active zone localization means that cac puncta are detected in active zones but no cac label above background is found anywhere else throughout the presynaptic terminal, now defined on page 6, lines 27-29.

      (3) Single boutons/zoomed versions of the confocal images shown in Figures 2A-C, 2H, and 3A-C would be very helpful.

      We have provided these panels as suggested (see above and revised figures 2-4). Figure 3 is now figure 4.

      (4) The authors cite Ghelani et al. (2023) for increased cac levels during homeostatic plasticity. I recommend citing earlier work making similar observations (Gratz et al., 2019; DOI: 10.1523/JNEUROSCI.3068-18.2019), and linking them to increased presynaptic calcium influx (Müller & Davis, 2012; DOI: 10.1016/j.cub.2012.04.018).

      We agree and have added Gratz et al. 2019 and Müller and Davis 2012 to the results section on page 12, lines 17/18 and lines 21-23, and to the discussion on page 17, lines 29-33.

      (5) The data shown in Figure 3 does not directly support the conclusion of altered release probability in dI-IIB. I therefore suggest changing the legend's title.

      We have reworded to “Excisions at the I-II exon do not affect active zone cacophony localization but can alter cacsfGFP label intensity in active zones and PSC amplitude” as this is reflecting the data shown in the figure panels more directly.

      (6) It would be helpful to specify "adult flight muscle" in Figure 2J.

      We agree that it is helpful to specify in the figure (now revised figure 3C) that the voltage clamp recordings of somatodendritic calcium current were conducted in adult flight motoneurons and have revised the headline of figure panel 3C and the legend accordingly. Please note, these are not muscle cells but central neurons.

      (7) Do dIS4B/Cav2null MNs indeed show an inward or outward current at -90 to -70 mV/-40 and -50 mV, or is this an analysis artifact?

      No, this is due to baseline fluctuations, as is typical for voltage clamp in central neurons with more than 6000 µm dendritic length and more than 4000 dendritic branches.

      (8) Loss of several presynaptic proteins, including Brp (Kittel et al., 2006), and RBP (Liu et al., 2011), induce changes in GluR field size (without apparent changes in miniature amplitude). The statement regarding the Cav2 isoform and possible effects on GluR number (p. 8) should be revised accordingly.

      We understand and have done two things. First, we measured the intensity of GluRIIA immunolabel in ΔI-IIA, ΔI-IIB, and controls and found no differences. Second, we reworded the statement. It now reads on page 9, lines 1-6: “It seems unlikely that presynaptic cac channel isoform type affects glutamate receptor types or numbers, because the amplitude of spontaneous miniature postsynaptic currents (mEPSCs, Fig. 4K) and the labeling intensity of postsynaptic GluRIIA receptors are not significantly different between controls, I-IIA, and I-IIB junctions (see suppl. Fig. 2, p = 0.48, ordinary one-way ANOVA, mean and SD intensity values are 61.0 ± 6.9 (control), 55.8 ± 8.5 (∆I-IIA), 61.1 ± 17.3 (∆I-IIB)). However, we cannot exclude altered GluRIIB numbers and have not quantified GluR receptor field sizes.”

      (9) The statement relating miniature frequency to RRP size is unclear (p. 8). Is there any evidence for a correlation between miniature frequency to RRP size? Could the authors please clarify?

      We agree that this statement requires caution. Although there is some published evidence for a correlation of RRP size and mini frequency (Neuron, 2009 61(3):412-24. doi: 10.1016/j.neuron.2008.12.029 and Journal of Neuroscience 44 (18) e1253232024; doi: 10.1523/JNEUROSCI.1253-23.2024), which we now refer to on page 9, it is not clear whether this is true for all synapses and how linear such a relationship may be. Therefore, we have revised the text on page 9, lines 6-9. It now reads: “Similarly, the frequency of miniature postsynaptic currents (mEPSCs) remains unaltered. Since mEPSCs frequency has been related to RRP size at some synapses (Pan et al., 2009; Ralowicz et al., 2024) this indicates unaltered RRP size upon I-IIB excision, but we have not directly measured RRP size.”

      (10) Please define the "strict top view" of synapses (p. 8).

      Top view is what this reviewer referred to as “planar view” in the public review points 6 and 7. In our responses to these public review points we now also define “strict top view”, see page 9, lines 17-19.

      (11) Two papers are cited regarding a linear relationship between calcium channel number and release probability (p. 15). Many more papers could be cited to demonstrate a supralinear relationship (e.g., Dodge & Rahamimoff, 1967; Weyhersmüller et al., 2011 doi: 10.1523/JNEUROSCI.6698-10.2011). The data of the present study were collected at an extracellular calcium concentration of 0.5 mM, whereas Medeiros et al. (2023) used 1.5 mM. The relationship between calcium and release is supra-linear around 0.5 mM extracellular calcium (Weyhersmüller et al. 2011). This should be discussed/the statements be revised. Also, the reference to Medeiros et al. (2023) should be included in the reference list.

      We have now updated the Medeiros reference (an updated version of that paper appeared in eLife in 2024) in the text and reference list. We agree that the relationship of the calcium concentration and P<sub>r</sub> can also be non-linear and refer to this on page 16, lines 26-32, but the point we want to make is to relate defined changes in calcium channel number (not calcium influx) as assessed by multiple methods (CLSM intensity measures and sptPALM channel counting) to release probability. We now also clearly state that we measured at 0.5 mM external calcium (page 16, lines 27/28) whereas Medeiros et al. 2024 measured at 1.5 mM calcium (page 16, lines 31/32).

      (12) Figure 6: Quantal content does not have any units - please remove "n vesicles".

      We have revised this figure in response to reviewer 2 (comment 5) and quantal content is now expressed as percent baseline, thus without units (see revised figure 7).

      (13) Figure 6C should be auto-scaled from zero.

      This has been fixed by revising that figure in response to reviewer 2 (comment 5).

      (14) The data supporting the statement on impaired motor behavior and reduced vitality of adult IS4A should be either shown, or the statement should be removed (p. 13). Any hypotheses as to why IS4A is important for behavior and or viability?

      As suggested, we have removed that statement.

      (15) They do not provide any data supporting the statement that changes in PSC decay kinetics "counteract" the increase in PSC amplitude (p. 14). The sentence should be changed accordingly.

      We agree and have toned down the statement. It now reads on page 16, lines 7-9: “During repetitive firing, the median increase of PSC amplitude by ~10 % is potentially counteracted by the significant decrease in PSC half amplitude width by ~25 %...”.

      (16) How do they explain the net locomotion speed increase in dI-IIA larvae? Although the overall charge transfer is not affected during the stimulus protocols used, could the accelerated PSC decay affect PSP summation (I would actually expect a decrease in summation/slower speed)? Independent of the voltage-clamp data, is muscle input resistance changed in dI-IIA mutants?

      Muscle input resistance is not altered in I-II mutants. We refer to potential causes of the locomotion effects of I-IIA excision in the discussion. On page 16, lines 12 to 21 it reads: “there is no difference in charge transfer from the motoneuron axon terminal to the postsynaptic muscle cell between ∆I-IIA and control. Surprisingly, crawling is significantly affected by the removal of I-IIA, in that the animals show a significantly increased mean crawling speed but no significant change in the number of stops. Given that the presynaptic function at the NMJ is not strongly altered upon I-IIA excision, and that I-IIA likely mediates also Ca<sub>v</sub>2 functions outside presynaptic AZs (see above) and in other neuron types than motoneurons, and that the muscle calcium current is mediated by Ca<sub>v</sub>1 and Ca<sub>v</sub>3, the effects of I-IIA excision of increasing crawling speed is unlikely caused by altered pre- or postsynaptic function at the NMJ. We judge it more likely that excision of I-IIA has multiple effects on sensory and pre-motor processing, but identification of these functions is beyond the scope of this study.”